[2020 AAAI] The value of paraphrase for knowledge base predicates

  Bingcong Xue’s paper “The Value of Paraphrase for Knowledge Base Predicates” has been accepted to AAAI 2020.

  Paraphrases have proven useful for many natural language processing applications, and collecting paraphrases for knowledge base predicates is key to understanding the RDF triples in KBs. Existing resources such as WordNet, Patty and PPDB provide paraphrase datasets automatically extracted from large corpora, but they contain too many redundant pairs or do not cover enough predicates.

  In this paper, we present a complete process for collecting large-scale, high-quality paraphrase dictionaries for knowledge base predicates. The process builds on existing datasets and combines machine mining with crowdsourcing. We build our own crowdsourcing platform, carefully designed around quality, cost and latency, which greatly helps our work. The resulting dictionary covers 2284 distinct DBpedia predicates and contains more than 30,000 paraphrase pairs in total. We then demonstrate that such high-quality paraphrase dictionaries substantially benefit natural language processing tasks such as question answering and language generation. We also publish our dictionary for further research.