[2023 ACL] AtTGen: Attribute Tree Generation for Real-World Attribute Joint Extraction

Yanzeng Li's paper titled "AtTGen: Attribute Tree Generation for Real-World Attribute Joint Extraction" has been accepted by ACL 2023.

Attribute extraction aims to identify attribute names and their corresponding values in descriptive texts, and it underpins many downstream applications such as knowledge graph construction, search engines, and e-commerce. Previous studies generally treat attribute extraction either as a classification problem that predicts attribute types or as a sequence tagging problem that labels attribute values, which correspond to two paradigms: the closed-world and the open-world assumption. However, both paradigms are limited in real-world applications, and prior attempts to integrate them by ensembling, pipelining, or co-training different models still suffer from cascading errors, high computational overhead, and training difficulty.

To address these problems, this paper presents Attribute Tree, a unified formulation for real-world attribute extraction in which closed-world, open-world, and semi-open attribute extraction tasks are modeled uniformly. A text-to-tree generation model, AtTGen, is then proposed to learn annotations from these different scenarios efficiently and consistently. Experiments demonstrate that the proposed paradigm covers a wide range of real-world application scenarios. In particular, on the MEPAVE multimodal attribute extraction dataset, AtTGen outperforms many large models and multimodal models, achieving over 96% accuracy with only 12M parameters and without using any multimodal information.
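The idea of one structure serving several task settings can be illustrated with a small sketch. This is a hypothetical illustration, not the paper's actual tree schema or generation procedure: it assumes a two-level tree (source text as root, attribute names as children, extracted value spans as leaves). It also assumes that the three settings differ only in where attribute names may come from: a predefined schema (closed-world), spans of the input text (open-world), or either source (semi-open, as assumed here). The names `AttributeNode`, `AttributeTree`, and `check_tree` are invented for this example.

```python
from dataclasses import dataclass, field


# Hypothetical tree layout for illustration only; AtTGen's real schema may differ.
@dataclass
class AttributeNode:
    name: str                                        # attribute name, e.g. "color"
    values: list[str] = field(default_factory=list)  # value spans taken from the text


@dataclass
class AttributeTree:
    text: str                                        # root: the descriptive source text
    attributes: list[AttributeNode] = field(default_factory=list)

    def to_pairs(self) -> list[tuple[str, str]]:
        """Flatten the tree into (attribute, value) pairs."""
        return [(a.name, v) for a in self.attributes for v in a.values]


def check_tree(tree: AttributeTree, schema: set[str], mode: str) -> bool:
    """Check whether a tree is valid under an (assumed) task setting.

    closed:    every attribute name must belong to the predefined schema.
    open:      every attribute name must appear as a span of the input text.
    semi-open: every attribute name comes from the schema or from the text.
    In all modes, values are assumed to be spans of the input text.
    """
    if mode not in ("closed", "open", "semi-open"):
        raise ValueError(f"unknown mode: {mode}")
    for attr in tree.attributes:
        in_schema = attr.name in schema
        in_text = attr.name in tree.text
        if mode == "closed" and not in_schema:
            return False
        if mode == "open" and not in_text:
            return False
        if mode == "semi-open" and not (in_schema or in_text):
            return False
        if any(v not in tree.text for v in attr.values):
            return False
    return True


if __name__ == "__main__":
    tree = AttributeTree(
        "This red cotton T-shirt",
        [AttributeNode("color", ["red"]), AttributeNode("material", ["cotton"])],
    )
    schema = {"color", "material"}
    print(tree.to_pairs())
    for mode in ("closed", "open", "semi-open"):
        print(mode, check_tree(tree, schema, mode))
```

In this toy example the same tree is valid in the closed-world and semi-open settings but not in the open-world one, because the name "color" is not a span of the text. The point of the sketch is only that a single output structure can be checked against, or trained for, all three settings without separate models.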