2022/05/09

ParaNames: A Massively Multilingual Entity Name Corpus

This preprint describes work in progress on ParaNames, a multilingual parallel name resource consisting of names for approximately 14 million entities. The included names span over 400 languages, and almost all entities are mapped to standardized entity types. Built using Wikidata as a source, ParaNames is the largest resource of its type to date. It is useful for multilingual language processing, both in defining tasks for name translation/transliteration and as supplementary data for tasks such as named entity recognition and linking.