Implementing AI to further scale and accelerate WorldCat de-duplication
The OCLC Metadata Quality team continuously enhances WorldCat data through manual and automated processes to keep it accurate for libraries worldwide. By combining AI with human expertise, the team refines metadata, improves duplicate detection, and enhances resource discovery.
In August 2023, OCLC launched a machine learning model to detect duplicate bibliographic records. The effort incorporated feedback from more than 300 catalogers on 34,000 records and led to the removal of 5.4 million duplicates.
AI-driven de-duplication is now expanding to all formats, languages, and scripts. A test run in February 2025 merged 500,000 duplicate English print records, and broader cleanup efforts will follow. Libraries should enable WorldCat updates so that these record changes carry over to their own systems.