How a Fortune 500 enterprise established clean golden records for 11 million customers from a fragmented post-merger data estate.
When two large global enterprises merged, they combined not only their workforces and products but also their customer databases. The result was a data estate of 1.7 billion address entries drawn from multiple legacy CRM and ERP systems, spanning 150+ countries, written in dozens of languages, and formatted inconsistently from region to region.
The true customer count was unknown. The same customer might appear dozens of times across systems — under different name formats, address conventions, and local language spellings. Some duplicate records differed by a single character. Others were separated by translation, abbreviation, or regional formatting convention. Standard MDM tooling had been evaluated and found insufficient for the scale and multilingual complexity involved.
Without a reliable single-customer view, cross-sell programmes were misdirected, account managers worked from conflicting records, and global analytics were built on a foundation that nobody fully trusted. The merger's commercial benefits could not be realised until the customer data was clean.
Standard MDM solutions could not handle 1.7 billion multilingual, multi-format records at acceptable accuracy. A custom AI approach was required.
Cross-sell analytics built on duplicate records. Conflicting account ownership. Global marketing campaigns hitting the same customer multiple times under different identities.
We designed a back-office AI orchestration layer that coordinated the outputs of multiple specialised agents, rather than applying a single model to an inherently multi-dimensional problem. Each agent addressed one specific dimension of the deduplication challenge.
Transformer-based models were trained to match records across language boundaries, address format variations, and regional naming conventions — learning the subtle patterns that distinguish a genuine duplicate from a different entity with a similar name.
NLP translation pipelines normalised multilingual records to a common reference form. Geolocation services resolved address ambiguity. External data validation confirmed or disambiguated entity identity where internal signals alone were insufficient.
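To make "a common reference form" more concrete, here is a minimal Python sketch of the kind of canonicalisation that typically runs before matching. It covers only script-preserving clean-up (accent stripping, case folding, punctuation removal and abbreviation expansion); the translation and geocoding steps described above are out of scope. The `ABBREVIATIONS` table and the `normalise_address` function are hypothetical illustrations, not the client's implementation.

```python
import re
import unicodedata

# Hypothetical abbreviation table; a real deployment would need per-locale tables.
ABBREVIATIONS = {
    "str": "strasse",
    "st": "street",
    "ave": "avenue",
    "rd": "road",
}


def normalise_address(raw: str) -> str:
    """Reduce an address string to a simple, comparable reference form."""
    # NFKD splits accented letters into base letter + combining mark
    # (and folds compatibility characters such as full-width digits).
    decomposed = unicodedata.normalize("NFKD", raw)
    # Drop the combining marks so "Müller" and "Muller" compare equal.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is stronger than lower(): it maps German "ß" to "ss".
    folded = stripped.casefold()
    # Split on any run of non-word characters (spaces, commas, dots, apostrophes).
    tokens = re.split(r"[^\w]+", folded)
    # Expand known abbreviations token by token, skipping empty tokens.
    expanded = [ABBREVIATIONS.get(tok, tok) for tok in tokens if tok]
    return " ".join(expanded)


print(normalise_address("12 Main St."))
print(normalise_address("Café de l'Église"))
```

Because Python's `\w` is Unicode-aware, tokens in non-Latin scripts survive this step unchanged; bringing them to a shared form would still require the translation pipelines.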
A scalable Golden ID approach linked every record belonging to the same real-world entity — persisting across future data updates and providing a stable identifier for use across CRM, analytics, and marketing systems globally.
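One common way to turn pairwise match decisions into a shared identifier is a disjoint-set (union-find) structure: every accepted match joins two records, and each resulting cluster receives one ID. The sketch below assumes that approach; `assign_golden_ids`, the `G-` prefix and the ID-reuse rule are hypothetical. To keep identifiers stable across runs, a cluster reuses the smallest Golden ID that any of its members already held; when two previously separate entities merge, the other ID is retired. Splitting a cluster that was wrongly merged is not handled here.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def add(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: repoint every node on the path straight at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def assign_golden_ids(record_ids, matched_pairs, previous=None):
    """Cluster records linked by accepted matches and give each cluster one ID.

    previous maps record_id -> Golden ID from an earlier run (hypothetical
    persistence rule): a cluster reuses the smallest ID any member already
    held; otherwise it mints "G-" + its smallest record ID.
    """
    previous = previous or {}
    uf = UnionFind()
    for rid in record_ids:
        uf.add(rid)
    for a, b in matched_pairs:
        uf.add(a)
        uf.add(b)
        uf.union(a, b)

    # Group every record under the root of its tree.
    clusters = defaultdict(list)
    for rid in list(uf.parent):
        clusters[uf.find(rid)].append(rid)

    golden_ids = {}
    for members in clusters.values():
        inherited = sorted({previous[m] for m in members if m in previous})
        golden = inherited[0] if inherited else f"G-{min(members)}"
        for m in members:
            golden_ids[m] = golden
    return golden_ids


first = assign_golden_ids(["a", "b", "c", "d"], [("a", "b"), ("c", "d")])
# A later run: new record "e" matches "b" and inherits its existing Golden ID.
second = assign_golden_ids(
    ["a", "b", "c", "d", "e"], [("a", "b"), ("c", "d"), ("e", "b")], previous=first
)
print(second)
```

Union-find keeps each merge close to constant time, which matters when the number of candidate pairs runs into the billions; the reuse rule is what lets downstream CRM and analytics systems keep pointing at the same identifier after a refresh.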
Every match decision carried a confidence score and a full audit trail — enabling leadership and compliance functions to review, override, and understand the system's reasoning. Transparency was a design requirement from the outset, not an afterthought.
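The sketch below shows one way a match decision can carry its own confidence score and an append-only audit history. It assumes the per-signal scores from the specialised agents are combined by a simple weighted average and routed by two thresholds; the weights, the threshold values, and the `decide`, `override` and `to_json` functions are all hypothetical, chosen for illustration rather than taken from the engagement.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

# Hypothetical routing thresholds; real values would be tuned on labelled pairs.
AUTO_MERGE_AT = 0.95
REVIEW_AT = 0.75


@dataclass
class MatchDecision:
    record_a: str
    record_b: str
    confidence: float
    evidence: dict  # signal name -> score reported by that agent
    outcome: str
    history: list = field(default_factory=list)  # append-only audit trail


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def decide(record_a: str, record_b: str, evidence: dict, weights: dict) -> MatchDecision:
    """Combine per-signal scores into one confidence value and route the pair."""
    total_weight = sum(weights.values())
    confidence = sum(weights[name] * evidence[name] for name in weights) / total_weight
    if confidence >= AUTO_MERGE_AT:
        outcome = "auto_merge"
    elif confidence >= REVIEW_AT:
        outcome = "human_review"
    else:
        outcome = "no_match"
    decision = MatchDecision(record_a, record_b, round(confidence, 4), dict(evidence), outcome)
    decision.history.append({"event": "scored", "outcome": outcome, "at": _now()})
    return decision


def override(decision: MatchDecision, reviewer: str, new_outcome: str, reason: str) -> MatchDecision:
    """Apply a human override while keeping the original outcome in the trail."""
    decision.history.append({
        "event": "override",
        "by": reviewer,
        "from": decision.outcome,
        "to": new_outcome,
        "reason": reason,
        "at": _now(),
    })
    decision.outcome = new_outcome
    return decision


def to_json(decision: MatchDecision) -> str:
    """Serialise the decision and its full history for an audit store."""
    return json.dumps(asdict(decision))


weights = {"name_model": 0.5, "address_geo": 0.3, "external": 0.2}
pair = decide("r3", "r4", {"name_model": 0.9, "address_geo": 0.7, "external": 0.5}, weights)
override(pair, "analyst-1", "no_match", "different legal entity")
print(to_json(pair))
```

Recording an override as a new history entry, rather than overwriting the score, lets a reviewer later see both what the system decided and why a person changed it.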
The Golden ID layer became the foundation for the client's global commercial intelligence programmes. Cross-sell analytics, account-based marketing, and global sales intelligence — all of which had been built on unreliable data — could now be reconstructed on a trusted customer identity that the entire organisation shared.
With a reliable single-customer view, the client's cross-sell analytics and account-based marketing programmes could be rebuilt on data they trusted — enabling the commercial benefits the merger had promised.
Off-the-shelf MDM solutions were evaluated and rejected — the multilingual, multi-format scale of the problem required a bespoke AI approach. This engagement demonstrated that the right architecture can succeed where standard tooling cannot.
Full audit trails and confidence scoring were built into every match decision — giving compliance and legal functions the transparency they required, and leadership the confidence to act on the results.
We can assess your data estate and show you what a clean customer identity would unlock for your business.