First, just to confirm: as with all data matching projects, it's the matching of non-exact data that presents the challenges.
So let's look at how a company name is defined.
Normally a company name is constructed of [Business Name] + [Legal Definition], e.g. “GlaxoSmithKline PLC”.
So one issue we have to address at the start is whether we accept matches where the legal entity definition is different,
e.g. “GlaxoSmithKline PLC” = “GlaxoSmithKline Corp” or “GlaxoSmithKline AG”.
You have to consider what you're going to do with the results to determine whether you will accept differences in legal definition. If you're working for a public sector body responsible for tax levies, for example, then the legal definition may be very important, but for general Sales and Marketing data it's typically of less importance. See the post “Is legal entity important when matching company names” for more on this topic.
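One way to make this decision explicit is to split the legal definition off the business name and let the caller choose whether it has to agree. The following is a minimal sketch, not a definitive implementation: the suffix list, the function names and the `require_same_legal_form` flag are all assumptions made for illustration, and the list is far from complete.

```python
import re

# Illustrative and deliberately incomplete list of legal-form suffixes (an assumption).
LEGAL_SUFFIXES = {"plc", "ltd", "limited", "inc", "corp", "corporation",
                  "llc", "ag", "gmbh", "sa"}

def split_legal_form(name):
    """Return (business_name, legal_form); legal_form is None if no known suffix ends the name."""
    tokens = re.sub(r"[.,]", "", name).split()
    if len(tokens) > 1 and tokens[-1].lower() in LEGAL_SUFFIXES:
        return " ".join(tokens[:-1]), tokens[-1].upper()
    return " ".join(tokens), None

def names_match(a, b, require_same_legal_form=False):
    """Compare business names exactly (ignoring case); optionally insist on the same legal form."""
    name_a, form_a = split_legal_form(a)
    name_b, form_b = split_legal_form(b)
    if name_a.lower() != name_b.lower():
        return False
    return form_a == form_b if require_same_legal_form else True
```

A Sales and Marketing project might call `names_match(a, b)` with the default, while a tax-related project would pass `require_same_legal_form=True` so that “GlaxoSmithKline PLC” and “GlaxoSmithKline AG” stay separate.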
Now let's think about the Business Name element. There are several issues specific to company names that can stop some matching engines from finding matches, including:
- Acronyms & Abbreviations
- Erroneous Words
- Multiple Company Names
Let's have a look at each of these in more detail.
Business Name Acronyms & Abbreviations
It's quite common for companies to use acronyms or abbreviations of their names, and there are many, many examples of well-known companies that do this.
These include: “BP”, “GM”, “GE”, “HP”, “P&G”, “D&B”, “BT”, etc.
As well as the well-known acronyms, you also have abbreviations which people use rather than typing out the whole company name. These can differ from the branding acronyms that the companies themselves may use. This is particularly true for public sector entities.
For example, people refer to ‘Dell Computer Corporation’ as simply ‘Dell’, and subsidiaries like ‘Barclays Stock Brokers’ may be condensed to just ‘Barclays’. This presents much more of a challenge.
We also see nicknames being used for large companies. In the United States, for example, we have ‘Federal National Mortgage Association’, which is more commonly known as ‘Fannie Mae’.
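Because acronyms and nicknames such as ‘Fannie Mae’ share almost no characters with the full name, a lookup table is usually needed before any string comparison. Here is a minimal sketch, assuming a hypothetical `ALIASES` dictionary that you would populate and maintain yourself; the few entries shown are taken from the examples in this post.

```python
# Hypothetical knowledge base: alias (lowercased) -> canonical business name.
ALIASES = {
    "hp": "Hewlett Packard",
    "p&g": "Procter and Gamble",
    "dell": "Dell Computer Corporation",
    "fannie mae": "Federal National Mortgage Association",
}

def canonical_name(name, aliases=ALIASES):
    """Return the canonical name for a known alias, otherwise the input with outer spaces removed."""
    key = " ".join(name.lower().split())  # lowercase and collapse repeated whitespace
    return aliases.get(key, name.strip())
```

Both sides of a comparison would be passed through `canonical_name` first, so that “Fannie Mae” and “Federal National Mortgage Association” end up as the same string before matching.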
Erroneous Words
Business names sometimes get wrapped up in descriptions, departmental or subsidiary information that makes the matching process much more difficult.
Take, for example, “The Procter and Gamble Company”, where “The” and “Company” add little value from a matching perspective, as the company name is often written as just “Procter and Gamble”. Other examples of these erroneous words include “Group”, “Grupo”, “Holdings” and other foreign language variations.
You also have to consider things like “Personnel Dept – Hewlett Packard”, which would typically be difficult to match against “Hewlett Packard”.
This is a difficult problem to overcome, as you normally find this kind of issue with large companies, and these are often some of the most important records to match. Missed matches on these records are also the omissions most obvious to anyone assessing the quality of your results, which undermines your credibility.
Another good example of erroneous words is location or geography, where it's not uncommon to find instances like “Microsoft European Headquarters” or “Pirelli – France”.
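A common way to handle erroneous words is to strip them before comparing, leaving only the core business name. The sketch below assumes two small hypothetical lists, `NOISE_WORDS` and `NOISE_PHRASES`, built from the examples above; real lists would be much longer and need care, since removing a word like “France” would damage a name in which it is a genuine part of the brand.

```python
import re

# Illustrative noise lists (assumptions); grow these from reviewed matches.
NOISE_WORDS = {"the", "company", "group", "grupo", "holdings"}
NOISE_PHRASES = ["personnel dept", "european headquarters", "france"]

def core_name(name):
    """Lowercase the name, drop separators, noise phrases and noise words."""
    text = name.lower()
    text = re.sub(r"[–\-,]", " ", text)  # treat dashes and commas as spaces
    for phrase in NOISE_PHRASES:
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text)
    tokens = [t for t in text.split() if t not in NOISE_WORDS]
    return " ".join(tokens)
```

With these lists, “Personnel Dept – Hewlett Packard” and “Hewlett Packard” reduce to the same string, as do “The Procter and Gamble Company” and “Procter and Gamble”.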
Multiple Company Names
It's not uncommon for large companies and public sector organisations to trade under multiple names or acronyms, and you may find that some of your data sources include multiple fields for company name.
As an example, take “Burger King”, a very well-known brand name. Did you know that these are often franchised businesses? They trade under the “Burger King” name, but the actual legal business name could be something completely different, and sometimes you will find that one company operates many franchises.
Some companies are now known almost exclusively by their acronym, but in the past the full company name was used; for example “LG”, which started out as “Lucky Goldstar”.
Sometimes companies change their names as the result of Mergers & Acquisitions or just as part of a re-branding exercise. For instance, in the UK “Royal Mail” re-branded itself as “Consignia”, a monumental mistake and since reversed, but a good example all the same.
In some instances it can get very confusing when companies are split up, with elements of the business being sold to different suitors who each retain the original business name.
Another consideration is joint ventures such as “Nokia Siemens” or “Sony Ericsson”, which are new businesses in their own right but often operate from the same addresses as one or both of the founding members.
So using fuzzy logic could well result in false matches being suggested.
Most phonetic / key-based matching solutions work by building a matching key for the entire string and matching datasets using this information. Whilst this may be quite successful for some simpler types of data such as first names or surnames, the examples above show that looking at the entire string is a flawed approach for matching company names.
The key to successfully matching company names is to build and maintain a knowledge base of acronyms, abbreviations and keywords for large corporate businesses, and to leverage this information within your matching logic. If you are working on international data then be careful, as the same acronym can be used by different companies to mean different things in different countries.
Use as much corroborative information as possible, such as addresses, telephone numbers and website URLs, to help locate more potential matches, and ensure that any knowledge discovered when reviewing these matches is captured and inherited within your matching processes.
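As a rough illustration of how corroborative fields can lift a weak name match, the sketch below combines a name similarity from Python's standard `difflib` with bonuses for a shared website domain or telephone number. The record layout (dictionaries with `name`, `url` and `phone` keys) and the weights 0.6 / 0.25 / 0.15 are assumptions chosen for the example, not recommended values.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse

def domain(url):
    """Return the host part of a URL without a leading 'www.', or '' if missing."""
    if not url:
        return ""
    host = urlparse(url if "//" in url else "//" + url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def digits(phone):
    """Keep only the digits of a phone number; '' if missing."""
    return "".join(ch for ch in (phone or "") if ch.isdigit())

def match_score(a, b):
    """Score two records: name similarity plus bonuses for shared domain and phone."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    score = 0.6 * name_sim  # illustrative weights (assumptions)
    if domain(a.get("url")) and domain(a.get("url")) == domain(b.get("url")):
        score += 0.25
    if digits(a.get("phone")) and digits(a.get("phone")) == digits(b.get("phone")):
        score += 0.15
    return score
```

Two records whose names differ, such as “Hewlett Packard” and “Personnel Dept – Hewlett Packard”, score higher when they share a website domain, which pushes them into the set of candidates worth reviewing.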
Note that there is always a trade-off between quality and quantity. In order to find as many matches as possible you will need to review more potential matches (Match Candidates), so investing in a good visual engine for validating these matches is highly recommended.
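The trade-off can be made concrete with two score thresholds: pairs above the upper one are accepted automatically, pairs between the two become Match Candidates for manual review, and the rest are rejected. This is only a sketch; the function name `triage` and the threshold values are assumptions, and suitable values depend entirely on your data and scoring method.

```python
def triage(scored_pairs, auto_accept=0.9, review=0.6):
    """Split (pair, score) tuples into accepted matches, review candidates and rejects."""
    accepted, candidates, rejected = [], [], []
    for pair, score in scored_pairs:
        if score >= auto_accept:
            accepted.append(pair)
        elif score >= review:
            candidates.append(pair)   # needs a human decision
        else:
            rejected.append(pair)
    return accepted, candidates, rejected
```

Lowering `review` finds more potential matches but increases the number of candidates someone has to validate, which is where a good review tool pays for itself.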