Monthly Archives: September 2017

Phonetic Matching

Phonetic Matching is the process of matching data using algorithms/functions that focus on how a word is pronounced rather than how it is spelt. Most of the popular phonetic matching algorithms are designed to work with the English language and do not perform as well with foreign languages. The other…

Data Transformation Logic

Data Transformation is an important aspect of any Data Matching / Integration project and should not be overlooked, even when using fuzzy logic. If you have already read the sections on Phonetic and String Comparison algorithms, you will have seen that fuzzy logic, whilst undoubtedly clever, is not foolproof, and the more you can improve the…

String Comparison Logic

String Comparison Logic is used to analyse two strings and determine how different they are and what changes would be required to convert one string into the other. Two of the most popular string comparison functions are Jaro-Winkler and Levenshtein Distance. These functions can help identify typing errors where letters are typed in…
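As a rough illustration of the first of these functions, here is a minimal Python sketch of Jaro-Winkler similarity. It is not taken from any particular library; it follows the commonly cited definition, with a prefix scaling factor p = 0.1 and the common prefix capped at four characters (the parameter names p and max_prefix are my own). Scores run from 0.0 (nothing in common) to 1.0 (identical).

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings (0.0 to 1.0)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are this close to each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    s1_matched = [False] * len1
    s2_matched = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not s2_matched[j] and s2[j] == ch:
                s1_matched[i] = s2_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order (transpositions).
    k = 0
    out_of_order = 0
    for i in range(len1):
        if s1_matched[i]:
            while not s2_matched[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings that share a common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

For example, jaro_winkler("MARTHA", "MARHTA") gives roughly 0.961: the transposed "TH"/"HT" lowers the score slightly, and the shared "MAR" prefix pushes it back up. Some implementations only apply the prefix bonus when the Jaro score exceeds a threshold (often 0.7); this sketch always applies it.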

The Challenges of Matching Business Entity / Company Names

Firstly, as with all data matching projects, it’s the matching of non-exact data that presents the challenges. So let’s look at how a company name is defined. Normally a company name is constructed as [Business Name] + [Legal Definition], e.g. “GlaxoSmithKline PLC”. So one issue we have to address at the…
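One way to tackle the [Business Name] + [Legal Definition] structure is to split the legal definition off before comparing names. The Python sketch below assumes a short, hypothetical list of legal definitions (LEGAL_DEFINITIONS); it removes full stops and commas, upper-cases the name and checks whether it ends with one of those definitions. A real list would be far longer and jurisdiction specific.

```python
import re

# Hypothetical, deliberately short list of legal definitions for illustration.
LEGAL_DEFINITIONS = ["PUBLIC LIMITED COMPANY", "LIMITED", "LTD", "PLC", "LLC", "INC", "CORP"]


def split_company_name(name: str) -> tuple[str, str]:
    """Split a company name into (business name, legal definition)."""
    # Drop full stops and commas so "P.L.C." becomes "PLC", then tidy whitespace.
    cleaned = re.sub(r"[.,]", "", name).upper()
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Try the longest definitions first so multi-word forms win.
    for suffix in sorted(LEGAL_DEFINITIONS, key=len, reverse=True):
        if cleaned.endswith(" " + suffix):
            return cleaned[: -len(suffix) - 1].strip(), suffix
    # No recognised legal definition: treat the whole string as the business name.
    return cleaned, ""
```

Both “GlaxoSmithKline PLC” and “GlaxoSmithKline P.L.C.” reduce to the business name "GLAXOSMITHKLINE", so the business names can be compared separately from their legal definitions.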

Data Standardization

A critical step to ensure easy data integration across the enterprise is adopting a solid Data Standardization approach. This will ensure that your data is easier to match and integrate, regardless of the source systems. Even with today’s advanced Fuzzy Data Matching technologies, it is still much easier and more reliable when the data is…
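A minimal Python sketch of a standardization routine, assuming a few common rules: strip accents, upper-case, replace punctuation with spaces, collapse whitespace and expand abbreviations from a lookup table. The ABBREVIATIONS table is a hypothetical example, not a recommended list.

```python
import re
import unicodedata

# Hypothetical abbreviation table for illustration only.
ABBREVIATIONS = {"ST": "STREET", "RD": "ROAD", "AVE": "AVENUE"}


def standardize(value: str) -> str:
    """Return a standardized, upper-case form of a free-text value."""
    # Decompose accented characters and drop the combining marks (é -> e).
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.upper()
    # Replace punctuation with spaces; split() then collapses the whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = text.split()
    # Expand any token found in the abbreviation table.
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)
```

Here standardize("  12 High St. ") returns "12 HIGH STREET". A blind lookup like this can be wrong ("St" can also mean "Saint"), so abbreviation rules usually need to be applied per field rather than across every column.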

Metaphone

Metaphone is a much more modern phonetic algorithm for matching words based upon their English pronunciation, and is a significant advancement on Soundex. There have since been several iterations of the Metaphone algorithm, including Double Metaphone and Metaphone 3. Again, as with the Soundex example, I will focus on real examples of using the Metaphone…

Levenshtein Distance

The Levenshtein Distance algorithm measures the number of changes required to transform one string into another: it compares two strings and returns the minimum number of single-character edits (insertions, deletions or substitutions) needed to make them equal. For instance, Levenshtein(“American”, “Amercan”) returns 1, because deleting a single “i” turns one string into the other, whereas two identical strings such as Levenshtein(“American”, “American”) return 0. Useful links: Wikipedia on Levenshtein…
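A minimal Python sketch of the classic dynamic-programming calculation. It keeps only two rows of the edit-distance table, so memory grows with the length of the shorter string rather than with the product of both lengths. The function name levenshtein is my own choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b."""
    # The distance is symmetric, so iterate over the longer string and
    # keep rows sized to the shorter one.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]
```

With this sketch, levenshtein("American", "Amercan") returns 1 and levenshtein("kitten", "sitting") returns 3.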

Soundex

Soundex is a function designed to return a phonetic key for the terms passed to it. I will not dwell too much on its heritage, only to say that early incarnations were developed to assist in the matching of people’s names. See the useful links below for more information about Soundex. Soundex is a very easy…
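A minimal Python sketch of the American Soundex rules as they are commonly described: keep the first letter, code the remaining consonants as digits, collapse adjacent letters with the same code (including across H and W, but not across vowels), then pad or truncate to four characters. Non-letters are ignored; this is an illustration rather than a reference implementation.

```python
# Standard American Soundex digit groups; vowels, Y, H and W get no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character Soundex key for a name."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = [letters[0]]
    # The first letter's code still counts when collapsing duplicates.
    last_code = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != last_code:
                result.append(code)
            last_code = code
        elif ch not in "HW":
            # A vowel (or Y) separates letters, so a repeated code is kept.
            last_code = ""
        # H and W are skipped without resetting last_code.
    return ("".join(result) + "000")[:4]
```

"Robert" and "Rupert" both give R163, so they would match on their phonetic key, while "Ashcraft" gives A261 because the H between S and C does not separate their identical codes.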