What is Fuzzy Matching?

Simply put, Fuzzy Matching is the application of algorithmic processes (often loosely called fuzzy logic) to determine the similarity between elements of data such as business names, people's names or address information.

Fuzzy Matching is used to estimate how likely it is that two non-identical pieces of data refer to the same thing, which helps with Data Cleansing, De-duplication or the matching of disparate data sets.
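As a rough illustration of what a similarity score between two non-exact values can look like, the Python sketch below uses the standard library's difflib.SequenceMatcher. This is only one of many possible measures, picked here for convenience. The helper name similarity and the case-folding step are assumptions made for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score between 0.0 and 1.0; higher means more alike."""
    # Assumption: ignore case and surrounding whitespace before comparing.
    a_norm = a.strip().lower()
    b_norm = b.strip().lower()
    return SequenceMatcher(None, a_norm, b_norm).ratio()


if __name__ == "__main__":
    for left, right in [("Smith", "Smythe"), ("Smith", "Jones")]:
        print(f"{left!r} vs {right!r}: {similarity(left, right):.2f}")
```

In practice a score like this would be compared against a threshold chosen for the data at hand, and where to set that threshold is a judgement call.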

To illustrate how Fuzzy Matching works, let’s have a quick look at one of the best-known approaches, namely Soundex. Soundex was originally developed to help match Surnames and was designed to overcome the problem of spelling differences or inaccuracies. It is classified as a Phonetic Algorithm, meaning it matches names based upon how they are pronounced rather than how they are spelled.

For example, the Surname Smith could be written as “Smith”, “Smithe” or even “Smythe”. Another example might be “Johnson”, “Jonson” or “Jonsen”. When the Soundex Function is used to compare these Surnames, it would predict that “Smith” could be the same as “Smithe” and that “Johnson” is likely the same as “Jonsen”.
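To make the comparison concrete, here is a minimal Python sketch of the commonly described American Soundex rules: keep the first letter, turn the remaining consonants into digits, collapse adjacent letters that share a digit, and pad the result to four characters. The function name soundex and the handling of non-letter characters are assumptions for this example, and implementations in real products may differ in the details.

```python
# Digit groups used by the commonly described American Soundex rules.
_GROUPS = (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
)
_CODES = {letter: digit for letters, digit in _GROUPS for letter in letters}


def soundex(name: str) -> str:
    """Return the four-character Soundex code for a name ('' if no letters)."""
    # Assumption: anything that is not a letter (spaces, hyphens) is dropped.
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W are skipped and do not separate letters sharing a digit.
            continue
        code = _CODES.get(ch, "")
        if code and code != previous:
            result += code
        # Vowels have no digit, so they reset 'previous' and allow a repeat.
        previous = code

    # Pad with zeros, then cut to one letter plus three digits.
    return (result + "000")[:4]


if __name__ == "__main__":
    for surname in ("Smith", "Smithe", "Smythe", "Johnson", "Jonson", "Jonsen"):
        print(surname, soundex(surname))
```

Under these rules the three Smith variants all produce S530 and the three Johnson variants all produce J525, which is why Soundex treats each group as a likely match.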

Soundex is one of the simplest and most often derided algorithms for predicting phonetically matching data, but it was undoubtedly the forerunner of the much more sophisticated algorithms in use today.

Why do we need Fuzzy Data Matching?

Fuzzy Matching is an essential technology in enabling us to match non-exact data.

With the proliferation of data today, managing it is becoming more and more complex. Businesses need to ensure their data is accurate and clean, and that it complies with ever-increasing legislation. To do so, they need to ensure duplicates are identified and removed, and that newly purchased data is integrated without compromising existing data quality.

With a multitude of disparate internal business systems collecting and storing information, it’s often important that they share information, and to do so the data needs to be matched together.

By integrating their data, Companies collaborating on marketing activities can ensure they are not wasting money duplicating efforts, and can avoid upsetting potential customers by sending them multiple copies of the same promotional material.