Soundex is a function that returns a phonetic key for the term passed to it. I will not dwell too much on its heritage, other than to say that early incarnations were developed to help match people's names. See the useful links below for more information about Soundex.
Soundex is simple to use and is built into many standard database systems, such as MySQL, Microsoft SQL Server and Oracle. The implementations vary slightly, however, so results can differ from one system to another.
Here is an example of how you would use this function in Microsoft SQL Server.
Select Soundex('Stephen'), Soundex('Steven'), Soundex('Stefan')
If you run the above example, you will notice that the function returns ‘S315’ in every case. Here the function is genuinely helpful: it would let you match all three records together.
But if you try the following example, you will start to see why Soundex is so often derided.
Select Soundex('Fielder'), Soundex('feltraiger')
Both of these return ‘F436’, which is not really helpful.
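Both results follow directly from how the key is built. The sketch below implements the classic American Soundex rules in Python. It is not SQL Server's actual implementation. It keeps the first letter, maps the remaining consonants to digits, drops vowels, collapses repeated codes and stops after three digits. The names `soundex` and `CODES` are illustrative only. The sketch also stops encoding at the first non-letter, an assumption made to mirror the results in this article. MySQL, for example, ignores non-letters instead and can return keys longer than four characters. Edge cases, such as names with H or W between consonants, may therefore differ from the built-in function of any particular database.

```python
# Letter-to-digit groups used by classic American Soundex.
# Vowels, Y, H and W have no code.
CODES = {
    letter: digit
    for digit, letters in (
        ("1", "BFPV"),
        ("2", "CGJKQSXZ"),
        ("3", "DT"),
        ("4", "L"),
        ("5", "MN"),
        ("6", "R"),
    )
    for letter in letters
}


def soundex(term: str) -> str:
    """Return a four-character Soundex key (a letter plus three digits).

    Assumption: encoding stops at the first non-letter, so only the first
    word of a multi-word string contributes to the key.
    """
    word = ""
    for ch in term.strip().upper():
        if not ch.isalpha():
            break  # assumption: a space or punctuation ends the word
        word += ch
    if not word:
        return ""  # nothing to encode

    key = word[0]                   # the first letter is kept as-is
    last = CODES.get(word[0], "")   # its code still blocks an identical next code
    for ch in word[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != last:        # adjacent identical codes collapse into one
                key += code
                if len(key) == 4:   # truncation: later letters are never examined
                    break
            last = code
        elif ch not in "HW":
            last = ""               # a vowel or Y separates repeated codes
        # H and W are skipped without resetting `last`.
    return key.ljust(4, "0")        # pad short keys with zeros


if __name__ == "__main__":
    for name in ["Stephen", "Steven", "Stefan", "Fielder", "feltraiger"]:
        print(name, soundex(name))
```

Tracing 'Fielder' through the sketch: F is kept, the vowels are dropped, and L, D and R give 4, 3 and 6, so the key is F436. 'feltraiger' reaches F436 after L, T and R, and the four-character limit cuts off everything from "aiger" onwards. The two words differ only in the part of the string that Soundex never looks at.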
So whilst Soundex, like many fuzzy matching algorithms, does help identify good matches and potential misspellings, it also generates a lot of false match candidates.
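One of the big problems with Soundex is that it only looks at the first few characters of a string. In SQL Server the key is always four characters long (a letter followed by three digits), so anything beyond the first few consonants is ignored. Take the example below, where Soundex has been used to match company names.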
Select Soundex('General Electric'), Soundex('General Motors')
Select Soundex('British Airways'), Soundex('British Telecommunications')
Select Soundex('The Goodyear Tyre Company'), Soundex('The Kodak Company')
Each pair above returns the same key for both names, which is clearly wrong. Accepting matches like these blindly, without reviewing the results, could have serious consequences.
A popular, more advanced alternative is Metaphone, which looks at more than just the first few characters of a string.