SoundEx Algorithm
From Xojo Documentation
Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. It can be a useful way to store names for searching so that similar names can be found. As defined by Wikipedia:
The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling. The algorithm mainly encodes consonants; a vowel will not be encoded unless it is the first letter. Soundex is the most widely known of all phonetic algorithms (in part because it is a standard feature of popular database software, although not included with SQLiteDatabase).
SoundEx has these rules:
- Retain the first letter of the name and drop all other occurrences of a, e, i, o, u, y, h, w.
- Replace consonants with digits as follows (after the first letter):
- b, f, p, v → 1
- c, g, j, k, q, s, x, z → 2
- d, t → 3
- l → 4
- m, n → 5
- r → 6
- If two or more letters with the same number are adjacent in the original name (before step 1), only retain the first letter; also two letters with the same number separated by 'h' or 'w' are coded as a single number, whereas such letters separated by a vowel are coded twice. This rule also applies to the first letter.
- If you have too few letters in your word that you can't assign three numbers, append with zeros until there are three numbers. If you have more than 3 letters, just retain the first 3 numbers.
This is a Xojo SoundEx function:
Public Function SoundEx(word As Text) As Text
Const kLength As Integer = 4
Dim value As Text
Dim size As Integer = word.Length
// Make sure the word is at least two characters in length
If (size > 1) Then
word = word.Uppercase
// Convert the word to a character array for faster processing
Dim chars() As Text = word.Split
// For storing the SoundEx character codes
Dim code() As Text
// The current and previous character codes
Dim prevCode As Integer = 0
Dim currCode As Integer = 0
// Add the first character
code.Append(chars(0))
Dim loopLimit As Integer = size - 1
// Loop through all the characters and convert them to the proper character code
For i As Integer = 0 To loopLimit
Select Case chars(i)
Case "H", "W"
currCode = -1
Case "A", "E", "I", "O", "U", "Y"
currCode = 0
Case "B", "F", "P", "V"
currCode = 1
Case "C", "G", "J", "K", "Q", "S", "X", "Z"
currCode = 2
Case "D", "T"
currCode = 3
Case "L"
currCode = 4
Case "M", "N"
currCode = 5
Case "R"
currCode = 6
End Select
If i > 0 Then
// two letters With the same number separated by 'h' or 'w' are coded as a single number
If currCode = -1 Then currCode = prevCode
// Check to see if the current code is the same as the last one
If currCode <> prevCode Then
// Check to see if the current code is 0 (a vowel); do not proceed
If currCode <> 0 Then
code.Append(currCode.ToText)
End If
End If
End If
prevCode = currCode
// If the buffer size meets the length limit, then exit the loop
If (code.Ubound = kLength - 1) Then
Exit For
End If
Next
// Pad the code if required
size = code.Ubound + 1
For j As Integer = size To kLength - 1
code.Append("0")
Next
// Set the return value
value = Text.Join(code, "")
End If
// Return the computed soundex
Return value
End Function
Const kLength As Integer = 4
Dim value As Text
Dim size As Integer = word.Length
// Make sure the word is at least two characters in length
If (size > 1) Then
word = word.Uppercase
// Convert the word to a character array for faster processing
Dim chars() As Text = word.Split
// For storing the SoundEx character codes
Dim code() As Text
// The current and previous character codes
Dim prevCode As Integer = 0
Dim currCode As Integer = 0
// Add the first character
code.Append(chars(0))
Dim loopLimit As Integer = size - 1
// Loop through all the characters and convert them to the proper character code
For i As Integer = 0 To loopLimit
Select Case chars(i)
Case "H", "W"
currCode = -1
Case "A", "E", "I", "O", "U", "Y"
currCode = 0
Case "B", "F", "P", "V"
currCode = 1
Case "C", "G", "J", "K", "Q", "S", "X", "Z"
currCode = 2
Case "D", "T"
currCode = 3
Case "L"
currCode = 4
Case "M", "N"
currCode = 5
Case "R"
currCode = 6
End Select
If i > 0 Then
// two letters With the same number separated by 'h' or 'w' are coded as a single number
If currCode = -1 Then currCode = prevCode
// Check to see if the current code is the same as the last one
If currCode <> prevCode Then
// Check to see if the current code is 0 (a vowel); do not proceed
If currCode <> 0 Then
code.Append(currCode.ToText)
End If
End If
End If
prevCode = currCode
// If the buffer size meets the length limit, then exit the loop
If (code.Ubound = kLength - 1) Then
Exit For
End If
Next
// Pad the code if required
size = code.Ubound + 1
For j As Integer = size To kLength - 1
code.Append("0")
Next
// Set the return value
value = Text.Join(code, "")
End If
// Return the computed soundex
Return value
End Function
You call the SoundEx function like this:
Dim result As Text
result = SoundEx("Robert") // R163
result = SoundEx("Rupert") // R163
result = SoundEx("Rubin") // R150
result = SoundEx("Ashcraft") // A261
result = SoundEx("Ashcroft") // A261
result = SoundEx("Tymczak") // T522
result = SoundEx("Pfister") // P236
result = SoundEx("Robert") // R163
result = SoundEx("Rupert") // R163
result = SoundEx("Rubin") // R150
result = SoundEx("Ashcraft") // A261
result = SoundEx("Ashcroft") // A261
result = SoundEx("Tymczak") // T522
result = SoundEx("Pfister") // P236