UserGuide

SoundEx Algorithm

From Xojo Documentation

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. It can be a useful way to store names for searching so that similar names can be found. As defined by Wikipedia:

The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling. The algorithm mainly encodes consonants; a vowel will not be encoded unless it is the first letter. Soundex is the most widely known of all phonetic algorithms (in part because it is a standard feature of popular database software, although not included with SQLiteDatabase).

SoundEx has these rules:

  1. Retain the first letter of the name and drop all other occurrences of a, e, i, o, u, y, h, w.
  2. Replace consonants with digits as follows (after the first letter):
    • b, f, p, v → 1
    • c, g, j, k, q, s, x, z → 2
    • d, t → 3
    • l → 4
    • m, n → 5
    • r → 6
  3. If two or more letters with the same number are adjacent in the original name (before step 1), only retain the first letter; also two letters with the same number separated by 'h' or 'w' are coded as a single number, whereas such letters separated by a vowel are coded twice. This rule also applies to the first letter.
  4. If you have too few letters in your word that you can't assign three numbers, append with zeros until there are three numbers. If you have more than 3 letters, just retain the first 3 numbers.

This is a Xojo SoundEx function:

Public Function SoundEx(word As Text) As Text
Const kLength As Integer = 4

Dim value As Text

Dim size As Integer = word.Length

// Make sure the word is at least two characters in length
If (size > 1) Then
word = word.Uppercase

// Convert the word to a character array for faster processing
Dim chars() As Text = word.Split

// For storing the SoundEx character codes
Dim code() As Text

// The current and previous character codes
Dim prevCode As Integer = 0
Dim currCode As Integer = 0

// Add the first character
code.Append(chars(0))

Dim loopLimit As Integer = size - 1
// Loop through all the characters and convert them to the proper character code
For i As Integer = 0 To loopLimit
Select Case chars(i)
Case "H", "W"
currCode = -1
Case "A", "E", "I", "O", "U", "Y"
currCode = 0
Case "B", "F", "P", "V"
currCode = 1
Case "C", "G", "J", "K", "Q", "S", "X", "Z"
currCode = 2
Case "D", "T"
currCode = 3
Case "L"
currCode = 4
Case "M", "N"
currCode = 5
Case "R"
currCode = 6
End Select

If i > 0 Then
// two letters With the same number separated by 'h' or 'w' are coded as a single number
If currCode = -1 Then currCode = prevCode

// Check to see if the current code is the same as the last one
If currCode <> prevCode Then
// Check to see if the current code is 0 (a vowel); do not proceed
If currCode <> 0 Then
code.Append(currCode.ToText)
End If
End If
End If

prevCode = currCode

// If the buffer size meets the length limit, then exit the loop
If (code.Ubound = kLength - 1) Then
Exit For
End If
Next

// Pad the code if required
size = code.Ubound + 1
For j As Integer = size To kLength - 1
code.Append("0")
Next

// Set the return value
value = Text.Join(code, "")
End If

// Return the computed soundex
Return value
End Function

You call the SoundEx function like this:

Dim result As Text
result = SoundEx("Robert") // R163
result = SoundEx("Rupert") // R163
result = SoundEx("Rubin") // R150
result = SoundEx("Ashcraft") // A261
result = SoundEx("Ashcroft") // A261
result = SoundEx("Tymczak") // T522
result = SoundEx("Pfister") // P236

See Also

UserGuide:SQLite Full Text Searching topic