Commit 31ca958

FEAT: include Allen Kamp's original Soundex script
1 parent 197e63b commit 31ca958

1 file changed (+77 -0 lines)

src/modules/soundex.reb

@@ -0,0 +1,77 @@
+REBOL [
+    Title: "Soundex"
+    Date: 17-Jul-1999
+    File: %soundex.r
+    Author: "Allen Kamp"
+    Purpose: {Soundex encoding returns similar codes for similar-sounding words or names, e.g. Stephens and Stevens are both S315, and Smith and Smythe are both S53. Useful for adding sounds-like searching to databases.}
+    Comment: {
+        This simple Soundex returns a code that is up to 4 characters
+        long; the /integer refinement will return an integer code
+        value instead. An example of searching a simple phone number
+        database with Soundex is included. For improved search
+        speed, you could store the Soundex codes in the database.
+
+        This is the basic algorithm (there are a number of different
+        ones floating around):
+
+        1. Remove vowels, H, W and Y
+        2. Encode each char with its code value
+        3. Remove adjacent duplicate numbers
+
+        4. Return the first letter, followed by the next 3 letters' code
+        numbers, if they exist.
+
+        Others I plan to implement soon include Extended Soundex,
+        Metaphone and the LC Cutter table.
+    }
+    Language: "English"
+    Email: %allenk--powerup--com--au
+    library: [
+        level: 'intermediate
+        platform: 'all
+        type: 'tool
+        domain: [DB text text-processing]
+        tested-under: none
+        support: none
+        license: none
+        see-also: none
+    ]
+    Version: 1.0.0
+]
+
+soundex: func [
+    {Returns the Census Soundex Code for the given string}
+    string [any-string!] "String to Encode"
+    /local code val letter set1 set2 set3 set4 set5 set6 soundex-match soundex-no-match
+][
+
+    code: make string! ""
+
+    ; Create rules: each set matches one letter group and records its digit in val
+    set1: [["B" | "F" | "P" | "V"] (val: "1")]
+    set2: [["C" | "G" | "J" | "K" | "Q" | "S" | "X" | "Z"] (val: "2")]
+    set3: [["D" | "T"] (val: "3")]
+    set4: [["L"] (val: "4")]
+    set5: [["M" | "N"] (val: "5")]
+    set6: [["R"] (val: "6")]
+    ; Append val to code if it is not a duplicate of the previous code val
+    soundex-match: [[set1 | set2 | set3 | set4 | set5 | set6]
+        (if val <> back tail code [append code val])]
+
+    ; An unmatched letter has a val of 0, but that only matters
+    ; when it is the first letter.
+    soundex-no-match: [(if (length? code) = 0 [append code "0"])]
+
+    either all [string? string not empty? trim copy string] [
+        string: uppercase trim copy string
+
+        foreach letter string [
+            parse to-string letter [soundex-match | soundex-no-match]
+            if (length? code) = 4 [break] ; maximum length for code is 4
+        ]
+    ] [
+        return string ; return unchanged
+    ]
+    change code first string ; replace first number with first letter
+    return code
+]
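
The codes quoted in the script header can be checked from a console. The following is a minimal sketch, assuming the file has been saved as `soundex.reb` in the current directory (that path is an assumption, not something the commit states) and that the interpreter loads the script as written. The expected values in the comments are the ones given in the `Purpose` field.

```rebol
; Load the script so that `soundex` is defined.
; The file location is an assumption; adjust the path as needed.
do %soundex.reb

; Names taken from the Purpose field: each pair should share one code.
foreach name ["Stephens" "Stevens" "Smith" "Smythe"] [
    print [name "->" soundex name]
]
; Per the header: Stephens and Stevens give S315, Smith and Smythe give S53.

; Codes are not zero-padded, so a name with no coded letters after the
; first one should come back as that first letter alone.
print soundex "Lee"   ; expected under the rules above: L
```

Because this variant drops vowels, H, W and Y before removing adjacent duplicates, and does not pad short codes, its output can differ from Soundex implementations that always return four characters.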
