2:0
levenshtein: Levenshtein Distance Metric
Link to this document with
@other-doc['(lib "levenshtein/levenshtein.scrbl")]
Link to this document with
@other-doc['(lib "levenshtein/levenshtein.scrbl")]
1 Introduction
Link to this section with
@secref["Introduction" #:doc '(lib "levenshtein/levenshtein.scrbl")]
Link to this section with
@secref["Introduction" #:doc '(lib "levenshtein/levenshtein.scrbl")]
This package provides a Racket implementation of the Levenshtein Distance algorithm, which is an edit distance metric of string similarity, due to Vladimir Levenshtein. This
package’s implementation started as a transliteration of Jorge Mas Trullenque’s
space-efficient Perl implementation, to R5RS Scheme.
The Levenshtein Distance is a function of two strings that
represents a count of single-character insertions, deletions,and substitions
that will change the first string to the second. More information is available
in
NIST DADS and the Michael Gilleland article, “
Levenshtein Distance in Three Flavors.”
This implementation supports heterogeneous combinations of Scheme
types (strings, lists, vectors), user-supplied predicate functions, and
optionally reusable scratch vectors. Because this is an old Scheme library,
the API isn’t entirely idiomatic for modern Racket.
2 Basic Comparisons
Link to this section with
@secref["Basic_Comparisons" #:doc '(lib "levenshtein/levenshtein.scrbl")]
Link to this section with
@secref["Basic_Comparisons" #:doc '(lib "levenshtein/levenshtein.scrbl")]
In the current implementation, all comparisons are done internally
via vectors.
(vector-levenshtein/predicate/get-scratch | | a | | | | b | | | | pred | | | | get-scratch) | |
|
→ exact-nonnegative-integer? |
a : vector? |
b : vector? |
pred : (-> any/c any/c boolean?) |
get-scratch : (-> exact-positive-integer? vector?) |
Few, if any, programs will use this procedure directly. This is like vector-levenshtein/predicate, but allows get-scratch to be specified. get-scratch is a procedure of one term, n, that yields a vector of length n or greater, which is used for record-keeping during execution of
the Levenshtein algorithm. make-vector can be used for get-scratch, although some programs comparing a large size or quantity of
vectors may wish to reuse a record-keeping vector, rather than each time
allocating a new one that will need to be garbage-collected.
(vector-levenshtein/predicate a b pred)
|
→ exact-nonnegative-integer? |
a : vector? |
b : vector? |
pred : (-> any/c any/c boolean?) |
(vector-levenshtein/eq a b) → exact-nonnegative-integer? |
a : vector? |
b : vector? |
(vector-levenshtein/eqv a b) → exact-nonnegative-integer? |
a : vector? |
b : vector? |
(vector-levenshtein/equal a b) → exact-nonnegative-integer? |
a : vector? |
b : vector? |
(vector-levenshtein a b) → exact-nonnegative-integer? |
a : vector? |
b : vector? |
Calculate the Levenshtein Distance of vectors a and b. pred is the predicate procedure for determining if two elements are equal. The /eq, /eqv, and /equal variants correspond to the standard equivalence predicates, eq?, eqv?, and equal?. vector-levenshtein is an alias for vector-levenshtein/equal.
> (vector-levenshtein '#(6 6 6) '#(6 35 6 24 6 32)) |
3 |
(list-levenshtein/predicate a b pred)
|
→ exact-nonnegative-integer? |
a : list? |
b : list? |
pred : (-> any/c any/c boolean?) |
(list-levenshtein/eq a b) → exact-nonnegative-integer? |
a : list? |
b : list? |
(list-levenshtein/eqv a b) → exact-nonnegative-integer? |
a : list? |
b : list? |
(list-levenshtein/equal a b) → exact-nonnegative-integer? |
a : list? |
b : list? |
(list-levenshtein a b) → exact-nonnegative-integer? |
a : list? |
b : list? |
Calculate the Levenshtein Distance of lists a and b. pred is the predicate procedure for determining if two elements are equal. The /eq, /eqv, and /equal variants correspond to the standard equivalence predicates, eq?, eqv?, and equal?. list-levenshtein is an alias for list-levenshtein/equal. Note that comparison of lists is less efficient than comparison of vectors.
> (list-levenshtein/eq '(b c e x f y) '(a b c d e f)) |
4 |
(string-levenshtein a b) → exact-nonnegative-integer?
|
a : string? |
b : string? |
Calculate the Levenshtein Distance of strings a and b.
> (string-levenshtein "adresse" "address") |
2 |
3 Type-Coercing Comparisons
Link to this section with
@secref["Type-Coercing_Comparisons"
#:doc '(lib "levenshtein/levenshtein.scrbl")]
Link to this section with
@secref["Type-Coercing_Comparisons"
#:doc '(lib "levenshtein/levenshtein.scrbl")]
Procedures levenshtein and levenshtein/predicate provide a convenient interface for comparing a combination of vectors, lists, and strings, the types of which might not be known until runtime.
(levenshtein/predicate a b pred) → exact-nonnegative-integer?
|
a : (or/c vector? string? list?) |
b : (or/c vector? string? list?) |
pred : (-> any/c any/c boolean?) |
Calculates the Levenshtein Distance of two objects a and b,which are vectors, lists, or strings. a and b need not be of the same type. pred is the element equivalence predicate used.
> (levenshtein/predicate '#(#\A #\B #\C #\D) "aBXcD" char-ci=?) |
1 |
(levenshtein a b) → exact-nonnegative-integer?
|
a : (or/c vector? string? list?) |
b : (or/c vector? string? list?) |
Calculate the levenshtein distance of a and b, in a similar manner as using levenshtein/predicate with equal? as the predicate.
> (define g '#(#\g #\u #\m #\b #\o))
> (levenshtein g "gambol") |
2 |
> (levenshtein g "dumbo") |
1 |
> (levenshtein g "umbrage") |
5 |
4 History
Link to this section with
@secref["History" #:doc '(lib "levenshtein/levenshtein.scrbl")]
Link to this section with
@secref["History" #:doc '(lib "levenshtein/levenshtein.scrbl")]
Version 2:0 — 2016-02-28
Moved from PLaneT to new package system.
Converted to McFly, including quickly guessing at contracts.
Fixed two dead URLs, urgently.
Version 1:3 — 2009-03-14
Version 1:2 — 2009-02-24
Version 1:1 — 2005-07-10
Version 1:0 — 2005-07-09
Version 0.2 — 2004-07-06
Version 0.1 — 2004-05-13
5 Legal
Link to this section with
@secref["Legal" #:doc '(lib "levenshtein/levenshtein.scrbl")]
Link to this section with
@secref["Legal" #:doc '(lib "levenshtein/levenshtein.scrbl")]
Copyright 2004, 2005, 2009, 2016 Neil Van Dyke. This program is Free
Software; you can redistribute it and/or modify it under the terms of the GNU
Lesser General Public License as published by the Free Software Foundation;
either version 3 of the License, or (at your option) any later version. This
program is distributed in the hope that it will be useful, but without any
warranty; without even the implied warranty of merchantability or fitness for a
particular purpose. See http://www.gnu.org/licenses/ for details. For other
licenses and consulting, please contact the author.