10.2 Search Implementation
(require (submod ricoeur/term-search/backend/common private))
This section documents the common utilities used to implement ricoeur/term-search’s search feature (which is used through functions like term-search), including everything necessary to implement new kinds of search backends.
procedure
(tei-document->excerpt-max-allow-chars doc) → exact-positive-integer?
doc : tei-document?
The result of tei-document->excerpt-max-allow-chars is cached to amortize the cost of calling it multiple times on the same TEI document.
method
(send a-searchable-document-set do-term-search norm-term #:ricoeur-only? ricoeur-only? #:languages languages #:book/article book/article #:exact? exact?)
→ (instance-set/c document-search-results?)
norm-term : normalized-term?
ricoeur-only? : any/c
languages : (set/c language-symbol/c #:cmp 'eq #:kind 'immutable)
book/article : (or/c 'any 'book 'article)
exact? : any/c
The method used to implement searchable-document-set-do-term-search.
There are a few notable differences between do-term-search and all of the higher-level search functions or methods:
All of the keyword arguments are mandatory. As there are several different classes that implement searchable-document-set<%>, copying the default values correctly to every definition would be unpleasant and error-prone.
The search term is passed as a normalized term value, rather than a string satisfying term/c. This prevents do-term-search from being called except by searchable-document-set-do-term-search, which allows the implementation of searchable-document-set-do-term-search to rely on the fact that it will be able to interpose on calls. In fact, the implementation does do some normalization when constructing a normalized term value, and it can guarantee that it will always have the chance to do so.
The languages argument is normalized: rather than being passed as a search-languages/c value, which is designed for the convenience of clients, it is given as an immutable set of language-symbol/c symbols. This allows searchable-document-set-do-term-search to take sole responsibility for handling 'any and lists with duplicate symbols, rather than placing that burden on every class that implements searchable-document-set<%>.
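The normalization of the languages argument described above can be sketched as follows. This is only an illustration, not the library's implementation: normalize-languages and all-language-symbols are hypothetical names, and the particular symbols are assumed for the example.

```racket
#lang racket/base
(require racket/set)

;; Hypothetical stand-in for the full set of supported language symbols.
(define all-language-symbols '(en fr de))

;; Accepts 'any or a list of symbols (possibly with duplicates) and
;; returns an immutable, eq?-based set, so that backend classes never
;; have to handle 'any or duplicates themselves.
(define (normalize-languages langs)
  (if (eq? langs 'any)
      (list->seteq all-language-symbols)
      (list->seteq langs)))
```

With this shape, (normalize-languages '(fr en fr)) and (normalize-languages '(en fr)) produce sets that are equal?, which is the property a backend class can rely on.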
procedure
(normalized-term? v) → any/c
v : any/c
procedure
(normalized-term-string norm-term) → (and/c term/c trimmed-string-px)
norm-term : normalized-term?
procedure
(pregexp-quote-normalized-term norm-term #:exact? exact?) → string-immutable/c
norm-term : normalized-term?
exact? : any/c
The function pregexp-quote-normalized-term produces a string suitable to be passed to pregexp to construct a regular expression recognizing the encapsulated term. (Some backend implementations combine the resulting string with additional regular expression syntax.) When exact? is non-false, the resulting string will produce a regular expression that matches only exact occurrences of the term delimited by a word boundary. (The precise definition of a word boundary is unspecified and specific to pregexp-quote-normalized-term.)
Because the constructor for normalized term values is not exported, the wrapper can serve as a guarantee of some invariants: for example, that the argument to pregexp-quote-normalized-term will always have been normalized. This is particularly important as certain properties of search strings can have security implications, especially with less sophisticated backends.
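The wrapper technique can be sketched in a few lines. The names demo-term, string->demo-term, and quote-demo-term are hypothetical, the whitespace normalization is an assumed example of an invariant, and \b is used only as one possible reading of "word boundary", which the real function leaves unspecified.

```racket
#lang racket/base

;; The raw constructor `demo-term` stays private to this submodule.
;; Only the normalizing constructor, predicate, and accessor escape.
(module wrapper racket/base
  (require racket/string)
  (provide demo-term? demo-term-string string->demo-term)
  (struct demo-term (string))
  ;; Assumed normalization: trim and collapse whitespace.
  (define (string->demo-term str)
    (demo-term (string->immutable-string (string-normalize-spaces str)))))

(require 'wrapper)

;; Quote the wrapped string so it matches literally under pregexp.
;; When exact? is true, delimit it with \b (an assumption, see above).
(define (quote-demo-term t #:exact? exact?)
  (define quoted (regexp-quote (demo-term-string t)))
  (string->immutable-string
   (if exact?
       (string-append "\\b" quoted "\\b")
       quoted)))
```

Because every demo-term value must pass through string->demo-term, quote-demo-term never sees an unnormalized string, and special characters in the term are always escaped before reaching pregexp.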
10.2.1 Constructing Search Results
procedure
(segment-make-search-results seg excerpts) → (listof search-result?)
seg : segment?
excerpts : (listof (maybe/c (and/c string-immutable/c #px"[^\\s]")))
procedure
(search-result-nullify-excerpt result) → search-result?
result : search-result?
procedure
(make-document-search-results info results) → document-search-results?
info : instance-info?
results : (non-empty-listof search-result?)
All of the results must be from the same TEI document and must be consistent with the instance info value info. Otherwise, an exception is raised.
10.2.2 Implementing Search Backend Types
signature
search^ : signature
value
search-backend/c : contract?
A search^ unit should define search-backend/c as a contract recognizing the new type of search backend value it wants to support.
A search^ unit’s search backend implementation need only provide a basic contract and initialize it eagerly in initialize-search-backend. The additional variants permitted by the final, public search-backend/c (see lazy+eager-search-backend/c) are added using define-lazy-search-unit.
procedure
(initialize-search-backend backend docs) → searchable-document-set?
backend : search-backend/c
docs : (instance-set/c tei-document?)
The search^ unit’s initialize-search-backend will be called with a search backend value, backend, satisfying the unit’s specific definition of search-backend/c. The unit’s implementation of initialize-search-backend is responsible for returning a searchable document set: that is, an instance of a class that implements searchable-document-set<%>.
Typically, initialize-search-backend will be a wrapper around a constructor for a unit-specific searchable-document-set<%> class, and the unit’s notion of a search-backend/c value will encapsulate all of the other data needed to initialize the class.
However, this is not mandatory. The implementation of initialize-search-backend from noop@, for example, ignores its arguments and always returns the singleton object noop-searchable-document-set.
The search^ signature uses define-values-for-export to define initialize-search-backend/c as the contract for that unit’s implementation of initialize-search-backend.
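To make the shape of such a unit concrete, here is a minimal sketch in the style of the noop case. It does not use the real search^ signature or searchable-document-set<%> interface: demo-search^, demo-searchable<%>, noop-demo@, and the other names are hypothetical, and the contract machinery (including define-values-for-export) is deliberately left out.

```racket
#lang racket/base
(require racket/unit racket/class)

;; Hypothetical stand-in for searchable-document-set<%>.
(define demo-searchable<%> (interface () do-search))

;; A singleton object whose searches never find anything.
(define noop-demo-searchable
  (new (class* object% (demo-searchable<%>)
         (super-new)
         (define/public (do-search term) '()))))

;; Hypothetical stand-in for search^: a predicate recognizing the
;; backend value, and an initializer returning a searchable object.
(define-signature demo-search^ (demo-backend? initialize-demo-backend))

(define-unit noop-demo@
  (import)
  (export demo-search^)
  (define (demo-backend? v) (eq? v 'noop))
  ;; Ignores both arguments and returns the singleton.
  (define (initialize-demo-backend backend docs)
    noop-demo-searchable))

;; Invoke the unit, binding its exports in this module.
(define-values/invoke-unit/infer noop-demo@)
```

After invocation, (initialize-demo-backend 'noop '()) returns the same object every time, and (send that-object do-search "anything") yields the empty list.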
syntax
(define-compound-search-unit compound-search-unit-id member-search-unit-id ...+)
The new unit’s implementation of search-backend/c applies or/c to the implementations from each of the member-search-unit-id units. Likewise, the new unit’s implementation of initialize-search-backend inspects the given search backend value and dispatches to the implementation of initialize-search-backend from the corresponding member-search-unit-id unit.
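The dispatch behavior can be modeled without units at all. In the sketch below, a hypothetical backend-type struct pairs a predicate (standing in for a member's search-backend/c) with an initializer, and compound-backend-type combines several of them. None of these names come from the library, and the real macro works on units rather than structs.

```racket
#lang racket/base
(require racket/list)

;; Hypothetical model of one backend type: a recognizer plus an initializer.
(struct backend-type (accepts? initialize))

(define (compound-backend-type . members)
  ;; Analogous to applying or/c to the members' contracts.
  (define (accepts? v)
    (for/or ([m (in-list members)])
      ((backend-type-accepts? m) v)))
  ;; Find the first member that recognizes the value and delegate to it.
  (define (initialize backend docs)
    (define member
      (findf (λ (m) ((backend-type-accepts? m) backend)) members))
    (unless member
      (raise-argument-error 'initialize "a supported backend value" backend))
    ((backend-type-initialize member) backend docs))
  (backend-type accepts? initialize))

;; Two toy member types and their combination.
(define symbol-type
  (backend-type symbol? (λ (backend docs) (list 'from-symbol backend))))
(define string-type
  (backend-type string? (λ (backend docs) (list 'from-string backend))))
(define both (compound-backend-type symbol-type string-type))
```

Calling ((backend-type-initialize both) "db" '()) is routed to string-type, while a value that no member recognizes raises an exception instead of being silently accepted.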
syntax
(define-lazy-search-unit lazy-search-unit-id eager-search-unit-id)
In the 'eager case, lazy-search-unit-id will simply dispatch to eager-search-unit-id’s implementation of initialize-search-backend. Otherwise, lazy-search-unit-id will return a proxy searchable document set which calls eager-search-unit-id’s initialize-search-backend in a background thread.
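The proxy idea can be sketched with delay/thread from racket/promise, which starts a computation on a new thread immediately and lets callers block on the result with force. Here a searchable document set is modeled as a plain procedure from a term to matching documents. The names eager-init and lazy-init are hypothetical, and the real proxy is an object rather than a closure.

```racket
#lang racket/base
(require racket/promise racket/string)

;; Hypothetical eager initializer: returns a search procedure over docs.
(define (eager-init docs)
  (λ (term)
    (filter (λ (doc) (string-contains? doc term)) docs)))

;; In the 'eager case, initialize directly. Otherwise, start the
;; initialization on a background thread and return a proxy at once.
(define (lazy-init mode docs)
  (cond
    [(eq? mode 'eager) (eager-init docs)]
    [else
     (define pending (delay/thread (eager-init docs)))
     ;; The proxy blocks only if a search arrives before the
     ;; background initialization has finished.
     (λ (term) ((force pending) term))]))
```

Both modes answer searches identically. The difference is only when the initialization cost is paid: before lazy-init returns in the 'eager case, or concurrently with the caller's other work otherwise.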
10.2.2.1 Basic search^ Units
value
postgresql@ :
(unit/c (import) (export search^))