4 “Term Search” Tool
(require ricoeur/term-search)
package: ricoeur-tei-utils
procedure
(term-search term
             [#:ricoeur-only? ricoeur-only?
              #:languages languages
              #:book/article book/article
              #:exact? exact?])
→ (instance-set/c document-search-results?)
term : term/c
ricoeur-only? : any/c = #t
languages : search-languages/c = 'any
book/article : (or/c 'any 'book 'article) = 'any
exact? : any/c = #f
value
term/c : flat-contract? = (and/c string-immutable/c #px"[^\\s]")
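To make the contract concrete, here is a minimal sketch that uses only racket/contract. It assumes that string-immutable/c behaves like (and/c string? immutable?); the names term-like/c and term-like? are hypothetical stand-ins and are not part of the library.

```racket
#lang racket/base
(require racket/contract)

;; Assumption: string-immutable/c is approximated by (and/c string? immutable?).
;; term-like/c is a hypothetical stand-in for term/c, not the library's value.
(define term-like/c
  (and/c string? immutable? #px"[^\\s]"))

;; Extract a plain predicate so the contract can be tried interactively.
(define term-like? (flat-contract-predicate term-like/c))

(term-like? "apple")               ; => #t (immutable literal, has a non-whitespace character)
(term-like? "   ")                 ; => #f (whitespace only)
(term-like? (string-copy "apple")) ; => #f (string-copy returns a mutable string)
```

In other words, a term is accepted as long as it is an immutable string containing at least one non-whitespace character; leading or trailing whitespace alone does not cause rejection.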
value
search-languages/c : flat-contract?
= (or/c 'any language-symbol/c (listof language-symbol/c))
If languages is a list of symbols, results will only be returned from TEI documents for which instance-language would have produced one of the symbols in the languages list. If it is a single symbol satisfying language-symbol/c, it is treated like (list languages). Otherwise, if languages is 'any (the default), documents in all languages will be searched. Use 'any rather than listing all currently-supported languages so that, when support for additional languages is added to this library, they will be included automatically.
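The three cases for languages can be sketched as a small normalization step. This is an illustration rather than the library's code: the helpers normalize-languages and language-selected? are hypothetical, and the set of language symbols ('en, 'fr, 'de) is an assumption.

```racket
#lang racket/base
(require racket/contract)

;; Assumption: the supported language symbols are 'en, 'fr, and 'de.
(define language-symbol-like/c (or/c 'en 'fr 'de))
(define search-languages-like/c
  (or/c 'any language-symbol-like/c (listof language-symbol-like/c)))

;; Hypothetical helper: reduce the three accepted shapes to 'any or a list.
(define/contract (normalize-languages languages)
  (-> search-languages-like/c (or/c 'any (listof language-symbol-like/c)))
  (cond
    [(eq? languages 'any) 'any]                 ; search every language
    [(symbol? languages) (list languages)]      ; single symbol acts like (list languages)
    [else languages]))                          ; already a list of symbols

;; Hypothetical helper: would a document in doc-language be searched?
(define (language-selected? languages doc-language)
  (define normalized (normalize-languages languages))
  (or (eq? normalized 'any)
      (and (memq doc-language normalized) #t)))

(normalize-languages 'fr)            ; => '(fr)
(language-selected? '(en fr) 'de)    ; => #f
(language-selected? 'any 'de)        ; => #t
```

The last call shows why 'any is preferable to an explicit list: it keeps matching even if the set of supported languages grows.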
If book/article is 'book or 'article, only results from TEI documents that would have returned the same symbol from instance-book/article will be returned. If book/article is 'any (the default), all TEI documents will be searched.
If ricoeur-only? is non-false (the default), results will only be returned from passages by Paul Ricœur. Otherwise, results from passages by editors etc. will also be included. See segment-by-ricoeur? for more details.
If exact? is #false (the default), term-search will try to find matches for lexical variants of term. The precise details of how lexical variants are matched are unspecified and depend on the specific search backend used by the corpus object. If exact? is non-false, lexical variants are ignored and only exact matches for term are returned.
value
search-backend/c : contract?
= (lazy+eager-search-backend/c (or/c 'noop 'regexp (postgresql-data-source/c)))
procedure
(lazy+eager-search-backend/c base/c) → contract?
base/c : contract?
= (or/c base/c (list/c 'eager base/c))
A search backend specifies both the underlying search implementation to be used for functions like term-search and the strategy by which the implementation should be initialized. The actual initialization is handled either by a corpus object or by direct use of initialize-search-backend.
If the search backend is a list beginning with 'eager, the search implementation is initialized synchronously, which is especially useful for debugging. Otherwise, the search implementation is initialized in a background thread, which can provide a substantial improvement in startup time.
'noop indicates a trivial implementation that never returns any search results.
'regexp specifies a simplistic regular-expression-based search implemented in pure Racket, with no system-level dependencies. 'regexp-based search backends are extremely slow on large corpora. Even for development, using 'regexp with Digital Ricœur’s full set of digitized documents is not viable.
A value satisfying (postgresql-data-source/c) indicates a production-quality implementation using PostgreSQL’s full-text search feature by connecting to the given database. Note that initializing a PostgreSQL search backend will perform destructive modifications to the database. The specified database should be dedicated completely to use by the constructed corpus object or searchable document set: it should not be relied upon for other purposes, and multiple corpus objects or searchable document sets should not use the same database at the same time.
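The difference between the lazy and eager strategies can be sketched with promises from racket/promise. This is a toy model under stated assumptions, not the library's implementation: lazy+eager/c restates the contract shape given above under a different name, start-backend is a hypothetical helper, and init stands in for whatever expensive setup a real search implementation performs. Only the two backends without external dependencies are modelled.

```racket
#lang racket/base
(require racket/contract racket/match racket/promise)

;; Same shape as the definition above: a base value, or (list 'eager base).
(define (lazy+eager/c base/c)
  (or/c base/c (list/c 'eager base/c)))

(define backend-sketch/c (lazy+eager/c (or/c 'noop 'regexp)))

;; Hypothetical helper. Returns a promise for the initialized implementation.
(define/contract (start-backend backend init)
  (-> backend-sketch/c (-> any/c any/c) promise?)
  (match backend
    [(list 'eager base)
     ;; Eager: finish initialization before returning, so errors surface here.
     (let ([v (init base)])
       (delay v))]
    [base
     ;; Lazy: start work in a background thread; callers block only on force.
     (delay/thread (init base))]))

(define (fake-init base) (list 'ready base))

(force (start-backend '(eager regexp) fake-init)) ; => '(ready regexp)
(force (start-backend 'noop fake-init))           ; => '(ready noop)
```

With the eager form, an exception raised during initialization propagates from the call that starts the backend, which is what makes it convenient for debugging; with the lazy form in this sketch, the exception is re-raised only when the promise is forced.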
procedure
(postgresql-data-source/c) → contract?
Values satisfying (postgresql-data-source/c) can be used as search backends.
Aside from checking that the data-source value is well-formed and contains sufficient arguments, the primary purpose of (postgresql-data-source/c) is to prevent mutation. Mutators like set-data-source-args! raise exceptions when applied to values protected by (postgresql-data-source/c). In addition, the first time (postgresql-data-source/c) encounters a given data-source value, the contract copies it (to prevent it from being mutated through another reference) and coerces any strings in the data-source-args field to immutable strings. Therefore, values protected by (postgresql-data-source/c) may not be eq? or even equal? to their originals.
procedure
(corpus-do-term-search corpus
                       term
                       [#:ricoeur-only? ricoeur-only?
                        #:languages languages
                        #:book/article book/article
                        #:exact? exact?])
→ (instance-set/c document-search-results?)
corpus : (is-a?/c term-search-corpus<%>)
term : term/c
ricoeur-only? : any/c = #t
languages : search-languages/c = 'any
book/article : (or/c 'any 'book 'article) = 'any
exact? : any/c = #f
constructor
(new term-search-corpus-mixin
     [[search-backend search-backend]]
     ...superclass-args...)
→ (is-a?/c term-search-corpus-mixin)
search-backend : search-backend/c = '(eager noop)
Constructs a corpus object encapsulating docs.
The search-backend argument is used as the corpus object’s search backend and affects the behavior of term-search. See search-backend/c for more details.
method
(send a-term-search-corpus term-search term
      [#:ricoeur-only? ricoeur-only?
       #:languages languages
       #:book/article book/article
       #:exact? exact?])
→ (instance-set/c document-search-results?)
term : term/c
ricoeur-only? : any/c = #t
languages : search-languages/c = 'any
book/article : (or/c 'any 'book 'article) = 'any
exact? : any/c = #f
Implements term-search and corpus-do-term-search.
4.1 Search Results
procedure
(document-search-results? v) → any/c
v : any/c
procedure
(document-search-results-count doc-results)
→ exact-positive-integer? doc-results : document-search-results?
procedure
(document-search-results-results doc-results)
→ (non-empty-listof search-result?) doc-results : document-search-results?
match expander
(document-search-results kw-pat ...)
kw-pat = #:count count-pat | #:results results-pat
A document search result value will always contain at least one search result.
The function document-search-results-count is equivalent to (compose1 length document-search-results-results), but its result (and that of the corresponding match pattern with document-search-results) is cached, so repeated calls are efficient.
procedure
(search-result? v) → any/c
v : any/c
procedure
(search-result-excerpt search-result)
→ (maybe/c (and/c string-immutable/c trimmed-string-px))
search-result : search-result?
match expander
(search-result excerpt-pat)
A search result’s excerpt may be (nothing) if there were too many results for the search term from that TEI document to return excerpts for all of them.
The trimmed-string-px part of the contract on the result of search-result-excerpt guarantees that, if the returned excerpt is not (nothing), the contained string will be non-empty and will neither start nor end with whitespace.
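The guarantee on excerpt strings can be expressed as a single regular expression. The pattern below is one possible formulation written for illustration; the actual definition of trimmed-string-px is not shown in this section, so trimmed-px and trimmed-nonempty? are hypothetical names.

```racket
#lang racket/base

;; Hypothetical pattern: a non-whitespace character, optionally followed by
;; anything that itself ends in a non-whitespace character.
(define trimmed-px #px"^\\S(?:.*\\S)?$")

(define (trimmed-nonempty? s)
  (and (string? s) (regexp-match? trimmed-px s)))

(trimmed-nonempty? "living metaphor")   ; => #t
(trimmed-nonempty? "a")                 ; => #t
(trimmed-nonempty? " leading space")    ; => #f
(trimmed-nonempty? "trailing space ")   ; => #f
(trimmed-nonempty? "")                  ; => #f
```

Interior whitespace is allowed; only the first and last characters are constrained.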
See also search-result-<? and search-result->?.
procedure
(search-result-<? a b) → boolean?
a : search-result? b : search-result?
procedure
(search-result->? a b) → boolean?
a : search-result? b : search-result?
Sorting search results with search-result-<? will put them in the order in which they appeared in the original TEI document.
It is an error to use search-result-<? or search-result->? on search results that did not come from the same document search result value.
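The ordering rules can be illustrated with a toy model. Everything here is hypothetical: the hit struct, its integer position field, and the comparators hit-<? and hit->? are stand-ins chosen for the sketch, and the library's search results may be represented quite differently.

```racket
#lang racket/base

;; Toy stand-in for a search result: which document it came from,
;; where it occurred (assumed to be an integer offset), and its excerpt.
(struct hit (doc-id position excerpt) #:transparent)

;; Comparing hits from different documents is treated as an error.
(define (hit-<? a b)
  (unless (equal? (hit-doc-id a) (hit-doc-id b))
    (raise-arguments-error 'hit-<?
                           "hits come from different documents"
                           "a" a
                           "b" b))
  (< (hit-position a) (hit-position b)))

(define (hit->? a b)
  (hit-<? b a))

(define hits
  (list (hit 'doc-1 40 "third")
        (hit 'doc-1 3 "first")
        (hit 'doc-1 17 "second")))

(map hit-excerpt (sort hits hit-<?)) ; => '("first" "second" "third")
(map hit-excerpt (sort hits hit->?)) ; => '("third" "second" "first")
```

With the library loaded, the analogous call would presumably be (sort (document-search-results-results doc-results) search-result-<?), which stays within a single document search result value as required.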
4.2 Searching Without a Corpus Object
procedure
(initialize-search-backend backend docs)
→ searchable-document-set? backend : search-backend/c docs : (instance-set/c tei-document?)
procedure
(searchable-document-set? v) → any/c
v : any/c
A searchable document set is recognized by the predicate searchable-document-set? and can be created using initialize-search-backend, which takes a search backend, just like corpus%, and an instance set of TEI documents to be searched. (In fact, corpus% implements term-search by creating a searchable document set internally.)
As with creating an instance of corpus%, creating a new searchable document set with initialize-search-backend involves an appreciable amount of overhead, so creating redundant values should be avoided.
procedure
(searchable-document-set-do-term-search searchable-document-set
                                        term
                                        [#:ricoeur-only? ricoeur-only?
                                         #:languages languages
                                         #:book/article book/article
                                         #:exact? exact?])
→ (instance-set/c document-search-results?)
searchable-document-set : searchable-document-set?
term : term/c
ricoeur-only? : any/c = #t
languages : search-languages/c = 'any
book/article : (or/c 'any 'book 'article) = 'any
exact? : any/c = #f