0.5.91

4 “Term Search” Tool

 (require ricoeur/term-search)
  package: ricoeur-tei-utils

procedure

(term-search term 
  [#:ricoeur-only? ricoeur-only? 
  #:languages languages 
  #:book/article book/article 
  #:exact? exact?]) 
  (instance-set/c document-search-results?)
  term : term/c
  ricoeur-only? : any/c = #t
  languages : search-languages/c = 'any
  book/article : (or/c 'any 'book 'article) = 'any
  exact? : any/c = #f

value

term/c : flat-contract? = (and/c string-immutable/c #px"[^\\s]")

value

search-languages/c : flat-contract?

 = (or/c 'any language-symbol/c (listof language-symbol/c))
Searches for term, an immutable string containing at least one non-whitespace character, in the TEI documents encapsulated by (current-corpus). If (current-corpus) is not a term-search-corpus<%>, always returns (instance-set).
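To illustrate which strings qualify as a term, the following sketch builds a stand-in for term/c using only racket/contract. The names term-ish/c and term-ish? are hypothetical, and (and/c string? immutable?) is assumed to approximate string-immutable/c; this is not the library's own definition.

```racket
#lang racket/base
(require racket/contract)

;; Hypothetical stand-in for term/c: an immutable string containing
;; at least one non-whitespace character. (and/c string? immutable?)
;; is an assumed approximation of string-immutable/c.
(define term-ish/c
  (and/c string? immutable? #px"[^\\s]"))

(define term-ish? (flat-contract-predicate term-ish/c))

(term-ish? "utopia")               ; => #t (string literals are immutable)
(term-ish? "   ")                  ; => #f (whitespace only)
(term-ish? (string-copy "utopia")) ; => #f (string-copy returns a mutable string)
```

A string built at run time can be converted with string->immutable-string before it is used as a search term.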

If languages is a list of symbols, results will only be returned from TEI documents for which instance-language would have produced one of the symbols in the languages list. If it is a single symbol satisfying language-symbol/c, it is treated like (list languages). Otherwise, if languages is 'any (the default), documents in all languages will be searched. Use 'any rather than listing all currently-supported languages so that, when support for additional languages is added to this library, they will be included automatically.

If book/article is 'book or 'article, only results from TEI documents that would have returned the same symbol from instance-book/article will be returned. If book/article is 'any (the default), all TEI documents will be searched.

If ricoeur-only? is non-false (the default), results will only be returned from passages by Paul Ricœur. Otherwise, results from passages by editors etc. will also be included. See segment-by-ricoeur? for more details.

If exact? is #false (the default), term-search will try to find matches for lexical variants of term. The precise details of how lexical variants are matched are unspecified and depend on the specific search backend used by the corpus object. If exact? is non-false, lexical variants are ignored and only exact matches for term are returned.
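The keyword arguments described above might be combined as in the following sketch. It assumes that the ricoeur-tei-utils package is installed and that 'en and 'fr satisfy language-symbol/c; the search term is only a placeholder. What the calls return depends on the value of (current-corpus) when they run.

```racket
#lang racket/base
(require ricoeur/term-search)

;; Default search: only passages by Paul Ricœur, all languages,
;; books and articles alike, lexical variants included.
(define broad (term-search "utopia"))

;; Narrower search. The symbols 'en and 'fr are assumed to
;; satisfy language-symbol/c.
(define narrow
  (term-search "utopia"
               #:ricoeur-only? #f    ; also include passages by editors etc.
               #:languages '(en fr)  ; a single symbol such as 'en also works
               #:book/article 'book  ; restrict the search to books
               #:exact? #t))         ; ignore lexical variants
```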

value

search-backend/c : contract?

 = 
(lazy+eager-search-backend/c
 (or/c 'noop 'regexp (postgresql-data-source/c)))

procedure

(lazy+eager-search-backend/c base/c) → contract?

  base/c : contract?
 = (or/c base/c (list/c 'eager base/c))
The contract search-backend/c recognizes search backends.

A search backend specifies both the underlying search implementation to be used for functions like term-search and the strategy by which the implementation should be initialized. The actual initialization is handled either by a corpus object or by direct use of initialize-search-backend.

If the search backend is a list beginning with 'eager, the search implementation is initialized synchronously, which is especially useful for debugging. Otherwise, the search implementation is initialized in a background thread, which can provide a substantial improvement in startup time.

The base/c portion of the search backend value determines what underlying search implementation is used:
  • 'noop indicates a trivial implementation that never returns any search results.

  • 'regexp specifies a simplistic regular-expression-based search implemented in pure Racket, with no system-level dependencies. 'regexp-based search backends are extremely slow for large corpora. Even for development, using 'regexp with Digital Ricœur’s full set of digitized documents is not viable.

  • A value satisfying (postgresql-data-source/c) indicates a production-quality implementation using PostgreSQL’s full-text search feature by connecting to the given database. Note that initializing a PostgreSQL search backend will perform destructive modifications to the database. The specified database should be dedicated completely to use by the constructed corpus object or searchable document set: it should not be relied upon for other purposes, and multiple corpus objects or searchable document sets should not use the same database at the same time.
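The shape of search backend values can be explored with a small stand-in built from racket/contract alone. The helper below mirrors the definition of lazy+eager-search-backend/c given above; the names lazy+eager/c, backend-ish/c, and backend-ish? are hypothetical, and the PostgreSQL case is left out so that the sketch has no dependencies beyond Racket itself.

```racket
#lang racket/base
(require racket/contract)

;; Mirrors the documented definition:
;; (or/c base/c (list/c 'eager base/c))
(define (lazy+eager/c base/c)
  (or/c base/c (list/c 'eager base/c)))

;; Simplified stand-in for search-backend/c that omits the
;; (postgresql-data-source/c) alternative.
(define backend-ish/c (lazy+eager/c (or/c 'noop 'regexp)))
(define backend-ish? (flat-contract-predicate backend-ish/c))

(backend-ish? 'regexp)          ; => #t (initialized in a background thread)
(backend-ish? '(eager regexp))  ; => #t (initialized synchronously)
(backend-ish? '(lazy regexp))   ; => #f (only 'eager is recognized)
(backend-ish? '(eager))         ; => #f (a base value must follow 'eager)
```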

(postgresql-data-source/c) returns an impersonator contract that recognizes values created by postgresql-data-source with enough arguments to be used with dsn-connect without supplying any additional arguments. That is, at least the #:user and #:database arguments are required.

Values satisfying (postgresql-data-source/c) can be used as search backends.

Aside from checking that the data-source value is well-formed and contains sufficient arguments, the primary purpose of (postgresql-data-source/c) is to prevent mutation. Mutators like set-data-source-args! raise exceptions when applied to values protected by (postgresql-data-source/c). In addition, the first time (postgresql-data-source/c) encounters a given data-source value, the contract copies it (to prevent it from being mutated through another reference) and coerces any strings in the data-source-args field to immutable strings. Therefore, values protected by (postgresql-data-source/c) may not be eq? or even equal? to their originals.
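A data-source value suitable for use as a search backend might be built with the db library as sketched below. The user and database names are placeholders, and the variable names are hypothetical. Building the data source does not open a connection; the database is only contacted, and destructively modified, once the backend is initialized.

```racket
#lang racket/base
(require db)

;; Placeholder credentials: the database should be dedicated
;; entirely to the search backend.
(define ds
  (postgresql-data-source #:user "term-search-user"
                          #:database "term-search-db"))

(data-source? ds) ; => #t

;; Either form is intended to satisfy search-backend/c:
(define lazy-backend ds)               ; initialized in a background thread
(define eager-backend (list 'eager ds)) ; initialized synchronously
```

Because the contract copies the data-source value the first time it encounters it, code that later inspects the backend should not expect it to be eq? to ds.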

procedure

(corpus-do-term-search corpus 
  term 
  [#:ricoeur-only? ricoeur-only? 
  #:languages languages 
  #:book/article book/article 
  #:exact? exact?]) 
  (instance-set/c document-search-results?)
  corpus : (is-a?/c term-search-corpus<%>)
  term : term/c
  ricoeur-only? : any/c = #t
  languages : search-languages/c = 'any
  book/article : (or/c 'any 'book 'article) = 'any
  exact? : any/c = #f
Like term-search, but using corpus instead of (current-corpus).

mixin

term-search-corpus-mixin : (class? . -> . class?)

  argument extends/implements: corpus<%>
  result implements: term-search-corpus<%>
...

constructor

(new term-search-corpus-mixin 
    [[search-backend search-backend]] 
    ...superclass-args...) 
  (is-a?/c term-search-corpus-mixin)
  search-backend : search-backend/c = '(eager noop)
Constructs a corpus object encapsulating docs.

The search-backend argument is used as the corpus object’s search backend and affects the behavior of term-search. See search-backend/c for more details.

method

(send a-term-search-corpus term-search 
  term 
  [#:ricoeur-only? ricoeur-only? 
  #:languages languages 
  #:book/article book/article 
  #:exact? exact?]) 
  (instance-set/c document-search-results?)
  term : term/c
  ricoeur-only? : any/c = #t
  languages : search-languages/c = 'any
  book/article : (or/c 'any 'book 'article) = 'any
  exact? : any/c = #f
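The procedure corpus-do-term-search and the term-search method take the same arguments, so the two hypothetical helpers below are presumably interchangeable. The sketch assumes the ricoeur-tei-utils package is installed and that corpus is an object implementing term-search-corpus<%>.

```racket
#lang racket/base
(require racket/class
         ricoeur/term-search)

;; Hypothetical helper: search only books, using the procedure form.
(define (book-matches/procedure corpus term)
  (corpus-do-term-search corpus term #:book/article 'book))

;; Hypothetical helper: the same search, using the method form.
(define (book-matches/method corpus term)
  (send corpus term-search term #:book/article 'book))
```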

interface

term-search-corpus<%> : interface?

  implements: corpus<%>
...

4.1 Search Results

procedure

(document-search-results? v) → any/c

  v : any/c

procedure

(document-search-results-count doc-results)

  exact-positive-integer?
  doc-results : document-search-results?

procedure

(document-search-results-results doc-results)

  (non-empty-listof search-result?)
  doc-results : document-search-results?

match expander

(document-search-results kw-pat ...)

  kw-pat = #:count count-pat
         | #:results results-pat
A document search results value, recognized by the predicate document-search-results?, encapsulates the results of a function like term-search from a single TEI document. Document search results values also serve as instance info values for bibliographic information.

A document search results value will always contain at least one search result.

The function document-search-results-count is equivalent to (compose1 length document-search-results-results), but the result of document-search-results-count (and of the corresponding #:count match pattern with document-search-results) is cached, which makes repeated calls efficient.
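The accessors and the match expander could be used as in the following sketch, which assumes the ricoeur-tei-utils package is installed. The helper names total-matches and count+results are hypothetical, and total-matches takes an ordinary list of document search results values rather than an instance set.

```racket
#lang racket/base
(require racket/match
         ricoeur/term-search)

;; Hypothetical helper: total number of matches across a list of
;; document search results values, using the cached count.
(define (total-matches doc-results-list)
  (for/sum ([doc-results (in-list doc-results-list)])
    (document-search-results-count doc-results)))

;; Hypothetical helper: destructure one value with the match expander.
;; Returns two values: the count and the list of search results.
(define (count+results doc-results)
  (match doc-results
    [(document-search-results #:count n #:results rs)
     (values n rs)]))
```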

procedure

(search-result? v) → any/c

  v : any/c

procedure

(search-result-excerpt search-result)

  
(maybe/c (and/c string-immutable/c
                trimmed-string-px))
  search-result : search-result?

match expander

(search-result excerpt-pat)

A search result value, recognized by search-result?, represents an individual match from a function like term-search. Search result values are also segments. A given document search results value may contain multiple search results that are the same according to segment-meta=? if the search term matched more than once within a single segment.

A search result’s excerpt may be (nothing) if there were too many results for the search term from that TEI document to return excerpts for all of them.

The trimmed-string-px part of the contract on the result of search-result-excerpt guarantees that, if the returned excerpt is not (nothing), the contained string will be non-empty and will neither start nor end with whitespace.

See also search-result-<? and search-result->?.

procedure

(search-result-<? a b) → boolean?

  a : search-result?
  b : search-result?

procedure

(search-result->? a b) → boolean?

  a : search-result?
  b : search-result?
Ordering functions on search results, such as might be useful with sort. These are more fine-grained than functions based on segment-order would be: if a and b are from the same segment according to segment-meta=?, these functions will sort them according to their relative position within the segment.

Sorting search results with search-result-<? will put them in the order in which they appeared in the original TEI document.

It is an error to use search-result-<? or search-result->? on search results that did not come from the same document search results value.
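The following sketch sorts the search results of a single document search results value into document order and extracts their excerpts. It assumes the ricoeur-tei-utils package is installed and that the maybe values returned by search-result-excerpt are those of the data/maybe library, so that from-just applies. The helper name and the placeholder string are hypothetical.

```racket
#lang racket/base
(require data/maybe
         ricoeur/term-search)

;; Hypothetical helper: excerpts in the order in which the matches
;; appear in the TEI document. All results come from the same
;; document search results value, as search-result-<? requires.
(define (excerpts-in-document-order doc-results)
  (for/list ([result (in-list (sort (document-search-results-results doc-results)
                                    search-result-<?))])
    ;; from-just is assumed to come from data/maybe; the default is
    ;; used when the excerpt was omitted.
    (from-just "(excerpt omitted)" (search-result-excerpt result))))
```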

4.2 Searching Without a Corpus Object

procedure

(initialize-search-backend backend docs)

  searchable-document-set?
  backend : search-backend/c
  docs : (instance-set/c tei-document?)

procedure

(searchable-document-set? v) → any/c

  v : any/c
While corpus objects are generally the preferred way to use this library’s search functions, searching TEI documents without a corpus object is possible by creating lower-level searchable document sets directly.

A searchable document set is recognized by the predicate searchable-document-set? and can be created using initialize-search-backend, which takes a search backend, just like corpus%, and an instance set of TEI documents to be searched. (In fact, corpus% implements term-search by creating a searchable document set internally.)

As with creating an instance of corpus%, creating a new searchable document set with initialize-search-backend involves an appreciable amount of overhead, so creating redundant values should be avoided.

The value noop-searchable-document-set is a trivial searchable document set that never returns any results. Calling initialize-search-backend with a search backend of '(eager noop) always returns noop-searchable-document-set.

procedure

(searchable-document-set-do-term-search 
  searchable-document-set 
  term 
  [#:ricoeur-only? ricoeur-only? 
  #:languages languages 
  #:book/article book/article 
  #:exact? exact?]) 
  (instance-set/c document-search-results?)
  searchable-document-set : searchable-document-set?
  term : term/c
  ricoeur-only? : any/c = #t
  languages : search-languages/c = 'any
  book/article : (or/c 'any 'book 'article) = 'any
  exact? : any/c = #f
Like corpus-do-term-search, but uses the searchable document set searchable-document-set.
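A minimal sketch of searching without a corpus object, assuming the ricoeur-tei-utils package is installed. It uses noop-searchable-document-set so that no TEI documents are needed; the search term is a placeholder.

```racket
#lang racket/base
(require ricoeur/term-search)

;; The no-op searchable document set never returns any results, so
;; this only demonstrates the calling convention.
(define no-results
  (searchable-document-set-do-term-search noop-searchable-document-set
                                          "utopia"
                                          #:exact? #t))

;; With real documents, a searchable document set would instead be
;; created once and reused, e.g. (initialize-search-backend 'regexp docs),
;; where docs is a hypothetical instance set of TEI documents.
```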