5 Tools

We have implemented a number of tools to assist in preparing and validating TEI XML documents. All of these require the library ricoeur/tei, which should be installed as described under Installing & Updating This Library in Digital Ricœur TEI Library.

All of our command-line tools accept the flags --help or -h to print usage information.

5.1 TEI Lint

Our primary tool is the GUI program “TEI Lint,” which combines several related features.

As the name suggests, “TEI Lint” is a “linter” for TEI XML documents. It checks the validity of the prepared documents (using xmllint from libxml2 when available) both in terms of the DTD and with respect to our more stringent project-specific requirements. It also alerts the user to issues with the documents that, while not making them invalid, are indicative of potential subtle mistakes or high-priority encoding steps that have not yet been completed. For some of these steps, such as inferring paragraph breaks, “TEI Lint” includes the ability to edit the documents automatically with minimal human guidance.

Using “TEI Lint” is the most important reason to have xmllint installed: without it, “TEI Lint” can check that the documents are well-formed XML and meet the project-specific requirements expressed in Racket code, but can’t actually check that the documents are valid in terms of the DTD.
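The distinction between well-formedness and DTD validity can be illustrated with a short sketch. This is not the code used by “TEI Lint” (which is written in Racket); it is a hypothetical Python illustration, and the function names is_well_formed and dtd_valid are invented for the example. It assumes that each document names its DTD in a DOCTYPE declaration, so that xmllint's --valid flag can find it.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True when xml_text parses as well-formed XML.

    This check needs no external program, but it says nothing
    about whether the document conforms to a DTD.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def dtd_valid(path: str):
    """Validate the file at path against its DTD using xmllint.

    Returns True or False when xmllint is installed, and None when
    it is not, meaning DTD validity could not be checked at all.
    Assumption: the file declares its DTD in a DOCTYPE declaration.
    """
    xmllint = shutil.which("xmllint")
    if xmllint is None:
        return None
    result = subprocess.run(
        [xmllint, "--noout", "--valid", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

A result of None from dtd_valid corresponds to the situation described above: without xmllint, only the well-formedness check and the project-specific checks remain.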

In addition to serving as a linter, “TEI Lint” includes a simple, fully graphical interface for converting a plain text file to an initial TEI XML document. This functionality is described in more detail above, under Getting Started.

5.2 raco ricoeur/tei

The raco ricoeur/tei command extends raco with some further subcommands for processing TEI documents. Most of these commands are primarily intended to be used via make in the texts repository. They are included as raco subcommands to ensure they are in the PATH under most circumstances.

For more information about raco, see raco: Racket Command-Line Tools.

  • raco ricoeur/tei validate-directory validates all of the XML files in some directory, enforcing both the DTD (when xmllint is available) and the additional requirements specified in this document. It does not give warnings about potential subtle mistakes, so “TEI Lint” should generally be preferred.

    Running make validate in the root directory of the texts repository validates the contents of the TEI directory of that repository.

  • raco ricoeur/tei directory-clean-filenames renames all XML files in some directory (or the current directory, if none is provided) as needed to ensure that all start with a lower-case letter and that none contain spaces. When given the --git or -g flag, it uses git mv to move files (if git is available).

    Running make rename in the root directory of the texts repository renames files as needed (using git mv) in the TEI directory.

    A bug in the implementation of ricoeur/tei (or perhaps in xmllint itself) sometimes causes files that do not conform to these naming requirements to fail validation.

  • raco ricoeur/tei to-plain-text writes a TEI XML file to STDOUT as plain text. It is primarily intended to be used by invoking make (or make all) in the texts repository, which will populate the plain-text directory with plain text versions of every TEI XML file in the TEI directory. If for some reason you want to use it directly, run raco ricoeur/tei to-plain-text -h for usage information.

  • raco ricoeur/tei encode-xml-entities should be run on plain text files that are going to be converted to TEI XML documents manually: it must not be run on files that will be converted using “TEI Lint”. The command must be run before adding any XML markup. It replaces the reserved characters < and & with the corresponding XML entities. Run it with the flag --help or -h for usage information.

  • raco ricoeur/tei guess-paragraphs is a more limited substitute for functionality included in “TEI Lint”: under most circumstances, “TEI Lint” should be preferred.

    The command replaces a TEI XML file with an equivalent in which paragraph breaks have been guessed using tei-document-guess-paragraphs. Run it with the flag --help or -h for usage information.

    When xmllint is available, the output will be prettyprinted.

    Please always check the output of this tool: it operates on a best-effort basis. If, for example, you notice that it has simply replaced each “anonymous block” with one long paragraph, it would be better to revert your change and wait for this tool to be improved than to commit such semantically meaningless output.

    “TEI Lint” includes the same functionality as this tool, but with better output checking, so it should generally be preferred.
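The replacement performed by raco ricoeur/tei encode-xml-entities, described above, can be sketched in a few lines. This is a hypothetical Python illustration rather than the tool's actual Racket implementation, and the function name encode_xml_entities is invented for the example. It shows why the order of the two replacements matters and why the command must be run before any XML markup is added.

```python
def encode_xml_entities(text: str) -> str:
    """Replace the reserved characters & and < with XML entities.

    The ampersand is replaced first. If < were replaced first, the
    & in each newly created "&lt;" would itself be escaped by the
    second pass, producing "&amp;lt;".
    """
    return text.replace("&", "&amp;").replace("<", "&lt;")


# Plain text without markup is encoded as intended.
print(encode_xml_entities("if a < b & c"))
# prints: if a &lt; b &amp; c

# Text that already contains markup is damaged: the tags are escaped too.
print(encode_xml_entities("<p>AT&T</p>"))
# prints: &lt;p>AT&amp;T&lt;/p>
```

The second call shows the failure mode that the ordering requirement guards against: once tags are present, a blanket replacement can no longer tell markup from text.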