libxml2: Bindings for XML Validation

Philip McGrath <philip at philipmcgrath dot com>

 (require libxml2) package: libxml2

This package provides a Racket interface to functionality from the C library libxml2.

Racket already has many mature XML-related libraries implemented natively in Racket: libxml2 does not aim to replace them, nor to implement the entire libxml2 C API. Rather, the goal is to use libxml2 for functionality not currently available from the native Racket XML libraries, beginning with validation.

Note that libxml2 is in an early stage of development: before relying on this library, please see in particular the notes on Safety & Stability.

    1 DTD Validation

    2 Checking Shared Library Availability

    3 Usage Notes

      3.1 Platform Dependencies

      3.2 Safety & Stability

1 DTD Validation

The initial goal for libxml2 is to support XML validation, beginning with document type definitions.

procedure

(dtd? v)  boolean?

  v : any/c

procedure

(file->dtd pth)  dtd?

  pth : path-string?
A DTD object, recognized by the predicate dtd?, is a Racket value encapsulating an XML document type definition, which is a formal specification of the structure of an XML document. A DTD object can be used with functions like dtd-validate-xml-string to validate an XML document against the encapsulated document type definition.

Currently, the only way to construct a DTD object is from a stand-alone DTD file using file->dtd. Additional mechanisms may be added in the future.

Examples:
> (define dtd-file
    (make-temporary-file))
> (display-lines-to-file
   '("<!ELEMENT example (good)>"
     "<!ELEMENT good (#PCDATA)>")
   #:exists 'truncate/replace
   dtd-file)
> (define example-dtd
    (file->dtd dtd-file))
> example-dtd

#<dtd>

> (delete-file dtd-file)
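
Because file->dtd is currently the only constructor, a program that keeps its DTD source in memory has to go through a file. The following is a minimal sketch of one way to do that. The helper name lines->dtd is hypothetical and not part of libxml2. The sketch also assumes, as the example above suggests, that the file may be deleted once the DTD object has been created.

```racket
#lang racket
(require libxml2)

;; lines->dtd : (listof string?) -> dtd?
;; Hypothetical helper (not provided by libxml2): writes the DTD source
;; to a temporary file, loads it with file->dtd, and removes the file
;; afterwards, even if file->dtd raises an exception.
(define (lines->dtd lines)
  (define tmp (make-temporary-file))
  (dynamic-wind
   void
   (λ ()
     (display-lines-to-file lines tmp #:exists 'truncate/replace)
     (file->dtd tmp))
   (λ () (delete-file tmp))))

(define example-dtd
  (lines->dtd '("<!ELEMENT example (good)>"
                "<!ELEMENT good (#PCDATA)>")))

;; The result should satisfy the dtd? predicate.
(dtd? example-dtd)
```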

procedure

(dtd-validate-xml-string dtd 
  doc 
  [error-buffer-file]) 
  
(or/c 'valid
      (and/c string? immutable?))
  dtd : dtd?
  doc : string?
  error-buffer-file : (or/c #f path-string?) = #f
Parses the string doc as XML and validates it according to the DTD object dtd. If doc is both well-formed and valid, dtd-validate-xml-string returns 'valid; otherwise, it returns an immutable string containing an error message.

Internally, dtd-validate-xml-string and related functions use a file as a buffer to collect any error messages from libxml2. If error-buffer-file is provided and is not #false, it will be used as the buffer: it will be created if it does not already exist, and any existing contents will likely be overwritten. If error-buffer-file is #false (the default), a temporary file will be used.

Examples:
> (dtd-validate-xml-string
   example-dtd
   "<example><good>This is a good doc.</good></example>")

'valid

> (define buffer-file
    (make-temporary-file))
> (dtd-validate-xml-string
   example-dtd
   (string-append "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                  "<example><good>So is this.</good></example>")
   buffer-file)

'valid

> (define (show-string str)
    (let loop ([lst (regexp-split #rx"\n" str)])
      (match lst
        ['() (void)]
        [(cons str lst)
         #:when (<= (string-length str) 60)
         (displayln str (current-error-port))
         (loop lst)]
        [(cons (pregexp #px"^(.{,60})\\s+(.*)$" (list _ a b)) lst)
         (displayln a (current-error-port))
         (loop (cons (string-append "  " b) lst))])))
> (show-string
   (dtd-validate-xml-string
    example-dtd
    "<ill-formed"
    buffer-file))

Entity: line 1: parser error : Couldn't find end of Start

  Tag ill-formed line 1

> (show-string
   (dtd-validate-xml-string
    example-dtd
    "<example><bad>This is invalid.</bad></example>"))

element example: validity error : Element example content

  does not follow the DTD, expecting (good), got (bad)

element bad: validity error : No declaration for element bad

> (delete-file buffer-file)
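
Since the result is either 'valid or an error-message string, client code usually needs to branch on it. The sketch below shows two possible wrappers. Both valid-xml-string? and check-xml-string! are hypothetical names introduced here for illustration; neither is provided by libxml2.

```racket
#lang racket
(require libxml2)

;; Set up the same DTD used in the examples above.
(define dtd-file (make-temporary-file))
(display-lines-to-file
 '("<!ELEMENT example (good)>"
   "<!ELEMENT good (#PCDATA)>")
 dtd-file
 #:exists 'truncate/replace)
(define example-dtd (file->dtd dtd-file))
(delete-file dtd-file)

;; valid-xml-string? : dtd? string? -> boolean?
;; Hypothetical wrapper: collapses the result to a boolean,
;; discarding any error message.
(define (valid-xml-string? dtd doc)
  (eq? 'valid (dtd-validate-xml-string dtd doc)))

;; check-xml-string! : dtd? string? -> void?
;; Hypothetical wrapper: turns an error-message result into an
;; exn:fail:user exception that carries libxml2's message.
(define (check-xml-string! dtd doc)
  (match (dtd-validate-xml-string dtd doc)
    ['valid (void)]
    [(? string? msg)
     (raise-user-error 'check-xml-string! "~a" msg)]))

(valid-xml-string? example-dtd
                   "<example><good>ok</good></example>")
(valid-xml-string? example-dtd
                   "<example><bad>not ok</bad></example>")
```

The boolean form is convenient for filtering, while the raising form keeps the diagnostic text, which is often what a caller wants to report.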

procedure

(dtd-validate-xexpr dtd 
  doc 
  [error-buffer-file]) 
  
(or/c 'valid
      (and/c string? immutable?))
  dtd : dtd?
  doc : xexpr/c
  error-buffer-file : (or/c #f path-string?) = #f
Like dtd-validate-xml-string, but validates the x-expression doc. Because doc is an x-expression, it will always be at least well-formed.

Examples:
> (dtd-validate-xexpr example-dtd
                      '(example (good)))

'valid

> (show-string
   (dtd-validate-xexpr example-dtd
                       '(example (bad))))

element example: validity error : Element example content

  does not follow the DTD, expecting (good), got (bad)

element bad: validity error : No declaration for element bad

procedure

(dtd-validate-xml-file dtd 
  doc 
  [error-buffer-file]) 
  
(or/c 'valid
      (and/c string? immutable?))
  dtd : dtd?
  doc : (and/c path-string? file-exists?)
  error-buffer-file : (or/c #f path-string?) = #f
Like dtd-validate-xml-string, but validates the XML document in the file doc.
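
As an illustration, the following sketch writes a small document to a temporary file and validates it from disk. It reuses the DTD from the earlier examples; the document text and the temporary files are made up for this example.

```racket
#lang racket
(require libxml2)

;; Build the DTD object from a temporary stand-alone DTD file.
(define dtd-file (make-temporary-file))
(display-lines-to-file
 '("<!ELEMENT example (good)>"
   "<!ELEMENT good (#PCDATA)>")
 dtd-file
 #:exists 'truncate/replace)
(define example-dtd (file->dtd dtd-file))
(delete-file dtd-file)

;; Write the XML document to be validated to its own temporary file.
(define doc-file (make-temporary-file))
(display-to-file
 "<example><good>Read from disk.</good></example>"
 doc-file
 #:exists 'truncate/replace)

;; The file must exist when dtd-validate-xml-file is called,
;; so it is deleted only after validation has finished.
(define result
  (dtd-validate-xml-file example-dtd doc-file))
(delete-file doc-file)

;; By analogy with the string examples, this should be 'valid.
result
```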

2 Checking Shared Library Availability

If the libxml2 shared library cannot be loaded, the Racket interface defers raising any exception until a client program attempts to use the foreign functionality. In other words, (require libxml2) should not cause an exception, even if attempting to load the shared library fails. (Currently, an immediate exception may be raised if the shared library is loaded, but does not provide the needed functionality.)

procedure

(libxml2-available?)  boolean?

Returns #true if and only if the libxml2 shared library was loaded successfully. When (libxml2-available?) returns #false, indicating that the shared library could not be loaded, most functions provided by libxml2 will raise an exception of the exn:fail:unsupported:libxml2 structure type.

Added in version 0.0.1 of package libxml2.

An exception of the exn:fail:unsupported:libxml2 structure type is raised by functions from this library that depend on the libxml2 shared library when the foreign library could not be loaded. The who field identifies the origin of the exception, potentially in terms of the C API or other internal names.

See also libxml2-available?.

Added in version 0.0.1 of package libxml2.
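
A client that wants to degrade gracefully can either test libxml2-available? up front or handle the exception. The sketch below shows both styles. The helper names try-validate and try-validate/handler are hypothetical, and the second helper assumes that the predicate exn:fail:unsupported:libxml2? is exported together with the structure type, as is conventional for Racket structure types.

```racket
#lang racket
(require libxml2)

;; try-validate : path-string? string? -> (or/c 'valid string? #f)
;; Hypothetical helper: checks availability first and returns #f
;; when the shared library could not be loaded.
(define (try-validate dtd-path doc)
  (cond
    [(libxml2-available?)
     (dtd-validate-xml-string (file->dtd dtd-path) doc)]
    [else #f]))

;; try-validate/handler : path-string? string? -> (or/c 'valid string? #f)
;; Hypothetical helper: attempts validation and falls back to #f if an
;; exn:fail:unsupported:libxml2 exception is raised.
;; Assumption: the predicate below is exported by libxml2.
(define (try-validate/handler dtd-path doc)
  (with-handlers ([exn:fail:unsupported:libxml2?
                   (λ (e)
                     (log-warning "libxml2 unavailable: ~a"
                                  (exn-message e))
                     #f)])
    (dtd-validate-xml-string (file->dtd dtd-path) doc)))
```

Checking first avoids constructing and catching an exception in the common case; the handler style is useful when the call is buried inside code that cannot easily be guarded.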

3 Usage Notes

3.1 Platform Dependencies

All of this library’s functionality depends on having the libxml2 shared library available. It is included by default with Mac OS and is readily available on GNU/Linux via the system package manager. For Windows users, there are plans to distribute the necessary libraries through the Racket package manager, but this has not yet been implemented.

3.2 Safety & Stability

The goal for libxml2 is to provide a safe interface for Racket clients. However, this library is still in an early stage of development: there are likely subtle bugs, and, because this Racket package is implemented using unsafe functionality, such bugs could have serious consequences, such as memory corruption or crashes, rather than ordinary Racket exceptions. More fundamentally, there may be bugs and security vulnerabilities in the underlying libxml2 shared library. Please give careful thought to these issues when deciding whether or how to use libxml2 in your programs.

In terms of stability, backwards compatibility is not guaranteed while libxml2 remains in an early stage of development. However, I have no intention of breaking things gratuitously. If you use libxml2 now, I encourage you to be in touch; I am happy to consult with users about potential changes.