2 SAX Parsing


(ssax:xml->sxml port
                namespace-prefix-assig) → sxml?
  port : input-port?
  namespace-prefix-assig : (listof (cons/c symbol? string?))
Reads an XML document (which can be a single XML element) from port, and returns the corresponding SXML representation, wrapped in a *TOP* node. The namespace-prefix-assig association list maps short prefix symbols to namespace URI strings; each prefix is then used in place of the full namespace in element names.

> (ssax:xml->sxml
    (open-input-string
     "<zippy><pippy pigtails=\"2\">ab</pippy>cd</zippy>")
    '())

'(*TOP* (zippy (pippy (@ (pigtails "2")) "ab") "cd"))

> (ssax:xml->sxml
    (open-input-string
     "<car xmlns=\"vehicles\"><wheels>4</wheels></car>")
    '())

'(*TOP* (vehicles:car (vehicles:wheels "4")))

> (ssax:xml->sxml
    (open-input-string
     "<car xmlns=\"vehicles\"><wheels>4</wheels></car>")
    '((v . "vehicles")))

'(*TOP* (@ (*NAMESPACES* (v "vehicles"))) (v:car (v:wheels "4")))
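
Because ssax:xml->sxml expects an input port rather than a string, XML held in a string has to be wrapped with open-input-string first. A minimal sketch of a convenience wrapper follows; the name xml-string->sxml is hypothetical and not part of sxml, and the default of '() for the prefix list is an assumption made for this illustration.

```racket
#lang racket
(require sxml)

;; Hypothetical helper, not provided by sxml: parse XML held in a string.
;; The prefix list defaults to '() (no prefix assignments), which is an
;; assumption of this sketch rather than documented library behavior.
(define (xml-string->sxml str [namespace-prefix-assig '()])
  (ssax:xml->sxml (open-input-string str) namespace-prefix-assig))

(xml-string->sxml "<a><b>1</b></a>")
```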


(sxml:document url-string
               namespace-prefix-assig) → sxml?
  url-string : string?
  namespace-prefix-assig : any/c
Given a local file URI, return the corresponding SXML representation.

NOTE: currently, this appears to work only for local documents.

namespace-prefix-assig is passed as-is to the SSAX parser, which uses it to assign user-chosen prefixes to namespaces.

namespace-prefix-assig is an optional argument and affects XML resources only. When an HTML resource is requested, namespace-prefix-assig is silently ignored.

So, for instance, if the file "/tmp/foo.xml" contains an XML document, you should be able to call

(sxml:document "file:///tmp/foo.xml")

(Note the plethora of slashes required by the URI format.)


(ssax:make-parser new-level-seed-spec
                  finish-element-spec
                  char-data-handler-spec
                  tag-spec ...)

new-level-seed-spec = NEW-LEVEL-SEED new-level-seed-proc
finish-element-spec = FINISH-ELEMENT finish-element-proc
char-data-handler-spec = CHAR-DATA-HANDLER char-data-handler-proc
tag-spec = tag tag-proc
Returns a procedure of two arguments: an input port xml-port and an object init-seed. That procedure parses the XML document read from xml-port, starting from the seed init-seed and following the specifications new-level-seed-spec, finish-element-spec, char-data-handler-spec, and tag-specs, and returns an object of the same type as init-seed.

new-level-seed-spec consists of the tag NEW-LEVEL-SEED in upper case, followed by a procedure new-level-seed-proc. This procedure must take the arguments element-name, attributes, namespaces, expected-content, and seed. It must return an object of the same type as init-seed.

finish-element-spec consists of the tag FINISH-ELEMENT in upper case, followed by a procedure finish-element-proc. This procedure must take the arguments element-name, attributes, namespaces, parent-seed, and seed. It must return an object of the same type as init-seed.

char-data-handler-spec consists of the tag CHAR-DATA-HANDLER in upper case, followed by a procedure char-data-handler-proc. This procedure must take the arguments string-1, string-2, and seed. It must return an object of the same type as init-seed.

‘tag-spec’: TODO.
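
As a minimal sketch of how the three handlers described above thread a seed through the parse, the following counts the elements in a document. The name count-elements and the use of an integer seed are choices made for this illustration and are not part of sxml.

```racket
#lang racket
(require sxml)

;; Hypothetical example, not part of sxml: count the elements in a document.
;; The seed is a running count, so init-seed is 0.
(define (count-elements xml-port)
  (define parser
    (ssax:make-parser
     NEW-LEVEL-SEED
     ;; Entering an element: hand the current count down to its children.
     (lambda (gi attributes namespaces expected-content seed) seed)
     FINISH-ELEMENT
     ;; Leaving an element: seed already covers its descendants; add one.
     (lambda (gi attributes namespaces parent-seed seed) (+ seed 1))
     CHAR-DATA-HANDLER
     ;; Character data does not change the count.
     (lambda (string-1 string-2 seed) seed)))
  (parser xml-port 0))

(count-elements
 (open-input-string "<a><b/><c>text</c></a>"))
```

For this input the result should be 3, one for each of a, b, and c. Since every handler returns a number, the value returned by the parser has the same type as the initial seed.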

Here’s an example that returns a string containing the text, after removing markup, from the XML document read from the input port xml-port.

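#lang racket
(require racket/string sxml)

;; Collects character data as a reversed list of strings, then joins it.
(define (remove-markup xml-port)
  (let* ((parser
          (ssax:make-parser NEW-LEVEL-SEED remove-markup-nls
                            FINISH-ELEMENT remove-markup-fe
                            CHAR-DATA-HANDLER remove-markup-cdh))
         (strings (parser xml-port null)))
    (string-join (reverse strings) "")))

;; Entering an element: pass the accumulated strings down unchanged.
(define (remove-markup-nls gi attributes namespaces expected-content
                           seed)
  seed)

;; Leaving an element: keep everything collected inside it.
(define (remove-markup-fe gi attributes namespaces parent-seed seed)
  seed)

;; Character data: push string-1, and string-2 when it is non-empty.
(define (remove-markup-cdh string-1 string-2 seed)
  (let ((seed (cons string-1 seed)))
    (if (non-empty-string? string-2)
        (cons string-2 seed)
        seed)))

(remove-markup
 (open-input-string
  "<foo>Hell<bar>o, world!</bar></foo>"))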