extensible-parser-specifications

7.7

Georges Dupéron <georges.duperon@gmail.com>

Caveat: the mixins defined with define-eh-alternative-mixin cannot be provided and used in a separate module. Unfortunately, I cannot think of an acceptable fix for this problem, as solving this would require extracting parts of the mixin while preserving the bindings of some identifiers, but altering the bindings of others. This means that for the foreseeable future, once a mixin is defined, it can only be used via ~mixin (or by directly invoking it) within the same module.

The regular and splicing syntax classes defined with #:define-syntax-class and #:define-splicing-syntax-class will work fine across module boundaries, however. Manually defined syntax classes, splicing syntax classes or ellipsis-head syntax classes will also work fine across module boundaries, even if they contain uses of ~no-order and ~seq-no-order, and even if those special forms contain uses of mixins defined within the same module. In other words, as long as a definition of a mixin and all its uses via ~mixin are within the same module, everything else should work without surprises.

(require extensible-parser-specifications)
	package: extensible-parser-specifications

1 Defining reusable parser mixins

syntax
(define-eh-alternative-mixin name
  maybe-define-class
  maybe-define-splicing-class
  (pattern clause-or-mixin) ...)

maybe-define-class =
| #:define-syntax-class class-name

maybe-define-splicing-class =
| #:define-splicing-syntax-class splicing-name

clause-or-mixin = syntax-pattern
| (~mixin eh-alternative-mixin)
| (~or clause-or-mixin ...)
| derived-or

Defines an eh-alternative mixin, which is implemented as an eh-mixin expander. An eh-alternative mixin is like an ellipsis-head alternative set, except that it can only appear as part of a ~no-order (possibly nested under other eh-alternative mixins), and can contain some global constraints. The global constraints, detailed below, allow the parser to perform checks across two or more mixins. For example, given a set of options that can appear in any order, it is possible to specify that two of them are mutually exclusive, or that two others must appear in a certain order, regardless of the order of the other options.

The derived-or term covers any pattern expander or eh-mixin expander application which expands to a clause-or-mixin.

The #:define-syntax-class option defines a syntax class with the given class-name which matches {~no-order {~mixin name}}.

The #:define-splicing-syntax-class option defines a splicing syntax class with the given splicing-name which matches {~seq-no-order {~mixin name}}.
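
For comparison, the following is a minimal sketch of the plain syntax/parse "ellipsis-head alternative set" that an eh-alternative mixin is likened to above. It uses only syntax/parse and none of the forms from this library, so it has no mixins and no global constraints. The function name parse-options and the keywords #:name and #:size are hypothetical and chosen only for illustration.

```racket
#lang racket
(require syntax/parse)

;; Plain syntax/parse, no mixins: an ellipsis-head alternative set that
;; accepts two keyword options in either order.
;; parse-options, #:name and #:size are hypothetical names.
(define (parse-options stx)
  (syntax-parse stx
    [((~or (~once (~seq #:name name:id))
           (~optional (~seq #:size size:nat) #:defaults ([size #'0])))
      ...)
     (syntax->datum #'(name size))]))

(parse-options #'(#:name x #:size 3)) ; '(x 3)
(parse-options #'(#:size 3 #:name x)) ; '(x 3)
(parse-options #'(#:name x))          ; '(x 0), the default size is used
```

Each alternative here has to be written inline in the pattern; the mixins defined by define-eh-alternative-mixin exist to let such alternatives be named and reused.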

for-syntax value
eh-mixin-expander-type : expander-type?

for-syntax procedure
(make-eh-mixin-expander)
→ (and/c expander? eh-mixin-expander?)

for-syntax procedure
(eh-mixin-expander? v) → boolean?
v : any/c

syntax
(define-eh-mixin-expander id transformer-procedure)

for-syntax procedure
(expand-all-eh-mixin-expanders stx-tree) → syntax?
stx-tree : syntax?

These functions and forms allow the creation and manipulation of eh-mixin expanders. These identifiers are generated by define-expander-type. For more information, see the documentation for define-expander-type.

1.1 Using mixins

syntax
(~mixin eh-alternative-mixin)

Expands the eh-alternative-mixin, with no arguments. This is equivalent to (eh-alternative-mixin), but ~mixin additionally checks that the given eh-alternative-mixin is indeed an eh-alternative mixin. With the plain syntax (eh-alternative-mixin), by contrast, the name eh-alternative-mixin would be silently interpreted as a pattern variable by syntax-parse if the expander were not available for some reason (e.g. a missing import).

2 Matching alternatives in any order

pattern expander
(~seq-no-order clause-or-mixin ...)

clause-or-mixin = syntax-pattern
| (~mixin eh-alternative-mixin)
| (~or clause-or-mixin ...)
| derived-or

Splicing pattern which matches the given clause-or-mixins in any order, enforcing the global constraints expressed within each.

Nested ~or directly below ~seq-no-order are recursively inlined. In other words, the ~or present directly below the ~seq-no-order or below such an ~or clause do not behave as "exclusive or", but instead contain clauses which can appear in any order. These clauses are not grouped in any way by the ~or, i.e. (~no-order (~or (~or a b) (~or c d))) is equivalent to (~no-order a b c d).

The derived-or term covers any pattern expander or eh-mixin expander application which expands to a clause-or-mixin. The expansion of pattern and eh-mixin expanders happens before inlining the top ~or clauses.

pattern expander
(~no-order clause-or-mixin ...)

clause-or-mixin = syntax-pattern
| (~mixin eh-alternative-mixin)
| (~or clause-or-mixin ...)
| derived-or

Like ~seq-no-order, except that it matches a syntax list, instead of being spliced into the surrounding sequence of patterns. In other words,

({~seq-no-order clause-or-mixin ...})

is equivalent to (notice the extra pair of braces above):

(~no-order clause-or-mixin ...)
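
The equivalence above can be sketched as follows. This assumes that the extensible-parser-specifications package is installed and that its pattern expanders can be used with syntax-parser at phase 0, as the examples elsewhere in this document do. The names with-no-order and with-seq-no-order are hypothetical.

```racket
#lang racket
(require syntax/parse
         extensible-parser-specifications)

;; ~no-order matches the whole syntax list by itself.
(define with-no-order
  (syntax-parser
    [(~no-order {~once n:nat} {~once s:str})
     (syntax->datum #'(n s))]))

;; ~seq-no-order is spliced, so it is wrapped in an extra pair of
;; parentheses to match the same syntax list.
(define with-seq-no-order
  (syntax-parser
    [({~seq-no-order {~once n:nat} {~once s:str}})
     (syntax->datum #'(n s))]))

;; Both parsers are expected to accept the elements in either order
;; and to produce the same result.
(with-no-order #'("a" 1))
(with-seq-no-order #'("a" 1))
```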

Additionally, ~no-order can include clauses which use ~lift-rest, which lifts a pattern which matches the tail of an improper list.

2.1 Enforcing a partial order on the alternatives

eh-mixin expander
(~order-point point-name syntax-pattern ...)

When parsing a sequence of elements, ~seq-no-order and ~no-order associate an increasing number to each element starting from zero.

The number associated with the first element matched by syntax-pattern ... is memorised into the attribute point-name.

This allows the position of elements matched by otherwise independent mixins to be compared using order-point< and order-point>.

syntax
(order-point< a b)

a = attribute-name

b = attribute-name

Returns #t when the first element matched by (~order-point a syntax-pattern ...) occurs before the first element matched by (~order-point b syntax-pattern ...). Otherwise, returns #f.

This operation does not fail if a or b are bound to #f (i.e. their corresponding syntax-pattern ... did not match). Instead, in both cases, it returns #f.

syntax
(order-point> a b)

a = attribute-name

b = attribute-name

Returns #t when the first element matched by (~order-point a syntax-pattern ...) occurs after the first element matched by (~order-point b syntax-pattern ...). Otherwise, returns #f.

This operation does not fail if a or b are bound to #f (i.e. their corresponding syntax-pattern ... did not match). Instead, in both cases, it returns #f.
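
The numbering and the handling of #f described above can be modelled in plain Racket. This is only a sketch of the documented behaviour, not the library's implementation: the model- functions below are hypothetical, operate on ordinary lists instead of syntax, and take positions as arguments where the real forms take attribute names.

```racket
#lang racket

;; Hypothetical model: an order point is the zero-based position of the
;; first element satisfying pred, or #f when nothing matched.
(define (model-order-point elems pred)
  (index-where elems pred))

;; Both comparisons return #f as soon as one of the points is #f.
(define (model-order-point< a b) (and a b (< a b)))
(define (model-order-point> a b) (and a b (> a b)))

(define elems '(x #:b y #:a))
(define pa (model-order-point elems (λ (e) (equal? e '#:a)))) ; 3
(define pb (model-order-point elems (λ (e) (equal? e '#:b)))) ; 1
(define pc (model-order-point elems (λ (e) (equal? e '#:c)))) ; #f

(model-order-point< pb pa) ; #t, #:b occurs before #:a
(model-order-point> pb pa) ; #f
(model-order-point< pa pc) ; #f, pc did not match
(model-order-point> pa pc) ; #f, pc did not match
```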

syntax
(try-order-point< a b)

a = attribute-name

b = attribute-name

Like order-point<, except that it does not fail if a or b are not attributes, or if they are bound to #f. Instead, in all those cases, it returns #f.

It can be used as follows:

(~post-fail "a must appear after b"
            #:when (try-order-point< a b))

The same caveats as for try-attribute apply.

syntax
(try-order-point> a b)

a = attribute-name

b = attribute-name

Like order-point>, except that it does not fail if a or b are not attributes, or if they are bound to #f. Instead, in all those cases, it returns #f.

It can be used as follows:

(~post-fail "a must appear before b"
            #:when (try-order-point> a b))

The same caveats as for try-attribute apply.

eh-mixin expander
(~before other message pat ...)

Post-checks that the first element matched by pat ... appears before the other order-point. This is a shorthand for:

{~order-point pt
  {~seq pat ...}
  {~post-fail message #:when (order-point> pt other)}}

Note: Hopefully ~before will be modified in the future so that it auto-detects if the other order-point is not defined as part of the current ~no-order. Do not rely on comparisons with order points somehow defined outside the current ~no-order, as that behaviour may change in the future.

This is implemented as a pre operation.

eh-mixin expander
(~after other message pat ...)

Post-checks that the first element matched by pat ... appears after the other order-point. This is a shorthand for:

{~order-point pt
  {~seq pat ...}
  {~post-fail message #:when (order-point< pt other)}}

Note: Hopefully ~after will be modified in the future so that it auto-detects if the other order-point is not defined as part of the current ~no-order. Do not rely on comparisons with order points somehow defined outside the current ~no-order, as that behaviour may change in the future.

This is implemented as a pre operation.

eh-mixin expander
(~try-before other message pat ...)

Post-checks that the first element matched by pat ... appears before the other order-point. The try- version does not cause an error if the order-point other is not defined (e.g. it was part of another mixin which was not included). This is a shorthand for:

{~order-point pt
  {~seq pat ...}
  {~post-fail message #:when (try-order-point> pt other)}}

Note: Hopefully ~before will be modified in the future so that it auto-detects if the other order-point is missing. This form will then be removed.

This is implemented as a pre operation.

eh-mixin expander
(~try-after other message pat ...)

Post-checks that the first element matched by pat ... appears after the other order-point. The try- version does not cause an error if the order-point other is not defined (e.g. it was part of another mixin which was not included). This is a shorthand for:

{~order-point pt
  {~seq pat ...}
  {~post-fail message #:when (try-order-point< pt other)}}

Note: Hopefully ~after will be modified in the future so that it auto-detects if the other order-point is missing. This form will then be removed.

This is implemented as a pre operation.

3 Parsing the tail of improper lists

eh-mixin expander
{~lift-rest pat}

Lifts pat out of the current mixin, so that it is used as a pattern to match the tail of the improper list being matched. It is subject to the following restrictions:

~lift-rest is allowed only within ~no-order, but not within ~seq-no-order. ~seq-no-order always matches against a proper sequence of elements, while ~no-order may match a proper or improper list.
The tail of the improper list must not be a pair, otherwise the car would have been included in the main part of the list.
The pat is used to match the tail only if its surrounding pattern successfully matched some elements of the main section of the list.
If the {~lift-rest pat} is the only pattern present within an alternative, then it is always used.
Example:
> (syntax-parse #'(x y z . 1)
    [(~no-order {~lift-rest r:nat} i:id)
     (syntax->datum #'(r i ...))])
'(1 x y z)

Among the lifted rest patterns which are considered (see the point above), only one may successfully match. An error is raised if two or more lifted rest patterns successfully match against the tail of the list.

Examples:

(define p
  (syntax-parser
    [(~no-order {~and {~literal x}
                      {~lift-rest rn:nat}
                      {~lift-rest ri:id}}
                {~and {~literal y}
                      {~lift-rest rs:str}
                      {~lift-rest rj:id}})
     'match]))

> (p #'(x . 1))   ; rn and ri considered, rn matched
'match
> (p #'(x . z))   ; rn and ri considered, ri matched
'match
> (p #'(y . "a")) ; rs and rj considered, rs matched
'match
> (p #'(y . z))   ; rs and rj considered, rj matched
'match
> (p #'(x y . 1)) ; all four considered, rn matched
'match
> (p #'(x y . z)) ; all four considered, both ri and rj matched
eval:7.0: x: more than one of the lifted rest patterns matched
  at: (x y . z)
  in: (x y . z)

The rationale is that selecting the first lifted rest pattern that matches would result in unclear behaviour, as the order of the alternative clauses should not be significant.

Post and global operations can be used within the pat. This combination of features is not thoroughly tested, however. Please report any issues you run into.
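
The restriction that the tail must not be a pair follows from how lists are represented: a pair in tail position is indistinguishable from further list elements. The sketch below illustrates this with plain Racket and plain syntax/parse, using a fixed-order pattern and none of the forms from this library. The function name parse-tail is hypothetical.

```racket
#lang racket
(require syntax/parse)

;; A pair in tail position is read as part of the main list, so these
;; two literals denote the same datum.
(equal? '(x y . (z . 1)) '(x y z . 1)) ; #t

;; Plain syntax/parse, fixed order: identifiers first, then a tail which
;; is not a pair. parse-tail is a hypothetical name.
(define (parse-tail stx)
  (syntax-parse stx
    [(i:id ... . r:nat)
     (syntax->datum #'(r i ...))]))

(parse-tail #'(x y z . 1))     ; '(1 x y z)
(parse-tail #'(x y . (z . 1))) ; '(1 x y z), z is part of the main list
```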

eh-mixin expander
{~as-rest pat ...}

Like ~seq, but the pats are injected as part of the same ~or as ~lift-rest. This means that syntax/parse will not throw an error for the following code:

Examples:

(define p2
  (syntax-parser
    [(~no-order {~once name:id}
                {~once message:str}
                (~once (~or {~as-rest val:nat}
                            {~seq {~lift-rest val:nat}})))
     (syntax->datum
      #'(#:name name #:message message #:val val))]))

> (p2 #'(x 123 "msg"))       ; matched by ~as-rest
'(#:name x #:message "msg" #:val 123)
> (p2 #'(x "msg" 123))       ; matched by ~as-rest
'(#:name x #:message "msg" #:val 123)
> (p2 #'(x "msg" . 456))     ; matched by ~lift-rest
'(#:name x #:message "msg" #:val 456)
> (p2 #'(x "msg" 123 . 456)) ; can't have both
eval:5.0: x: bad syntax
  in: (x "msg" 123 . 456)

4 Pre, global and post operations

Pre operations happen before the ~! backtracking cut, so they can affect what combination of alternative clauses the parser will choose. Post operations happen after the ~! backtracking cut, and can only reject the ~no-order or ~seq-no-order as a whole (i.e. different orders will not be attempted after a ~post-fail). Global operations will always succeed.

Post operations can access the attributes defined by global and pre operations as well as attributes defined by the alternative clauses. Global operations cannot access the attributes of post operations, and pre operations cannot access the attributes of global and post operations. See Order in which the attributes are bound for post operations and global operations for more details.

4.1 Pre operations

eh-mixin expander
(~named-seq attribute-name syntax-pattern ...)

Equivalent to {~seq syntax-pattern ...}, but also binds the attribute-name to the whole sequence. If the sequence appears inside an ~optional or ~or clause that fails, the attribute-name is still bound to the empty sequence.

Known issues: this may not behave as expected if ~named-seq appears under ellipses.

This probably should bind the sequence attribute before the "global" operations, instead of being a "post" operation, and may be changed in that way in the future.

eh-mixin expander
(~maybe/empty syntax-pattern ...)

Optionally matches {~seq syntax-pattern ...}. If the match fails, it matches this same sequence of patterns against the empty syntax list #'(). This form can be used in an ellipsis-head position. This is implemented in both cases as a "pre" action.

eh-mixin expander
(~optional/else syntax-pattern
                maybe-defaults
                else-post-fail ...
                maybe-name)

maybe-defaults =
| #:defaults (default-binding ...)

else-post-fail = #:else-post-fail message #:when condition
| #:else-post-fail #:when condition message
| #:else-post-fail message #:unless unless-condition
| #:else-post-fail #:unless unless-condition message

maybe-name =
| #:name attribute-name

Like ~optional, but with conditional post-failures when the pattern is not matched. An ~optional/else pattern can be matched zero or one time as part of the ~seq-no-order or ~no-order. When it is not matched (i.e. matched zero times):

it uses the default values for the attributes as specified with #:defaults.
for each #:else-post-fail clause, it checks whether the condition or unless-condition is true or false, respectively. If this is the case the whole ~seq-no-order or ~no-order is rejected with the given message. The behaviour of #:else-post-fail is the same as the behaviour of ~post-fail, except that the "post" conditional failure can only be executed if the optional syntax-pattern was not matched.
Note that there is an implicit cut (~!) between the no-order patterns and the "post" checks, so after a ~post-fail fails, syntax-parse does not backtrack and attempt different combinations of patterns to match the sequence, nor does it backtrack and attempt to match a shorter sequence. This is by design, as it allows for better error messages (syntax-parse would otherwise attempt and possibly succeed in matching a shorter sequence, then just treat the remaining terms as "unexpected terms").

The meaning of the #:name attribute-name option is the same as for ~optional.

4.2 Global operations

The global patterns presented below match all of the given syntax-patterns, like ~and does, and perform a global aggregation over all the values corresponding to successful matches of a global pattern using the same attribute-name.

After the whole ~seq-no-order or ~no-order has finished matching its contents, but before "post" operations are executed, the attribute attribute-name is bound to (aggregate-function value₁ ... valueₙ), where each valueᵢ is the value which was passed to a successfully matched occurrence of the global pattern with the same attribute-name. The aggregate-function is or for ~global-or, and for ~global-and, and + for ~global-counter.

Each valueᵢ is computed in the context in which it appears, after the syntax-patterns. This means that it can access:

attributes already bound in the current alternative clause within the current ~no-order or ~seq-no-order
attributes bound by the syntax-patterns
attributes already bound outside of the ~no-order or ~seq-no-order
but it cannot access attributes bound in other alternative clauses within the current ~no-order or ~seq-no-order.

The valueᵢ are aggregated with or, and, or + in the order in which they appear in the ~no-order or ~seq-no-order. If a valueᵢ appears under ellipses, or as part of an alternative clause which can match more than once (i.e. not ~once or ~optional), then each match within that valueᵢ group is aggregated in the order it appears.

Since this notion of order is rather complex, it is possible that future versions of this library will always return a boolean (#f or #t) for ~global-or and ~global-and, which would make the notion of order irrelevant.

eh-mixin expander
(~global-or attribute-name+value syntax-pattern ...)

attribute-name+value = attribute-name
| [attribute-name valueᵢ]

Matches all of the given syntax-patterns, like ~and does, and performs a global or over all the values corresponding to successful matches of a global pattern using the same attribute-name. See above for a description of how global operations work.

If the valueᵢ is omitted, #t is used as a default.

The result is always transformed into a boolean, so attribute-name is always bound to either #t or #f.

eh-mixin expander
(~global-and attribute-name+value syntax-pattern ...)

attribute-name+value = [attribute-name valueᵢ]

Matches all of the given syntax-patterns, like ~and does, and performs a global and over all the values corresponding to successful matches of a global pattern using the same attribute-name. See above for a description of how global operations work.

If there is at least one occurrence of ~global-and for that attribute-name which successfully matches, the result of the (and valueᵢ ...) is always coerced to a boolean, so attribute-name is always bound to either #t or #f.

If there are no matches at all, the special value 'none is used instead of #t as would be produced by (and).

eh-mixin expander
(~global-counter attribute-name+value syntax-pattern ...)

attribute-name+value = attribute-name
| [attribute-name valueᵢ]

Matches all of the given syntax-patterns, like ~and does, and performs a global + over all the values corresponding to successful matches of a global pattern using the same attribute-name. See above for a description of how global operations work.

If the valueᵢ is omitted, 1 is used as a default.
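
The three aggregations described in this section can be modelled in plain Racket. This is a sketch of the documented results only, not the library's implementation: the model- functions are hypothetical, and each takes the list of values from the successful matches, in the order described above.

```racket
#lang racket

;; Hypothetical models of the aggregation step.

;; ~global-or: the result is always coerced to a boolean.
(define (model-global-or vs)
  (and (ormap values vs) #t))

;; ~global-and: 'none when nothing matched, a boolean otherwise.
(define (model-global-and vs)
  (if (null? vs)
      'none
      (and (andmap values vs) #t)))

;; ~global-counter: the sum of the values (1 per match by default).
(define (model-global-counter vs)
  (apply + vs))

(model-global-or '(#f 42))      ; #t
(model-global-or '())           ; #f
(model-global-and '(1 2))       ; #t
(model-global-and '(1 #f))      ; #f
(model-global-and '())          ; 'none
(model-global-counter '(1 1 5)) ; 7
(model-global-counter '())      ; 0
```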

4.3 Post operations

eh-mixin expander
(~post-check syntax-pattern A-pattern)
(~post-check A-pattern)

Matches syntax-pattern, and executes the given A-pattern after the whole ~seq-no-order or ~no-order finished matching its contents.

If unspecified, the syntax-pattern defaults to (~nop).

eh-mixin expander
(~post-fail message #:when condition)
(~post-fail #:when condition message)
(~post-fail message #:unless unless-condition)
(~post-fail #:unless unless-condition message)

After the whole ~seq-no-order or ~no-order finished matching its contents, checks whether condition or unless-condition is true or false, respectively. If this is the case the whole ~seq-no-order or ~no-order is rejected with the given message.

Note that there is an implicit cut (~!) between the no-order patterns and the "post" checks, so after a ~post-fail fails, syntax-parse does not backtrack and attempt different combinations of patterns to match the sequence, nor does it backtrack and attempt to match a shorter sequence. This is by design, as it allows for better error messages (syntax-parse would otherwise attempt and possibly succeed in matching a shorter sequence, then just treat the remaining terms as "unexpected terms").

4.4 Order in which the attributes are bound for post operations and global operations

Within the A-patterns of post operations, the regular attributes bound by all the clauses inside ~seq-no-order or ~no-order are bound. The attributes defined as part of all "global" actions are bound too. The attributes defined as part of "post" actions of other clauses are bound only if the clause defining them appears before the current clause in the source code. For example, the following code works because the clause containing {~post-fail "2 is incompatible with 1" #:when (not (attribute a))} appears after the clause which binds a with the "post" action {~post-check {~bind ([a #'the-a])}}.

{~seq-no-order
  {~post-check {~and the-a 1} {~bind ([a #'the-a])}}
  {~and 2 {~post-fail "2 is incompatible with 1" #:when (not (attribute a))}}}

If the two clauses are swapped, then the following code would raise a syntax error because a is not bound as an attribute in the ~post-fail:

{~seq-no-order
  {~and 2 {~post-fail "2 is incompatible with 1" #:when (not (attribute a))}}
  {~post-check {~and the-a 1} {~bind ([a #'the-a])}}}

On the other hand, the following code, which does not bind a as part of a post operation, is valid:

{~seq-no-order
  {~and 2 {~post-fail "2 is incompatible with 1" #:when (not (attribute a))}}
  {~and the-a 1 {~bind ([a #'the-a])}}}

Furthermore, the following code still works, as attributes are bound by the "global" operations before the "post" operations are executed:

{~seq-no-order
  {~and 2 {~post-fail "2 is incompatible with 1" #:when (not (attribute a))}}
  {~global-or a 1}}

Note that the order in which clauses appear within the ~seq-no-order or ~no-order does not impact the order in which the elements must appear in the matched syntax (aside from issues related to greediness).

syntax
(try-attribute attribute-name)

This macro expands to (attribute attribute-name) if attribute-name is bound as a syntax pattern variable, and to #f otherwise.

This macro can be used to check for mutual exclusion of an attribute which is bound by other mixins that might or might not be present in the final ~no-order or ~seq-no-order.

Use this sparingly: if a syntax pattern variable with that name is bound by an outer scope, the try-attribute macro will still access it, ignorant of the fact that the current ~seq-no-order does not contain any mixin which binds that attribute.

Instead, it is better practice to use {~global-or [attribute-name #f]} or {~global-and [attribute-name #t]} to ensure that the attribute is declared, while using the operation’s neutral element to not alter the final result.

syntax
(if-attribute attribute-name if-branch else-branch)

This macro expands to if-branch if attribute-name is bound as a syntax pattern variable, and to else-branch otherwise.

The same caveats as for try-attribute apply.

5 Miscellaneous pattern expanders

pattern expander
{~nop}

The A-pattern ~nop does not perform any action. It simply expands to {~do}.

6 Chaining macro calls without re-parsing everything

syntax
(define/syntax-parse+simple (name-or-curry . syntax-pattern) . body)

name-or-curry = name
| (name-or-curry arg ...)

maybe-define-class =
| #:define-syntax-class class-name

maybe-define-splicing-class =
| #:define-splicing-syntax-class splicing-name

name = identifier?

class-name = identifier?

splicing-name = identifier?

This macro works like define/syntax-parse from phc-toolkit, except that it also defines the function name-forward-attributes, which can be used by other macros to forward already parsed attributes to the body, without the need to parse everything a second time.

The syntax pattern for the name macro’s arguments can be saved in a splicing syntax class by specifying the #:define-splicing-syntax-class option. The pattern only includes the arguments after the name, i.e. it matches (stx-cdr stx).

The syntax pattern for the name macro’s arguments can be saved in a syntax class by specifying the #:define-syntax-class option. The pattern only includes the arguments after the name, i.e. it matches (stx-cdr stx).

If the caller macro which uses (name-forward-attributes) parsed its own stx argument using class-name, then (name-forward-attributes) is equivalent to expanding (name stx).

The name-forward-attributes function is defined at the same meta level as name, i.e. at the same meta-level where this library was required.

for-template syntax
(define-syntax/parse+simple (name . syntax-pattern) . body)

This macro is provided for meta-level -1.

This is the same as define/syntax-parse+simple, except that it operates at level -1 relative to this library, and defines at that level a transformer binding (which therefore executes at the same meta-level as this library). In other words, (define-syntax/parse+simple (name . pat) . body) is roughly equivalent to:

(begin-for-syntax
  (define/syntax-parse+simple (tmp . pat) . body)
  (define name-forward-attributes tmp-forward-attributes))
(define-syntax name tmp)
