extensible-parser-specifications
Caveat: the mixins defined with define-eh-alternative-mixin cannot be provided and used in a separate module. Unfortunately, I cannot think of an acceptable fix for this problem, as solving it would require extracting parts of the mixin while preserving the bindings of some identifiers, but altering the bindings of others. This means that for the foreseeable future, once a mixin is defined, it can only be used via ~mixin (or by directly invoking it) within the same module.
The regular and splicing syntax classes defined with #:define-syntax-class and #:define-splicing-syntax-class will work fine across module boundaries, however. Manually defined syntax classes, splicing syntax classes or ellipsis-head syntax classes will also work fine across module boundaries, even if they contain uses of ~no-order and ~seq-no-order, and even if those special forms contain uses of mixins defined within the same module. In other words, as long as a definition of a mixin and all its uses via ~mixin are within the same module, everything else should work without surprises.
(require extensible-parser-specifications)
package: extensible-parser-specifications
1 Defining reusable parser mixins
syntax
(define-eh-alternative-mixin name maybe-define-class maybe-define-splicing-class (pattern clause-or-mixin) ...)

maybe-define-class =
                   | #:define-syntax-class class-name

maybe-define-splicing-class =
                            | #:define-splicing-syntax-class splicing-name

clause-or-mixin = syntax-pattern
                | (~mixin eh-alternative-mixin)
                | (~or clause-or-mixin ...)
                | derived-or
The derived-or term covers any pattern expander or eh-mixin expander application which expands to a clause-or-mixin.
The #:define-syntax-class option defines a syntax class with the given class-name which matches {~no-order {~mixin name}}.
The #:define-splicing-syntax-class option defines a splicing syntax class with the given splicing-name which matches {~seq-no-order {~mixin name}}.
for-syntax value
for-syntax procedure
→ (and/c expander? eh-mixin-expander?)
for-syntax procedure
(eh-mixin-expander? v) → boolean?
v : any/c
syntax
(define-eh-mixin-expander id transformer-procedure)
for-syntax procedure
(expand-all-eh-mixin-expanders stx-tree) → syntax?
stx-tree : syntax?
1.1 Using mixins
syntax
2 Matching alternatives in any order
pattern expander
(~seq-no-order clause-or-mixin ...)
clause-or-mixin = syntax-pattern | (~mixin eh-alternative-mixin) | (~or clause-or-mixin ...) | derived-or
Nested ~or directly below ~seq-no-order are recursively inlined. In other words, the ~or present directly below the ~seq-no-order or below such an ~or clause do not behave as "exclusive or", but instead contain clauses which can appear in any order. These clauses are not grouped in any way by the ~or, i.e. (~no-order (~or (~or a b) (~or c d))) is equivalent to (~no-order a b c d).
The derived-or term covers any pattern expander or eh-mixin expander application which expands to a clause-or-mixin. The expansion of pattern and eh-mixin expanders happens before inlining the top ~or clauses.
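The inlining of nested ~or can be sketched as follows. This is a minimal example, assuming the package is installed and that ~once clauses behave as in the p2 example of the section on improper lists; the names parse-flat and parse-nested are hypothetical and only serve the illustration.

```racket
#lang racket
(require syntax/parse extensible-parser-specifications)

;; Flat version: three clauses directly below ~no-order.
(define parse-flat
  (syntax-parser
    [(~no-order {~once a:id} {~once n:nat} {~once s:str})
     (syntax->datum #'(a n s))]))

;; Nested version: the same three clauses, grouped by ~or.
;; According to the text above, the ~or groups are inlined,
;; so this parser should accept the same inputs as parse-flat.
(define parse-nested
  (syntax-parser
    [(~no-order (~or {~once a:id} (~or {~once n:nat} {~once s:str})))
     (syntax->datum #'(a n s))]))

;; Both parsers are given the same terms, in an arbitrary order.
(equal? (parse-flat #'(1 "s" x))
        (parse-nested #'(1 "s" x)))
```

If the two forms are indeed equivalent, the final expression compares two identical results.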
pattern expander
(~no-order clause-or-mixin ...)
clause-or-mixin = syntax-pattern | (~mixin eh-alternative-mixin) | (~or clause-or-mixin ...) | derived-or
({~seq-no-order clause-or-mixin ...})
is equivalent to (notice the extra pair of parentheses above):
(~no-order clause-or-mixin ...)
Additionally, ~no-order can include clauses which use ~lift-rest, which lifts a pattern which matches the tail of an improper list.
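The relationship between ~no-order and ~seq-no-order can be sketched with two parsers which should accept the same syntax lists. This is a minimal example, assuming the package is installed; parse-list and parse-spliced are hypothetical names.

```racket
#lang racket
(require syntax/parse extensible-parser-specifications)

;; ~no-order matches a whole syntax list.
(define parse-list
  (syntax-parser
    [(~no-order {~once a:id} {~once n:nat})
     (syntax->datum #'(a n))]))

;; ~seq-no-order is spliced into the surrounding list pattern,
;; so it needs the extra pair of parentheses to match a list.
(define parse-spliced
  (syntax-parser
    [({~seq-no-order {~once a:id} {~once n:nat}})
     (syntax->datum #'(a n))]))

;; The identifier and the number may appear in either order.
(equal? (parse-list #'(5 x))
        (parse-spliced #'(5 x)))
```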
2.1 Enforcing a partial order on the alternatives
eh-mixin expander
(~order-point point-name syntax-pattern ...)
The number associated with the first element matched by syntax-pattern ... is memorised into the attribute point-name.
This allows the position of elements matched by otherwise independent mixins to be compared using order-point< and order-point>.
syntax
(order-point< a b)
a = attribute-name
b = attribute-name
This operation does not fail if a or b are bound to #f (i.e. their corresponding syntax-pattern ... did not match). Instead, in both cases, it returns #f.
syntax
(order-point> a b)
a = attribute-name
b = attribute-name
This operation does not fail if a or b are bound to #f (i.e. their corresponding syntax-pattern ... did not match). Instead, in both cases, it returns #f.
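The behaviour of the two comparisons on unmatched order points can be modelled in plain Racket. This is only a sketch of the semantics described above, not the library's implementation: model-order-point< and model-order-point> are hypothetical functions, and an order point is represented either as #f (its patterns did not match) or as the position of the first matched element.

```racket
#lang racket

;; Hypothetical model: returns #f as soon as one side is #f,
;; otherwise compares the two positions.
(define (model-order-point< a b) (and a b (< a b)))
(define (model-order-point> a b) (and a b (> a b)))

(model-order-point< 0 2)   ; => #t
(model-order-point< 2 0)   ; => #f
(model-order-point< #f 2)  ; => #f, a did not match
(model-order-point> 3 #f)  ; => #f, b did not match
```

Under this model, a check such as (order-point> pt other) never triggers when other did not match, since the comparison yields #f.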
syntax
(try-order-point< a b)
a = attribute-name
b = attribute-name
It can be used as follows:
(~post-fail "a must appear after b" #:when (try-order-point< a b))
The same caveats as for try-attribute apply.
syntax
(try-order-point> a b)
a = attribute-name
b = attribute-name
It can be used as follows:
(~post-fail "a must appear before b" #:when (try-order-point> a b))
The same caveats as for try-attribute apply.
eh-mixin expander
(~before other message pat ...)
{~order-point pt {~seq pat ...} {~post-fail message #:when (order-point> pt other)}}
Note: Hopefully ~before will be modified in the future so that it auto-detects if the other order-point is not defined as part of the current ~no-order. Do not rely on comparisons with order points somehow defined outside the current ~no-order, as that behaviour may change in the future.
This is implemented as a pre operation.
eh-mixin expander
(~after other message pat ...)
{~order-point pt {~seq pat ...} {~post-fail message #:when (order-point< pt other)}}
Note: Hopefully ~after will be modified in the future so that it auto-detects if the other order-point is not defined as part of the current ~no-order. Do not rely on comparisons with order points somehow defined outside the current ~no-order, as that behaviour may change in the future.
This is implemented as a pre operation.
eh-mixin expander
(~try-before other message pat ...)
{~order-point pt {~seq pat ...} {~post-fail message #:when (try-order-point> pt other)}}
Note: Hopefully ~before will be modified in the future so that it auto-detects if the other order-point is missing. This form will then be removed.
This is implemented as a pre operation.
eh-mixin expander
(~try-after other message pat ...)
{~order-point pt {~seq pat ...} {~post-fail message #:when (try-order-point< pt other)}}
Note: Hopefully ~after will be modified in the future so that it auto-detects if the other order-point is missing. This form will then be removed.
This is implemented as a pre operation.
3 Parsing the tail of improper lists
eh-mixin expander
{~lift-rest pat}
~lift-rest is allowed only within ~no-order, but not within ~seq-no-order. ~seq-no-order always matches against a proper sequence of elements, while ~no-order may match a proper or improper list.
The tail of the improper list must not be a pair, otherwise the car would have been included in the main part of the list.
The pat is used to match the tail only if its surrounding pattern successfully matched some elements of the main section of the list.
If the {~lift-rest pat} is the only pattern present within an alternative, then it is always used.
Example:
> (syntax-parse #'(x y z . 1)
    [(~no-order {~lift-rest r:nat} i:id)
     (syntax->datum #'(r i ...))])
'(1 x y z)
Among the lifted rest patterns which are considered (see the point above), only one may successfully match. An error is raised if two or more lifted rest patterns successfully match against the tail of the list.
Examples:
(define p
  (syntax-parser
    [(~no-order {~and {~literal x}
                      {~lift-rest rn:nat}
                      {~lift-rest ri:id}}
                {~and {~literal y}
                      {~lift-rest rs:str}
                      {~lift-rest rj:id}})
     'match]))
> (p #'(x . 1)) ; rn and ri considered, rn matched
'match
> (p #'(x . z)) ; rn and ri considered, ri matched
'match
> (p #'(y . "a")) ; rs and rj considered, rs matched
'match
> (p #'(y . z)) ; rs and rj considered, rj matched
'match
> (p #'(x y . 1)) ; all four considered, rn matched
'match
> (p #'(x y . z)) ; all four considered, both ri and rj matched
eval:7.0: x: more than one of the lifted rest patterns matched
  at: (x y . z)
  in: (x y . z)
The rationale is that selecting the first lifted rest pattern that matches would result in unclear behaviour, as the order of the alternative clauses should not be significant.
Post and global operations can be used within the pat. This combination of features is not thoroughly tested, however. Please report any issues you run into.
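The requirement that the tail of the improper list is not a pair follows from how lists are read: a pair in tail position simply extends the main part of the list. The plain Racket sketch below illustrates this on ordinary data, without using this library; split-improper is a hypothetical helper.

```racket
#lang racket

;; Hypothetical helper: splits a possibly improper list into
;; its main part and its tail (the first non-pair cdr).
(define (split-improper v)
  (let loop ([v v] [main '()])
    (if (pair? v)
        (loop (cdr v) (cons (car v) main))
        (list (reverse main) v))))

;; A pair in tail position is indistinguishable from a longer list.
(equal? '(x y . (z . 1)) '(x y z . 1))  ; => #t

(split-improper '(x y z . 1))  ; => '((x y z) 1)
(split-improper '(x y z))      ; => '((x y z) ())
```

In the first example of this section, the main part (x y z) corresponds to the elements matched by i:id, and the tail 1 to the term matched by the lifted r:nat.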
eh-mixin expander
{~as-rest pat ...}
(define p2
  (syntax-parser
    [(~no-order {~once name:id}
                {~once message:str}
                (~once (~or {~as-rest val:nat}
                            {~seq {~lift-rest val:nat}})))
     (syntax->datum #'(#:name name #:message message #:val val))]))
> (p2 #'(x 123 "msg")) ; matched by ~as-rest
'(#:name x #:message "msg" #:val 123)
> (p2 #'(x "msg" 123)) ; matched by ~as-rest
'(#:name x #:message "msg" #:val 123)
> (p2 #'(x "msg" . 456)) ; matched by ~lift-rest
'(#:name x #:message "msg" #:val 456)
> (p2 #'(x "msg" 123 . 456)) ; can't have both
eval:5.0: x: bad syntax
  in: (x "msg" 123 . 456)
4 Pre, global and post operations
Pre operations happen before the ~! backtracking cut, so they can affect what combination of alternative clauses the parser will choose. Post operations happen after the ~! backtracking cut, and can only reject the ~no-order or ~seq-no-order as a whole (i.e. different orders will not be attempted after a ~post-fail). Global operations will always succeed.
Post operations can access the attributes defined by global and pre operations as well as attributes defined by the alternative clauses. Global operations cannot access the attributes of post operations, and pre operations cannot access the attributes of global and post operations. See Order in which the attributes are bound for post operations and global operations for more details.
4.1 Pre operations
eh-mixin expander
Known issues: this may not behave as expected if ~named-seq appears under ellipses.
This probably should bind the sequence attribute before the "global" operations, instead of being a "post" operation, and may be changed in that way in the future.
eh-mixin expander
(~maybe/empty syntax-pattern ...)
eh-mixin expander
(~optional/else syntax-pattern maybe-defaults else-post-fail ... maybe-name)

maybe-defaults =
               | #:defaults (default-binding ...)

else-post-fail = #:else-post-fail message #:when condition
               | #:else-post-fail #:when condition message
               | #:else-post-fail message #:unless unless-condition
               | #:else-post-fail #:unless unless-condition message

maybe-name =
           | #:name attribute-name
it uses the default values for the attributes as specified with #:defaults.
for each #:else-post-fail clause, it checks whether the condition or unless-condition is true or false, respectively. If this is the case the whole ~seq-no-order or ~no-order is rejected with the given message. The behaviour of #:else-post-fail is the same as the behaviour of ~post-fail, except that the "post" conditional failure can only be executed if the optional syntax-pattern was not matched.
Note that there is an implicit cut (~!) between the no-order patterns and the "post" checks, so after a ~post-fail fails, syntax-parse does not backtrack and attempt different combinations of patterns to match the sequence, nor does it backtrack and attempt to match a shorter sequence. This is by design, as it allows for better error messages (syntax-parse would otherwise attempt and possibly succeed in matching a shorter sequence, then just treat the remaining terms as "unexpected terms").
The meaning of the #:name attribute-name option is the same as for ~optional.
4.2 Global operations
The global patterns presented below match all of the given syntax-patterns, like ~and does, and perform a global aggregation over all the values corresponding to successful matches of a global pattern using the same attribute-name.
After the whole ~seq-no-order or ~no-order has finished matching its contents, but before "post" operations are executed, the attribute attribute-name is bound to (aggregate-function value₁ ... valueₙ), where each valueᵢ is the value which was passed to an occurrence of the global pattern (e.g. ~global-or) with the same attribute-name, and which successfully matched. The aggregate-function is "or" for ~global-or, "and" for ~global-and, and "+" for ~global-counter.
attributes already bound in the current alternative clause within the current ~no-order or ~seq-no-order
attributes bound by the syntax-patterns
attributes already bound outside of the ~no-order or ~seq-no-order
but it cannot access attributes bound in other alternative clauses within the current ~no-order or ~seq-no-order.
The valueᵢ are aggregated with "or", "and" or "+" in the order in which they appear in the ~no-order or ~seq-no-order. If a valueᵢ appears under ellipses, or as part of an alternative clause which can match more than once (i.e. not ~once or ~optional), then each match within that valueᵢ group is aggregated in the order it appears.
Since this notion of order is rather complex, it is possible that future versions of this library will always return a boolean (#f or #t) for ~global-or and ~global-and, which would make the notion of order irrelevant.
eh-mixin expander
(~global-or attribute-name+value syntax-pattern ...)
attribute-name+value = attribute-name | [attribute-name valueᵢ]
If the valueᵢ is omitted, #t is used as a default.
The result is always transformed into a boolean, so attribute-name is always bound to either #t or #f.
eh-mixin expander
(~global-and attribute-name+value syntax-pattern ...)
attribute-name+value = [attribute-name valueᵢ]
If there is at least one occurrence of ~global-and for that attribute-name which successfully matches, the result of the (and valueᵢ ...) is always coerced to a boolean, so attribute-name is always bound to either #t or #f.
If there are no matches at all, the special value 'none is used instead of #t as would be produced by (and).
eh-mixin expander
(~global-counter attribute-name+value syntax-pattern ...)
attribute-name+value = attribute-name | [attribute-name valueᵢ]
If the valueᵢ is omitted, 1 is used as a default.
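The aggregation rules of the three global operations can be modelled in plain Racket. The functions below are hypothetical and are not the library's implementation; each takes the values of the successful matches for one attribute-name and returns what the text above says the attribute is bound to.

```racket
#lang racket

;; ~global-or: "or" over the values, coerced to a boolean.
(define (model-global-or . vs)
  (and (ormap values vs) #t))

;; ~global-and: 'none when nothing matched, otherwise "and"
;; over the values, coerced to a boolean.
(define (model-global-and . vs)
  (if (null? vs)
      'none
      (and (andmap values vs) #t)))

;; ~global-counter: sum of the values (each defaults to 1).
(define (model-global-counter . vs)
  (apply + vs))

(model-global-or)             ; => #f
(model-global-or #f 'yes)     ; => #t
(model-global-and)            ; => 'none
(model-global-and 1 2)        ; => #t
(model-global-and 1 #f)       ; => #f
(model-global-counter 1 1 5)  ; => 7
```

The model also shows why #f is the neutral element for ~global-or and #t for ~global-and: adding such a value to the arguments does not change the result, as long as at least one other value is present in the ~global-and case.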
4.3 Post operations
eh-mixin expander
(~post-check A-pattern)
If unspecified, the syntax-pattern defaults to (~nop).
eh-mixin expander
(~post-fail message #:when condition)
(~post-fail #:when condition message) (~post-fail message #:unless unless-condition) (~post-fail #:unless unless-condition message)
Note that there is an implicit cut (~!) between the no-order patterns and the "post" checks, so after a ~post-fail fails, syntax-parse does not backtrack and attempt different combinations of patterns to match the sequence, nor does it backtrack and attempt to match a shorter sequence. This is by design, as it allows for better error messages (syntax-parse would otherwise attempt and possibly succeed in matching a shorter sequence, then just treat the remaining terms as "unexpected terms").
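The effect of a cut on backtracking can be observed with plain syntax/parse, independently of this library. The sketch below uses an explicit ~! in an ordinary pattern; parse-no-cut and parse-with-cut are hypothetical names, and the example only illustrates the general mechanism, not the exact place where ~no-order inserts its implicit cut.

```racket
#lang racket
(require syntax/parse)

;; Without a cut, a failure in the first clause falls through
;; to the second clause.
(define (parse-no-cut stx)
  (syntax-parse stx
    [(a:id n:nat) 'id-then-nat]
    [_ 'fallback]))

;; With a cut after a:id, the parser commits to the first clause
;; as soon as the identifier has matched.
(define (parse-with-cut stx)
  (syntax-parse stx
    [(a:id ~! n:nat) 'id-then-nat]
    [_ 'fallback]))

(parse-no-cut #'(x y))    ; => 'fallback
(parse-with-cut #'(1 2))  ; => 'fallback, the cut was never reached

;; (parse-with-cut #'(x y)) raises a syntax error instead of
;; returning 'fallback: the failure occurs after the cut, so the
;; second clause is not attempted.
```

The error reported in the last case points at the term which failed after the cut, which is the kind of more precise message that the implicit cut before the "post" checks is meant to preserve.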
4.4 Order in which the attributes are bound for post operations and global operations
Within the A-patterns of post operations, the regular attributes bound by all the clauses inside ~seq-no-order or ~no-order are bound. The attributes defined as part of all "global" actions are bound too. The attributes defined as part of "post" actions of other clauses are bound only if the clause defining them appears before the current clause in the source code. For example, the following code works because the clause containing {~post-fail "2 is incompatible with 1" #:when (not (attribute a))} appears after the clause which binds a with the "post" action {~post-check {~bind ([a #'the-a])}}.
{~seq-no-order {~post-check {~and the-a 1} {~bind ([a #'the-a])}} {~and 2 {~post-fail "2 is incompatible with 1" #:when (not (attribute a))}}}
If the two clauses are swapped, then the following code would raise a syntax error because a is not bound as an attribute in the ~post-fail:
{~seq-no-order {~and 2 {~post-fail "2 is incompatible with 1" #:when (not (attribute a))}} {~post-check {~and the-a 1} {~bind ([a #'the-a])}}}
On the other hand, the following code, which does not bind a as part of a post operation, is valid:
{~seq-no-order {~and 2 {~post-fail "2 is incompatible with 1" #:when (not (attribute a))}} {~and the-a 1 {~bind ([a #'the-a])}}}
Furthermore, the following code still works, as attributes are bound by the "global" operations before the "post" operations are executed:
{~seq-no-order {~and 2 {~post-fail "2 is incompatible with 1" #:when (not (attribute a))}} {~global-or a 1}}
Note that the order in which clauses appear within the ~seq-no-order or ~no-order does not impact the order in which the elements must appear in the matched syntax (aside from issues related to greediness).
syntax
This macro can be used to check for mutual exclusion of an attribute which is bound by other mixins that might or might not be present in the final ~no-order or ~seq-no-order.
Use this sparingly: if a syntax pattern variable with that name is bound by an outer scope, the try-attribute macro will still access it, ignorant of the fact that the current ~seq-no-order does not contain any mixin which binds that attribute.
Instead, it is better practice to use {~global-or [attribute-name #f]} or {~global-and [attribute-name #t]} to ensure that the attribute is declared, while using the operation’s neutral element to not alter the final result.
syntax
(if-attribute attribute-name if-branch else-branch)
The same caveats as for try-attribute apply.
5 Miscellaneous pattern expanders
pattern expander
{~nop}
6 Chaining macro calls without re-parsing everything
syntax
(define/syntax-parse+simple (name-or-curry . syntax-pattern) . body)
name-or-curry = name
              | (name-or-curry arg ...)

maybe-define-class =
                   | #:define-syntax-class class-name

maybe-define-splicing-class =
                            | #:define-splicing-syntax-class splicing-name

name = identifier?
class-name = identifier?
splicing-name = identifier?
The syntax pattern for the name macro’s arguments can be saved in a splicing syntax class by specifying the #:define-splicing-syntax-class option. The pattern only includes the arguments after the name, i.e. it matches (stx-cdr stx).
The syntax pattern for the name macro’s arguments can be saved in a syntax class by specifying the #:define-syntax-class option. The pattern only includes the arguments after the name, i.e. it matches (stx-cdr stx).
If the caller macro which uses (name-forward-attributes) parsed its own stx argument using class-id, then (name-forward-attributes) is equivalent to expanding (name stx).
The name-forward-attributes function is defined at the same meta level as name, i.e. at the same meta-level where this library was required.
for-template syntax
(define-syntax/parse+simple (name . syntax-pattern) . body)
This is the same as define/syntax-parse+simple, except that it operates at level -1 relative to this library, and defines at that level a transformer binding (which therefore executes at the same meta-level as this library). In other words, (define-syntax/parse+simple (name . pat) . body) is roughly equivalent to:
(begin-for-syntax
  (define/syntax-parse+simple (tmp . pat) . body)
  (define name-forward-attributes tmp-forward-attributes))
(define-syntax name tmp)