PEG
This library implements a PEG parser generator.
1 Introduction
PEG can be thought of as an advance over regex. It can match more languages (for example balanced brackets) and can be paired with semantic actions to produce structured results from a parse.
The PEG language is implemented as a system of macros that compiles parser descriptions (rules) into scheme code. It is also provided with a custom syntax via "#lang peg".
The generated code parses text by interacting with the "PEG VM", which is a set of registers holding the input text, input position, control stack for backtracking and error reporting notes.
2 Syntax Reference
2.1 define-peg
syntax
(define-peg name rule)
<rule> = (epsilon) ; always succeeds | (char c) ; matches the character c | (any-char) ; matches any character | (range c1 c2) ; match any char between c1 and c2 | (string str) ; matches string str | (and <rule> ...) ; sequence | (or <rule> ...) ; prioritized choice | (* <rule> ...) ; zero or more | (+ <rule> ...) ; one or more | (? <rule> ...) ; zero or one | (call name) | (capture name <rule>) | (! <rule> ...) ; negative lookahead | (& <rule>) ; positive lookahead | (drop <rule> ...) ; discard the semantic result on matching
syntax
(define-peg name rule action)
We also provide shorthands for some common semantic actions:
syntax
(define-peg/drop name rule)
makes the parser produce no result.
syntax
(define-peg/bake name rule)
transforms the peg-result into a scheme object.
syntax
(define-peg/tag name rule)
tags the result with the peg rule name. Useful for parsers that create an AST.
2.2 peg
syntax
(peg rule input-text)
3 Examples
3.1 Example 1
For a simple example, lets try splitting a sentence into words. We can describe a word as one or more of non-space characters, optionally followed by a space:
> (require peg) > (define sentence "the quick brown fox jumps over the lazy dog") > (define-peg non-space (and (! #\space) (any-char))) > (define-peg/bake word (and (+ non-space) (drop (? #\space)))) > (peg word sentence) "the" > (peg (+ word) sentence) '("the" "quick" "brown" "fox" "jumps" "over" "the" "lazy" "dog")
#lang peg |
|
(define sentence "the quick brown fox jumps over the lazy dog"); //yes, we can use |
//one-line comments and any sequence of s-exps BEFORE the grammar definition |
|
non-space <- (! ' ') . ; //the dot is "any-char" in peg |
word <- c:(non-space+ ~(' ' ?)) -> c ; //the ~ is drop |
//we can use ident:peg to act as (name ident peg). |
//and rule <- exp -> action is equal to (define-peg rule exp action) |
//with this in a file, we can use the repl of drracket to do exactly the |
//uses of peg above. |
3.2 Example 2
Here is a simple calculator example that demonstrates semantic actions and recursive rules.
(define-peg number (name res (+ (range #\0 #\9))) (string->number res)) (define-peg sum (and (name v1 prod) (? (and #\+ (name v2 sum)))) (if v2 (+ v1 v2) v1)) (define-peg prod (and (name v1 number) (? (and #\* (name v2 prod)))) (if v2 (* v1 v2) v1))
this grammar in peg lang is equivalent to:
#lang peg number <- res:[0-9]+ -> (string->number res); sum <- v1:prod ('+' v2:sum)? -> (if v2 (+ v1 v2) v1); prod <- v1:number ('*' v2:prod)? -> (if v2 (* v1 v2) v1);
Usage:
> (peg sum "2+3*4") 14 > (peg sum "2*3+4") 10 > (peg sum "7*2+3*4") 26
3.3 Example 3
Here is an example of parsing balanced parenthesis. It demonstrates a common technique of using _ for skipping whitespace, and using "define-peg/bake" to produce a list rather than a sequence from a *.
#lang racket (require peg) (define-peg/drop _ (* (or #\space #\newline))) (define-peg symbol (and (name res (+ (and (! #\( #\) #\space #\newline) (any-char)))) _) (string->symbol res)) (define-peg/bake sexp (or symbol (and (drop #\() (* sexp) (drop #\) _))))
or in PEG syntax:
#lang peg _ < [ \n]*; symbol <- res:(![() \n] .)+ _ -> (string->symbol res); sexp <- s:symbol / ~'(' s:sexp* ~')' _ -> s; // had to use s: -> s because there is no way to express bake from the PEG language
Usage:
> (peg sexp "(foob (ar baz)quux)") '(foob (ar baz) quux) > (peg sexp "((())(()(())))") '((()) (() (()))) > (peg sexp "(lambda (x) (list x (list (quote quote) x)))") '(lambda (x) (list x (list 'quote x)))
4 PEG Syntax
This package also provides a "#lang peg" alternative, to allow you to make parsers in a more standard PEG syntax.
4.1 PEG Syntax Reference
The best way to understand the PEG syntax would be by reference to examples, there are many simple examples in the racket peg repo and the follow is the actual grammar used by racket-peg to implemet the peg lang:
Note: When you match the empty string in peg lang, the result object is the empty sequence, not the empty string, be careful.
#lang peg |
|
(require "s-exp.rkt"); |
|
_ < ([ \t\n] / '//' [^\n]*)*; |
SLASH < '/' _; |
|
name <-- [a-zA-Z_] [a-zA-Z0-9\-_.]* _; |
|
rule <-- name ('<--' / '<-' / '<') _ pattern ('->' _ s-exp _)? ';' _; |
pattern <-- alternative (SLASH alternative)*; |
alternative <-- expression+; |
expression <-- (name ~':' _)? ([!&~] _)? primary ([*+?] _)?; |
primary <-- '(' _ pattern ')' _ / '.' _ / literal / charclass / name; |
|
literal <-- ~['] (~[\\] ['\\] / !['\\] .)* ~['] _; |
|
charclass <-- ~'[' '^'? (cc-range / cc-escape / cc-single)+ ~']' _; |
cc-range <-- cc-char ~'-' cc-char; |
cc-escape <-- ~[\\] .; |
cc-single <-- cc-char; |
cc-char <- !cc-escape-char . / 'n' / 't'; |
cc-escape-char <- '[' / ']' / '-' / '^' / '\\' / 'n' / 't'; |
|
peg <-- _ import* rule+; |
import <-- s-exp _ ';' _; |