Xsmith

7.7

top ← prev up next →

Xsmith

William Hatch <william@hatch.uno>
and Eric Eide <eeide@cs.utah.edu>

package: xsmith

Version xsmith 1.0.0 (unknown)

1 Introduction

Xsmith is a library for creating fuzz testers, also known as fuzzers, for programming language compilers and interpreters. In other words, Xsmith is a library for creating random program generators.

Xsmith implements a domain-specific language (DSL) for defining random program generators. The DSL is used to specify a programming language’s grammar, typing rules, and other information that guides generation choices. Xsmith also includes utilities for creating a command-line interface for generating a single program or starting a web server that generates one program per request.

Xsmith is bundled with some example random program generators that were created using the library. If you just want to run them, you can jump ahead to Xsmith’s Bundled Generators (And How to Run Them).

Reporting bugs. Please send bug reports and fixes to xsmith-bugs@flux.utah.edu.

Getting help. There is a mailing list, xsmith-dev@flux.utah.edu, for discussing Xsmith. Visit the xsmith-dev list management website to subscribe. To post messages to the list, you must first subscribe to the list.

2 How to Install Xsmith

First, install Racket. If your operating system’s package manager doesn’t have a package or you want a fresher version, download it.

Then run raco pkg install xsmith.

3 Xsmith Guide

Xsmith uses RACR, an attribute grammar library, in its implementation, and some knowledge of RACR is necessary when using Xsmith.

To create a fuzzer with Xsmith, users create a specification by combining spec components, defined with define-spec-component. Each spec component can have portions of grammar as well as properties added to them (using add-to-grammar and add-prop). The grammar and properties are used to generate a RACR grammar, attributes for the grammar, and choice objects, which guide AST generation.

Program generation starts by generating an AST hole for a given grammar production. Generation continues by filling holes with concrete AST nodes (which may introduce new holes as child nodes). Each time a hole is to be filled, the grammar specification is used to determine potential replacements. For example, in a grammar with addition and subtraction expressions, an expression hole may be replaced by an addition or subtraction node. A choice object is created for each legal replacement. Choice objects have methods (choice-rules) which aid in choosing a concrete replacement. Some of these methods act as predicates to filter out choices that are not legal in a particular context, such as choices that introduce more holes when the maximum tree depth has been reached. The choice-weight property defines a method which determines the relative probability of each choice being chosen. The fresh property defines a method which determines how the choice is instantiated as a RACR node. Additional methods may be defined as helpers. Choice objects have access to the current-hole, so they may query RACR attributes in method bodies. Choice object classes follow the same hierarchy as the grammar, so method inheritance for choice objects is similar to attribute inheritance for RACR nodes.

RACR attributes and choice object methods may be added directly with add-att-rule and add-choice-rule, respectively, but many are defined indirectly by various Xsmith properties. Properties allow users to specify various attributes and choice rules in a more declarative fashion.

The primary intended use case of Xsmith is differential compiler/interpreter testing. Therefore Xsmith takes pains to avoid producing programs that rely on commonly unspecified behaviors, such as the order of evaluation of function or operator arguments. Because the language-agnostic analysis within Xsmith is quite conservative, there are many valid programs that will not be generated. However, Xsmith makes it easy to create a fuzzer that obeys such rules without much language-specific work.

3.1 RACR Overview

RACR is a library for Reference Attribute Grammars. Xsmith’s DSL defines a RACR grammar specification as well as various attributes. The attributes are queried to determine how to generate the AST.

RACR caches the results of attribute queries and keeps track of the nodes accessed for any attribute. When nodes used in an attribute computation are changed, future queries to that attribute are re-computed.

Users can specify new RACR attributes for Xsmith generators, but they should use add-att-rule or add-prop from Xsmith rather than using RACR functions directly. In expressions evaluated in the context of RACR attributes (att-rules) or choice rules, RACR attributes may be queried.

The main RACR APIs of interest are:

Functions for querying the AST:

att-value
ast-child
ast-children
ast-parent

Xsmith provides a function which generates a complete AST, but users can also perform AST rewrites after initial program generation. Relevant RACR functions for performing AST rewrites include:

perform-rewrites

Full RACR documentation is here.

3.2 Holes and Choice Objects

Hole nodes are RACR AST nodes. For every node type in the grammar, a hole node is created as a subclass of it, inheriting all of its RACR attributes. A hole can be recognized by the xsmith_is-hole? attribute.

Consider the following (partial) grammar:

(add-to-grammar
my-spec-component
[Expression #f ()]
[LiteralInt Expression (v = (random 1000))]
[AdditionExpression Expression ([left : Expression] [right : Expression])])

When a fresh AdditionExpression is created, it will include two Expression hole nodes. When the generator gets to those holes, a choice object is created for each subclass of Expression (including Expression itself unless it is disabled with the may-be-generated property). The choice objects have types corresponding to LiteralInt and AdditionExpression, and therefore may have different implementations for various choice methods. The choice objects all have access to the Expression hole (through current-hole), but while choice objects have access to their specialized choice method implementations, the hole is of type Expression, and so all RACR attributes (att-rules) that may be queried are specialized only as far as Expression, not to LiteralInt or AdditionExpression.

Note that hole node types are created for every type in the grammar (including LiteralInt and AdditionExpression), but more specialized holes are only used if the grammar specifies that a node’s child must be specifically that kind of expression, or if a custom fresh implementation uses make-hole with the specific kind of expression.

3.3 Scope Graphs

Xsmith uses the language-independent resolution algorithm of scope graphs.

The theory of scope graphs is described in the paper “A Theory of Name Resolution with Extended Coverage and Proofs”, (available at https://researchr.org/publication/NeronTVW15).

3.4 Minimal Example

#lang racket/base

(require xsmith racr racket/string)

(define-spec-component arith)

(add-to-grammar

arith

[Program #f (Expression)]

[Expression #f ()

#:prop may-be-generated #f]

[LiteralInt Expression ([v = (random 100)])]

[Addition Expression ([l : Expression] [r : Expression])

;; The default weight is 10.

#:prop choice-weight 20])

(define int (base-type 'int))

(add-prop arith type-info

[Program [int (λ (n t) (hash 'Expression int))]]

[LiteralInt [int (λ (n t) (hash))]]

[Addition [int (λ (n t) (hash 'l int 'r int))]])

(add-att-rule

arith ugly-print

[Program (λ (n) (att-value 'ugly-print (ast-child 'Expression n)))]

[LiteralInt (λ (n) (number->string (ast-child 'v n)))]

[Addition (λ (n) (format "(~a + ~a)"

(att-value 'ugly-print (ast-child 'l n))

(att-value 'ugly-print (ast-child 'r n))))])

;; This line defines `arithmetic-generate-ast`.

(assemble-spec-components arithmetic arith)

(define (arithmetic-generate-and-print)

(displayln (att-value 'ugly-print (arithmetic-generate-ast 'Program))))

(xsmith-command-line arithmetic-generate-and-print

#:comment-wrap (λ (lines)

(string-join

(map (λ (x) (format "// ~a" x)) lines)

"\n")))

3.5 Another Small Example With Variables

#lang racket/base

(require xsmith racr racket/pretty racket/string)

(define-spec-component arith)

(add-to-grammar

arith

[Definition #f (name type Expression)

#:prop binder-info (name type definition)]

[Expression #f ()

#:prop may-be-generated #f]

[LetStar Expression ([definitions : Definition *]

[sideEs : Expression * = (random 2)]

Expression)

#:prop strict-child-order? #t]

[VariableReference Expression (name)

#:prop reference-info (read name)]

[SetBangRet Expression (name Expression)

#:prop reference-info (write name)]

[LiteralInt Expression ([v = (random 100)])]

[Addition Expression ([es : Expression * = (+ 1 (random 5))])

#:prop choice-weight 50])

(define int (base-type 'int))

(add-prop arith type-info

[Definition [(fresh-type-variable) (λ (n t) (hash 'Expression t))]]

[LetStar [(fresh-type-variable)

(λ (n t) (hash 'definitions (fresh-type-variable)

'sideEs (fresh-type-variable)

'Expression t))]]

[LiteralInt [int (λ (n t) (hash))]]

[VariableReference [(fresh-type-variable) (λ (n t) (hash))]]

[SetBangRet [(fresh-type-variable) (λ (n t) (hash 'Expression t))]]

[Addition [int (λ (n t) (hash 'es t))]])

(add-att-rule

arith to-s-exp

[LetStar

(λ (n)

`(let* (,@(map (λ (d)

`[,(string->symbol (ast-child 'name d))

,(att-value 'to-s-exp

(ast-child 'Expression d))])

(ast-children (ast-child 'definitions n))))

,@(map (λ (c) (att-value 'to-s-exp c))

(ast-children (ast-child 'sideEs n)))

,(att-value 'to-s-exp (ast-child 'Expression n))))]

[LiteralInt (λ (n) (ast-child 'v n))]

[VariableReference (λ (n) (string->symbol (ast-child 'name n)))]

[SetBangRet (λ (n) `(begin (set! ,(string->symbol (ast-child 'name n))

,(att-value 'to-s-exp

(ast-child 'Expression n)))

,(string->symbol (ast-child 'name n))))]

[Addition (λ (n) `(+ ,@(map (λ (c) (att-value 'to-s-exp c))

(ast-children (ast-child 'es n)))))])

(assemble-spec-components arithmetic arith)

(define (arithmetic-generate-and-print)

(parameterize ([current-xsmith-type-constructor-thunks (list (λ () int))])

(pretty-print (att-value 'to-s-exp (arithmetic-generate-ast 'LetStar))

(current-output-port)

1)))

(xsmith-command-line arithmetic-generate-and-print

#:comment-wrap (λ (lines)

(string-join

(map (λ (x) (format ";; ~a" x)) lines)

"\n")))

4 Xsmith Reference

4.1 Stability

Xsmith does not currently promise API stability between versions.

4.2 Auto-Generated att-rules and choice-rules

Many attributes and choice-rules are auto-generated by Xsmith:

att-rules

'xsmith_is-hole?
Accepts no arguments other than the node to call it on. Returns #t when called on a hole node, otherwise returns #f.
Example:
; Within the context of a choice method, this will return #t
(att-value 'xsmith_is-hole? (current-hole))

'xsmith_find-descendants

Accepts the node to call it on, then a predicate. Returns a list of all descendant nodes that satisfy the predicate.

Example:

; This will return a list of all addition and subtraction nodes.
; It will not return any nodes that are subtypes, though!
(att-value 'xsmith_find-descendants ast-node (λ (n) (member (ast-node-type n)
'(AdditionExpression
SubtractionExpression))))

'xsmith_find-a-descendant
Accepts the node to call it on, then a predicate. Like 'xsmith_find-descendants, but it only returns one. If no descendant matching the predicate is found, it returns #f. It is mostly useful if you just want to know if there is any descendant matching a type, such as to determine whether to do an analysis based on the presence or absence of some feature.
'xsmith_binding Accepts the node to call it on and optionally a boolean require-binder-or-reference flag (defaults to #t). If the given node is a reference, as determined by the reference-info property, then the reference is resolved and a binding object is returned. If the given node is a binder, as determined by the binder-info property, then its binding object is returned. For any other node, #f is returned when require-binder-or-reference is false, otherwise an error is raised.
Example:
; This raises an exception if `n` is not a binder or reference.
(att-value 'xsmith_binding n)
; If `n` is not a binder or a reference, return #f.
(att-value 'xsmith_binding n #f)
The main use of this is to do analyses where you need to look up the declaration node that a reference refers to. The binding object returned by this function contains a reference to the binder node. When resolving references, the scope graph model is used.
'xsmith_ast-depth
Accepts the node it is called on, and no other arguments. Returns the tree depth at that node. Determined by depth-increase.
Example:
(att-value 'xsmith_ast-depth n)
'xsmith_type This attribute takes no arguments (besides a RACR node) and returns the type (type?) of that node. Note that the type of the node may be a not-yet-concretized type variable.
(att-value 'xsmith_type n)

choice-rules

'xsmith_get-reference! This is a choice method that can be used when creating a reference node. If no binding of the appropriate type for the current node is in scope, this method will cause a fresh appropriate definition to be lifted into a node that can hold definitions. It returns a string? of the reference name.

Example:

(att-value 'xsmith_type (current-hole)) ; -> (fresh-type-variable int bool)

; This will return the name of a variable in scope
; that has type int or type bool.
; It may make a new definition node to do that.
(define name (send (current-hole) xsmith_get-reference!))

Note that the reference is choosing one of the possible types (via concretize-type). Probably you are using this in a fresh method to initialize a node. So you will want to put that name in the node.

Generally you don’t need to do this manually, however. Nodes with the reference-info property marked will automatically have their name fields initialized, and nodes with the binder-info property marked will automatically have their name and type fields initialized.

Node type names, attribute names, and choice rule names are just symbols, so they are not hygienic. The names of symbols used or defined by the xsmith library start with xsmith or _xsmith, so don’t tread on those namespaces.

4.3 Forms for Defining a Grammar and Its Attributes

syntax
(define-spec-component component-name)

Defines a spec component. Spec components include information about a language grammar and attributes, and can be combined to generate an xsmith fuzzer. You add grammar productions with add-to-grammar, you add properties with add-prop, and you can add att-rules and choice-rules with add-att-rule and add-choice-rule, respectively. Spec components are combined with assemble-spec-components.

Example:

(define-spec-component my-spec-component)
; Now use add-to-grammar, add-prop, etc.
; Then use my-spec-component in assemble-part-specs.

syntax
(assemble-spec-components spec-name
maybe-properties
spec-component ...)

maybe-properties =
| #:properties list-of-properties

Combines spec components and generates a RACR specification.

Defines spec-name as a RACR specification.

Defines <spec-name>-generate-ast as a function. The function accepts the name of a grammar production as a symbol and produces a random tree starting from a fresh node of that nonterminal. Essentially, given the name of the top-level program node, this function generates a random program.

Various att-rules are automatically defined within the spec, see Auto-Generated att-rules and choice-rules.

Properties (defined with define-property) are used to derive more RACR att-rules as well as Xsmith choice-rules. Each property may have a transformer function that alters other properties, att-rules, or choice-rules. All properties referenced within a spec-component are used to generate att-rules and choice-rules, as well as any properties specified in the maybe-properties list. Unless values for that property have also been specified within a spec component, properties in the maybe-properties list will only be able to generate rules based on the default value for the property.

Example:

(assemble-spec-components
my-spec
#:properties (depth-increase fresh wont-over-deepen introduces-scope)
my-spec-component
my-other-spec-component)

; Now `my-spec` is defined as a RACR spec,
; and `my-spec-generate-ast` is defined as a function which
; accepts a symbol argument representing the name of an AST node.

syntax
(add-to-grammar spec-component grammar-clause ...)

grammar-clause = (node-name parent-name (field ...) maybe-prop ..)

parent-name = identifier
| #f

field = name/type-id
| (name/type-id maybe-type-id maybe-kleene-star maybe-init-expr)

maybe-type-id =
| : type-name

maybe-kleene-star =
| *

maybe-init-expr =
| = init-expr

maybe-prop = #:prop prop-id prop-val

Adds grammar productions to spec-component.

node-name will be the name of the grammar production in RACR. parent-name is either the name of the parent grammar production or #f.

Names for the node and fields are limited to alphabetic characters. You may want to use camelCase style names since kebab-style or snake_style names due to this limitation.

Fields are then specified. Each nonternimal inherits all fields of its parent nodes. A field has a name, a type, an optional kleene star, and an optional initialization expression. The type of each field is the name of the nonterminal that it must be or #f for fields that may contain arbitrary Racket values. A field name may be the same as the type, in which case the type does not need to be specified. If a type is not specified and the name does not match the name of a nonterminal, then the type #f is used. If the optional kleene star is supplied, the field will be a list field. If a kleene star is provided for a non-false type, the name and type must be specified separately.

The init-expr for each field specifies a default value for the field. When a node is generated, each init-expr is evaluated unless a non-default value is supplied to the generating function. If no init-expr is supplied, the following defaults are used:

For false types, #f.
For nonterminal types, a hole node of the appropriate type.
For fields with a kleene star, an empty list.

For nodes with a kleene star, init-expr may return a list or a number. If a number is provided, the default value is a list of that many of the non-kleene-star default value.

Example:

(add-to-grammar
my-spec-component
[Expression #f ()]
[LiteralInt Expression (v = (random 1000))]
[AdditionExpression Expression ([left : Expression] [right : Expression])]
[SumExpression Expression ([addends : Expression * = (random 5)])])

The example defines a piece of a grammath that includes some kinds of expressions. When a LiteralInt expression is generated, it’s v field will be populated with a random number. Since the random expression is evaluated for each fresh LiteralInt, they will (probably) receive different values for v. When an AditionExpression node is generated, it will be populated with an Expression hole node for each of its left and right fields. When a fresh SumExpression is generated, its addends field will be populated with a list of zero to four Expression hole nodes.

syntax
(add-att-rule spec-component rule-name rule-clause ...)

rule-clause = (nonterminal-name rule-function)

Adds a RACR attribute rule to the spec-component. The format for each rule is similar to that required by RACR’s ag-rule form.

Example:

(add-att-rule
my-spec-component
interp
[LiteralInt (λ (n) (ast-child 'v n))]
[AdditionExpression (λ (n) (+ (att-value 'interp (ast-child 'left n))
(att-value 'interp (ast-child 'right n))))]
[SumExpression (λ (n) (apply + (map (λ (c) (att-value 'interp c))
(ast-children (ast-child 'addends n)))))])

syntax
(add-choice-rule spec-component rule-name rule-clause ...)

rule-clause = (nonterminal-name rule-function)

Adds an Xsmith choice-rule to the spec-component.

Xsmith creates a choice object class for each node type in the specification grammar, following the same class hierarchy that AST nodes themselves do. Choice objects are created every time Xsmith fills in a hole node. One choice object is created for every node type that is legal to use in filling the hole. Choice objects are then filtered according to the choice-filters-to-apply property, and then the choice-weight property of the remaining choice objects is used to determine the probability of choosing each one. When a choice object is selected, its fresh property is used to generate a new AST node of its type. If all choices are eliminated, an exception is raised with a message stating which filter step invalidated each potential choice.

Choice rules are methods on the choice objects. Some choice rules are used by choice-filters-to-apply to filter choices. Other choice rules may be used by those filters or in the body of the fresh property as helper methods. While most information about the AST and the current choice are probably computed using att-rules, information about choosing a specific node type to fill in an abstract hole (such as an expression hole which may be filled with many different types of expressions) are computed using choice rules.

Choice rules are methods in Racket’s class system and therefore have the this macro available for use in their bodies to access other methods (eg. with the send macro). Choice rules also have the current-hole macro available within their body so that they can query attributes of the RACR AST being elaborated (eg. with att-value to access att-rules and ast-parent to inspect other nodes in the AST).

Since choice rules are methods in Racket’s class system, they must be defined with a literal lambda (with no parameter for the implicit this argument). If a method needs to modify state (such as to cache the computation of available references of the appropriate type), I would normally recommend the “let-over-lambda” pattern, but that is not allowed in this case. To make up for this, I recommend using make-weak-hasheq to hold the state, using the this object as a key.

This is a poor example, but it demonstrates how att-rules and choice-rules can be used together to help make choices:

(add-choice-rule
my-spec-component
my-weight-helper
[#f 7]
[AdditionExpression
  (λ () (if (att-value 'my-helper-att-rule (current-hole))
            20
            5))])
(add-prop
my-spec-component
choice-weight
[#f (send this my-weight-helper)])

syntax
(add-prop spec-component prop-name prop-clause ...)

prop-clause = (nonterminal-name prop-value)

Adds property values to the spec-component.

Since property transformers are macros that may accept arbitrary domain-specific syntax, the grammar of prop-value varies for each property.

syntax
(current-hole)

Within the body of a choice-rule, (current-hole) returns the hole node being considered for replacement. This allows choice-rules to query att-rule attributes of the grammar.

Elsewhere it raises a syntax error.

syntax
(make-hole hole-type-expression)

Within the context of a spec component (eg. in the body of add-att-rule, add-prop, add-to-grammar, etc), make-hole is a function to generate a hole of a given type.

For example, to make a hole node that will eventually be replaced with some type of Expression node:

(make-hole 'Expression)

This function is essentially used by add-to-grammar as the default value for grammar fields with nonterminal types that lack an init-expr.

Outside of a spec component context, it raises a syntax error.

syntax
(make-fresh-node node-type-expression optional-field-value-dict)

Within the context of a spec component (eg. in the body of add-att-rule, add-prop, add-to-grammar, etc), make-fresh-node is a function to generate a fresh node of the given type. Construction of the new node is guided by the fresh property.

For example, to generate a fresh AdditionExpression node, specifying values for some of its fields:

(make-fresh-node 'AdditionExpression
(hash 'left (make-fresh-node 'LiteralInt
(hash 'v 5))))

4.4 Custom Properties

Properties are used to create domain-specific languages or terse declarative syntax for specifying att-rules and choice-rules. Custom properties are probably only useful for shared infrastructure for multiple languages (perhaps with the exception of define-non-inheriting-rule-property).

syntax
(define-property name
                 maybe-dups
                 maybe-reads
                 maybe-rewrites
                 maybe-appends
                 maybe-transformer)

maybe-dups =
| #:allow-duplicates? literal-bool

maybe-reads =
| #:reads transformer-arg-spec ...

maybe-rewrites =
| #:rewrites transformer-arg-spec ...

maybe-appends =
| #:appends transformer-arg-spec ...

maybe-transformer =
| #:transformer transformer-function

transformer-arg-spec = (property prop-name)
| (att-rule rule-name)
| (choice-rule rule-name)
| (grammar)

Defines a property for use with add-prop.

Properties can have a transformer function that produce att-rules, choice-rules, or other property values. Property transformers can read the values set for the property by add-prop, and optionally the values of other properties as well. Not all properties must have a transformer – some properties may just be a single place to store information that is used by (multiple) other properties.

The transformer function accepts a dictionary mapping grammar node names (as symbols) to the syntax objects from the right-hand-side of add-prop uses for each property that it reads. The transformer function must return a list of dictionaries mapping grammar node names (as symbols) to syntax objects for the properties, att-rules, and choice-rules that it writes.

Property transformers are run during assemble-spec-components in an order determined by the read/write dependencies among the properties. Appending goes before rewriting, which goes before reading.

If a property appends (using the #:appends keyword) to a property or rule, its return dictionary will be appended to the existing dictionary for that property/rule. This allows a property or rule to be specified in part by the property that appends and in part by another property or the user. If an appended rule (or property that disallows multiple values) ends up with two values for any node, an error is raised.

If a property rewrites (using the #:rewrites keyword) to a property or rule, its return dictionary replaces any previous dictionary for that property/rule. Rewriting a property also automatically implies reading that property. A property or rule may only be rewritten by one property transformer.

Example showing the arguments and return type needed by transformers: The transformer argument order is its own property, then #:reads dictionaries in the order declared, then #:rewrites dictionaries in the order declared. The transformer return order is #:rewrites dictionaries in the order declared then #:writes dictionaries in the order declared.

(define-property my-property
  #:reads (property a) (property b)
  #:rewrites (property c)
  #:appends (att-rule d) (choice-rule e)
  #:transformer
  (λ (this-prop-dict prop-a-dict prop-b-dict prop-c-dict)
    ; compute output dictionaries...
    (list dict-for-c dict-for-d dict-for-e)))

The syntax object value for a property can be anything, since property transformers define the grammar and semantics of properties.

The syntax object value for att-rules and choice-rules should be a syntax object specifying a function (IE a lambda). att-rules may be any syntax that evaluates to a function (so you may return an identifier that references a function or an expression that computes a function such as let-over-lambda), but choice-rule syntax is provided to Racket’s class macro, which requires literal lambda forms.

Dictionaries may or may not contain an entry for each nonterminal in the grammar. A dictionary may even be empty.

In addition to nonterminals, each dictionary may include a mapping for the value #f, which will define a default value used for the (super secret) parent node that assemble-spec-components defines. If nothing is specified for #f, att-rules and choice-rules will have a default which errors, providing a helpful error message.

If the #:allow-duplicates? argument is supplied and is #t, then add-prop may be used more than once for the property for the same node, and the syntax object in the dictionary for the property will be a syntax list of the syntax objects specified in the various add-prop calls. But by default only one use of add-prop is allowed per property per node type, and the syntax object in the dict is the single syntax object from the one call.

Here is a simple example that basically desugars straight to an att-rule with a default:

(define-property strict-child-order?
  #:appends (att-rule xsmith_strict-child-order?)
  #:transformer
  (λ (this-prop-info)
    (define xsmith_strict-child-order?-info
      (hash-set
       (for/hash ([(n v) (in-dict this-prop-info)])
         (values n (syntax-parse v [b:boolean #'(λ (n) b)])))
       #f #'(λ (n) #f)))
    (list xsmith_strict-child-order?-info)))

The rationale for this example property is to:

Allow values to be specified by just #t or #f, rather than (λ (n) #t)
To implicitly provide a default value to the root node (#f).

For more realistic examples of properties, see the file private/core-properties.rkt in the Xsmith implementation. Generally they are big, hairy macros.

syntax
(define-non-inheriting-rule-property property-name
                                     rule-type
                                     maybe-rule-name
                                     default-value
                                     maybe-transformer)

maybe-rule-name =
| #:rule-name rule-name

default-value = #:default default-expr

maybe-transformer =
| #:transformer transformer-func

Defines a property that generates an att-rule or a choice-rule that does NOT inherit its implementation from its superclass.

rule-type must be either att-rule or choice-rule.

rule-name defaults to property-name, but you can make it give the rule a different name than the property.

default-expr is the default value of the property. Any nonterminal that does not have a different value specified gets this one.

transformer-func is an optional transformer to transform the value. It is not called with a dictionary like the transformers of define-property, but rather it receives each value individually. This allows a small amount of sugar. Note that the value supplied as the default-expr is also transformed by the transformer-func when it is supplied. When no transformer-func is supplied, values are passed through literally.

Example:

(define-non-inheriting-rule-property
  some-bool-flag
  att-rule
  #:default #f
  #:transformer (syntax-parser [#f #'(λ () #f)]
                               [#t #'(λ () #t)]))
(add-to-grammar
a-spec-component
[A #f ()]
[B A ()]
[C B ()])
(add-prop
a-spec-component
some-bool-flag
[A #t]
[C #t])

Normally B would inherit a method from A when none was specified for it. But in this case it inherits the default (#f). When a user tries (att-value 'some-bool-flag <node-of-type-B>) it will return #f, not #t.

4.5 Scope Graph Functions

This section describes the API to the Scope Graph model of binding.

struct
(struct binding (name ast-node type def-or-param))
  name : string?
  ast-node : ast-node?
  type : type?
  def-or-param : (or/c 'definition 'parameter)

Struct for binding information of nodes that create bindings.

Notably this is returned by the att-rule xsmith_binding.

The ast-node field is the grammar node containing the definition of name. The def-or-param field is there to distinguish names that are bound as function parameters vs names that are bound as (local or global) definitions.

Probably all you need to do with this, though, is get the name field and stuff it in the name field of your definition nodes in the fresh method.

parameter
(current-well-formedness-regexp) → regexp?
(current-well-formedness-regexp r) → void?
r : regexp?
= #px"rp*i?d"

For most languages you probably don’t need to fuss with this.

When the xsmith_binding attribute is used or when Xsmith searches for a valid reference with xsmith_get-reference, this regexp determines valid scope-graph resolution paths. The path elements (reference, parent, import, definition) are turned into characters (r, p, i, and d respectively). If the path from reference to definition matches this regexp, it is valid. If two definitions have the same name and paths from a reference to both definitions are valid, the definition that is in scope for the reference is determined by current-path-greater-than.

Because Xsmith doesn’t currently support import elements at all, modifying this regexp is somewhat of a moot point.

parameter
(current-path-greater-than)
→
(-> (listof/c (or/c 'reference 'parent 'import 'declaration))
    (listof/c (or/c 'reference 'parent 'import 'declaration))
    any/c)
(current-path-greater-than comparator) → void?
   comparator :
(-> (listof/c (or/c 'reference 'parent 'import 'declaration))
    (listof/c (or/c 'reference 'parent 'import 'declaration))
    any/c)

If there are two valid resolution paths (determined by current-well-formedness-regexp) for a name, this comparator determines which path is chosen. The comparator must return a non-false value if the left operand is greater than the right, otherwise #f The greatest path is chosen.

By default the comparator walks down the paths, comparing each element. A path is greater than another if its first differing element is greater. From least to greatest, path elements are 'reference, 'parent, 'import, 'declaration.

For most languages you probably don’t need to fuss with this.

4.6 Core Properties

These properties are available for specification on Xsmith grammars. Many of them define or modify the behavior of core functionality, such as fresh modifying how fresh program nodes are instantiated, or type-info defining a language’s type rules.

Many are optional, but for any language at all you probably need to use type-info and may-be-generated. Additionally, for any language with variables you need binder-info and reference-info. Next, the fresh property will likely become necessary for a few fields that Xsmith can’t infer. Then the most important of the truly optional properties is likely choice-weight.

spec-property
may-be-generated

Acceptable values for this property are #t or #f, and the default is #t.

If may-be-generated is false, the node is not added to the list of possibile choices to replace an appropriate AST hole. It is useful to set it to false for abstract node types or for specialized node types that are meant to be swapped in only after a full tree is generated, such as by a later analysis to determine validity of an unsafe operation. This property is NOT inherited by subclasses.

Example:

(add-prop
my-spec-component
may-be-generated
; Expression is abstract and should not be instantiated,
; only AdditionExpression, SubtractionExpression, etc.
[Expression #f]
; Only safe addition expressions should be generated,
; but maybe a later pass after generation swaps some
; safe addition expressions for unsafe ones after analysis.
[UnsafeAdditionExpression #f])

spec-property
type-info

This property is used to specify the type system used by the generator. You should specify a type system even for dynamically typed languages so that programs don’t just crash with dynamic type errors.

Example:

(define int (base-type 'int))
(define float (base-type 'float))
(define bool (base-type 'bool))
(add-prop
my-spec-component
type-info
[AdditionExpression [(fresh-type-variable int float)
                      (λ (n t) (hash 'l t 'r t))]]
[EqualityExpression [bool
                      (λ (n t)
                        (define arg-type (fresh-type-variable))
                        (hash 'l arg-type 'r arg-type))]]
[Lambda [(function-type (fresh-type-variable) (fresh-type-variable))
          (λ (n t) (hash 'arg (function-type-arg-type t)
                         'Expression (function-type-return-type t)))]])

The property is two armed.

The first part is the type (or partially-constrained type variable) that the given node can inhabit. The expression given is evaluated fresh every time a node is type checked or considered for generation.

The second part is a function that takes a node, its type, and must return a dictionary mapping its children nodes to types. The dictionary keys may be the node objects of the node’s children OR the symbol of the field name the child inhabits. For kleene-star children, use their node unless they all should receive the same type.

spec-property
fresh

This property determines how fresh nodes are constructed (by the make-fresh-node function).

Acceptable values for this property are expressions which produce a dict? object, or expressions which produce a function of type (-> dict? dict?). Keys of the dictionary must be field names of the node being generated. The values in the dictionary are used to fill node fields of the appropriate name. Any field whose name is not in the dictionary will be filled by evaluating the default init-expr defined in the grammar (via add-to-grammar).

Example:

(add-to-grammar
my-spec-component
[Expression #f ()]
[LiteralInt Expression (v = (random 1000))]
[AdditionExpression Expression ([left : Expression] [right : Expression])])
(add-prop
my-spec-component
fresh
; Make AdditionExpressions always be generated with a literal 7 argument.
[AdditionExpression (hash 'left (make-fresh-node LiteralInt (hash 'v 7)))])

This is useful for fields that must be determined together. For examlpe, a function call needs the function name and the number of arguments to be chosen together rather than independently.

As with all choice-rules, this and current-hole are available for use in expressions, which you may want to do for eg. accessing available bindings or mutable information connected to the choice object.

If the result is a procedure instead of a dictionary, that procedure must accept and return a dictionary. It is called with a dictionary that is empty unless the node being created is the result of lifting a definition. In that case it will have the appropriate name and type fields with the name and type chosen by the lifting mechanism. In the case of lifting a definition, the name and type fields in the return dictionary are ignored. This procedure option is allowed because your fresh expression may need access to the name or type to determine the values of other fields. If a definition node only has a name and type field then a fresh property is unnecessary when lifting, and if lifting is the only way you generate definitions then fresh properties or initializers for definition nodes are unnecessary.

If the value for a field (IE values inside the result dictionary) is a procedure, it will be called with 0 arguments. This allows the fresh property to provide a default value that is not evaluated when make-fresh-node is called with an appropriate value.

spec-property
choice-weight

This property determines the probability that different kinds of nodes will be chosen. When choices have been filtered (based on choice-filters-to-apply), one of the remaining choices is chosen at random with probability (choice-weight / sum-of-choice-weights).

The expression provided as the choice weight will be evaluated in the context of a method call, so this and current-hole are available.

Choice weights should be positive integer values. The default weight is 10 unless set explicitly.

Example:

(add-prop
my-spec-component
choice-weight
            "The default choice weight."
[#f (λ () 10)]
            "Generate more AdditionExpressions"
[AdditionExpression 20]
[MultiplicationExpression 15]
            "Generate fewer SumExpressions"
[SumExpression 5])

spec-property
depth-increase

This property defines the xsmith_ast-depth non-inheriting att-rule.

The property accepts an expression which much evaluate to a function of one argument (the RACR AST node) which returns a truthy value for nodes which increase the depth of the AST and #f otherwise. The default is (λ (n) #t). This property is NOT inherited by subclasses.

This is useful to allow node re-use. For example, the body of an if or for statement might be a block and have the same semantics, but you might want a block inside an if to only be considered a depth increase of 1, not 2.

Example:

(define no-depth-if-body-is-block
(λ (n) (if (node-subtype? (ast-child 'body n) 'Block) 0 1)))
(add-prop
my-spec-component
depth-increase
[IfStatement no-depth-if-body-is-block]
[ForStatement no-depth-if-body-is-block])

spec-property
wont-over-deepen

The default for this property is probably what you want, so probably just be sure to add this to the extra #:properties flag of assemble-part-specs.

But if you want to set it:

The property accepts expressions which will evaluate to booleans (IE anything but only #f is false...), which are evaluated if the choice is made at the point where the AST is at it maximum depth. A true value means that the choice is acceptable, false otherwise. The default is computed by checking whether a node includes AST-node-typed fields. If it does not it is considered atomic and therefore acceptable to choose when the AST is already at its maximum depth.

spec-property
binder-info

This property is used to mark nodes that define bindings. The property consists of a length-3 list. The first two are field names, one for the name of the field that stores the binding name, one for the name of the field that stores the binding type. The last field is either definition or parameter, reflecting whether the binding is a function parameter. This is used by some Xsmith analyses about higher order values.

Example:

(add-to-grammar
my-spec-component
[Definition #f (name type Expression)]
[Reference #f (name)])
(add-prop
my-spec-component
binder-info
[Definition (name type definition)])

spec-property
reference-info

This property marks nodes that are reference nodes. The argument for the property is a list containing:

The identifier read or the identifier write, indicating whether the reference reads or writes the variable
The name of the field that stores the reference name (as an identifier).

Example:

(add-prop
my-spec-component
reference-info
[Reference (read name)])

spec-property
binding-structure

This property is used on nodes that can have binders as children. It determines the visibility of those binders to their siblings. Options are 'serial (like let* in scheme), 'parallel (like let in scheme), and 'recursive (like letrec in scheme).

If the property is not specified, 'serial is assumed and used as a default.

Example:

(add-to-grammar
my-spec-component
[Let #f ([definitions : Definition *] Expression)]
[Letstar #f ([definitions : Definition *] Expression)]
[Letrec #f ([definitions : Definition *] Expression)])

(add-prop
my-spec-component
binding-structure
[Let 'parallel]
; Letstar we can leave blank if we want because serial is the default.
[Letrec 'recursive])

spec-property
strict-child-order?

Specifies that a node’s children are guaranteed by the language to have a strict evaluation order. The default is false. This property is used to determine whether nodes have a dependency in their read/write/io effect conditions. (Those conditions are set by the io and reference-info properties.)

Setting this property is unnecessary, but using it allows more liberal use of references, mutation, and io effects.

Example:

(add-prop
my-spec-component
strict-child-order?
; Most languages have some sort of sequential construct, like a block
[Block #t])

spec-property
io

Used to specify that a node has some kind of IO effect, such as printing or reading a volatile variable.

Until and unless I make something better, this should also be used for any form that mutates some kind of higher-order object. For example, setting a field in a mutable record.

Example:

(add-prop
my-spec-component
io
[Print #t]
[SetRecordField #t])

spec-property
lift-predicate

This property specifies a predicate for whether a definition of a given type can be lifted to a node.

Example:

(add-to-grammar
my-spec-component
[Let #f ([definitions : Definition *] Expression)]
[Letstar #f ([definitions : Definition *] Expression)]
[Letrec #f ([definitions : Definition *] Expression)])

(add-prop
my-spec-component
lift-predicate
; Allow any definition type to be lifted into the top level of a program.

[Program (λ (n type) #t)]
; Lifting a definition to Lambda's formal parameter list would require changing all calls.

[Lambda (λ (n type) #f)]
; Allow local variables to be lifted, except make all functions top-level.

[Let (λ (n type) (not (function-type? type)))])

spec-property
lift-type->ast-binder-type

If you have more than one binding node in your language (IE via binder-info) you must specify this property. This property should be defined once for the base node (#f). It is a mapping from the type of a desired definition (eg. int, float, int -> int, ...) to the AST node type (eg. VariableDefinition, FunctionDefinition). This is important when different kinds of definitions use different AST nodes. Otherwise it is just boilerplate...

Example:

(add-to-grammar
my-spec-component
[VariableDefinition #f (name type Expression)]
[FunctionDefinition #f (name type Body)])

(add-prop
my-spec-component
lift-type->ast-binder-type
[#f (λ (type) (if (function-type? type)
'FunctionDefinition
'VariableDefinition))])

spec-property
choice-filters-to-apply

This property accepts a syntax list of choice-rule names to use as a filter for the node type. Generally this should be set on the greatest super node type (or #f if there is no explicit super node type in your grammar). Each choice-rule in the list is called on the choice object with no arguments. Each rule that returns #f rules the node out as a choice for filling in a hole.

Example:

(add-prop
my-spec-component
choice-filters-to-apply
[#f (my-custom-filter-choice-rule my-other-filter-choice-rule)])

Some core methods are always applied in addition to this list, such as the method defined by the may-be-generated property. If you don’t make custom filtering rules you don’t need to specify this property.

4.7 Types

These type constructors and other functions are largely useful for specifying the type-info property.

While there are various predicates for different types, at any point in type checking you might actually have a type variable instead of a concrete type. So if you want to check if you have a particular type (and maybe deconstruct it), you should maybe create an instance of the type you are interested in, check if it can-unify?, then unify!-ing it if you want to deconstruct it. Note that if you do unify! a type variable, that unification needs to be consistent between multiple runs of type checking (since it runs multiple times as the tree is constructed).

procedure
(type? t) → bool?
t : any/c

Predicate for types.

procedure
(fresh-type-variable args ...) → type?
args : type?

Creates a fresh type variable. If given no arguments it is unconstrained and can unify with any type. Arguments can be provided, in which case the type variable is constrained to be one of the types given. In the optional arguments, only one function type is allowed.

Example:

; Unconstrained
(define v1 (fresh-type-variable))

(define int (base-type 'int))
(define float (base-type 'float))
(define bool (base-type 'bool))

(define v2 (fresh-type-variable int bool))

(unify! v1 v2)

(can-unify? v1 float) ; #f
(can-unify? v1 int) ; #t

(unify! v2 bool)
(can-unify? v1 int) ; #f

procedure
(type-variable? t) → bool?
t : any/c

Predicate for type variables.

procedure
(can-unify? t1 t2) → bool?
t1 : type?
t2 : type?

Returns whether two types can be unified without actually unifying them.

procedure
(unify! t1 t2) → void?
t1 : type?
t2 : type?

Unifies two types. This mutates type variables so that they match other variables or types going forward.

If unification fails an exception is raised. Right now a failure to unify might mean that type variables are left in a bad state, so code generation should just give up at that point.

procedure
(base-type name) → type?
name : symbol?

Creates a base type. Base types with the same name are the same.

procedure
(function-type arg-type return-type) → type?
arg-type : type?
return-type : type?

Creates a function type. For multi-argument functions, use a product-type for the argument type.

procedure
(function-type? t) → bool?
t : any/c

Predicate for function types.

procedure
(function-type-arg-type t) → type?
t : function-type?

Get the argument type. Remember that you can’t deconstruct type variables that are not fully constrained!

procedure
(function-type-return-type t) → type?
t : function-type?

Get the return type. Remember that you can’t deconstruct type variables that are not fully constrained!

procedure
(product-type types) → type?
types : (or/c (listof types?) #f)

Creates a product type (tuple). If types is #f, the length of the tuple is unspecified, and it can be unify!-ed with a product type of any length.

Example:

(define any-length (product-type #f))
(define l2 (product-type (list int int)))
(define l3 (product-type (list int int int)))

(can-unify? any-length l2) ; #t
(can-unify? any-length l3) ; #t
(unify! any-length l2)
(can-unify? any-length l2) ; #t
(can-unify? any-length l3) ; #f

procedure
(product-type? t) → bool?
t : any/c

Predicate for product types.

procedure
(product-type-inner-type-list t) → any/c
t : product-type?

Returns the list of types in the product-type, or #f for a product-type with a length that is still unspecified.

syntax
(define-generic-type name (type-argument ...))

Used to create generic types.

This form defines a constructor name, a predicate name?, and one accessor for each type-argument name-type-argument a la struct.

Each instance of a generic type can be unified with other instances of the same generic.

Example:

(define int (base-type 'int))
(define-generic-type list-type (type))

(define t1 (list-type int))
(generic-type? t1) ; #t
(list-type? t1) ; #t
(generic-type-type-arguments t1) ; (list int)
(list-type-type t1) ; int

(define t2 (list-type (fresh-type-variable)))
(can-unify? t1 t2) ; #t
(unify! t1 t2)

procedure
(generic-type? t) → bool?
t : any/c

Returns true when t is a generic type. Not very useful, since you probably want to know if it is an instance of a specific generic.

procedure
(generic-type-name t) → symbol?
t : generic-type?

Returns the name of a generic type. Remember that you can’t deconstruct type variables that are not fully constrained!

procedure
(generic-type-type-arguments t) → (listof type?)
t : generic-type?

Returns the inner types of a generic type as a list. Remember that you can’t deconstruct type variables that are not fully constrained!

procedure
(nominal-record-type? t) → bool?
t : any/c

Predicate for nominal record types.

Partially specified nominal-record-type?s are created with nominal-record-type-with. Fully specified nominal-record-type?s are created by using concretize-type on a partially specified one. Rather than making them manually, simply rely on Xsmith’s definition-lifting mechanism to create appropriate fully-specified nominal-record-type?s.

When a partially defined nominal-record-type is unify!-ed with a fully defined nominal-record-type, the partially defined one is mutated to become the same as the fully defined one. When two partially defined nominal-record-types are unified together, an error is raised.

Every node in a language grammar that stands for a nominal-record-type constructor, accessor, or mutator must include a reference to a nominal-record-definition-type containing the nominal-record-type being used.

The reason for this is that nominal types must be defined in the program. nominal-record-definition-types are necessary because the lookups of these type names are a different type than uses of the record type that the name is bound to.

Example:

(add-to-grammar my-grammar
...
[VariableReference Expression (name)]
...
[StructReference Expression (fieldname
                              [structdefref : VariableReference]
                              [structval : Expression])]
...)

(add-prop
my-grammar
type-info
...
[StructReference [(fresh-type-variable)
                   (λ (n t)
                     (define type-with-field
                       (nominal-record-type-with (ast-child 'fieldname n) t))
                     (hash 'structval type-with-field
                           'structdefref (nominal-record-definition-type
                                          type-with-field)))]]
...)

procedure
(nominal-record-type-with fieldname
fieldtype) → type?
fieldname : string?
fieldtype : type?

Creates a partially-specified nominal-record-type?. Use it to specify that you want a record that contains a certain type.

procedure
(any-nominal-record-type) → type?

Creates a completely unconstrained nominal-record-type?. It will unify with any fully constrained nominal-record-type?.

procedure
(nominal-record-type-name t) → any/c
t : nominal-record-type?

Getter for the name of a nominal-record-type?.

procedure
(nominal-record-type-inners t) → dict?
t : nominal-record-type?

Getter for the inner type dictionary. If you use this on a partially-specified nominal-record-type?, you will get an incomplete dictionary.

procedure
(nominal-record-definition-type t) → type?
t : nominal-record-type?

Constructor. See note in nominal-record-type? for how it is used.

procedure
(nominal-record-definition-type? t) → bool?
t : any/c

Predicate for nominal record definition types constructed with nominal-record-definition-type.

procedure
(nominal-record-definition-type-type t) → nominal-record-type?
t : nominal-record-definition-type?

Getter for the nominal-record-type inside a nominal-record-definition-type.

procedure
(concretize-type t) → type?
t : type?

Returns a fully concrete (no type variables) type that can-unify? with t.

This function can be useful if you want to generate a random type. But beware! You should probably NOT generate random types unless you also store them in the grammar node that represents that type. The type-checking code defined in the type-info property can be run many times for each node, so a node that randomly chooses its type will not be stable. Because the type algorithm imperatively unifies types, this causes mayhem. Don’t do it.

Note that to use this function you must parameterize current-xsmith-type-constructor-thunks. Probably in the generate-and-print function passed to xsmith-command-line.

parameter
(current-xsmith-type-constructor-thunks) → (listof (-> type?))
(current-xsmith-type-constructor-thunks thunk-list) → void?
thunk-list : (listof (-> type?))

This needs to be parameterized for concretize-type, which is needed in code dealing with variable definition and reference. It should consist of a list of thunks that each produce a fully concrete type when called.

The generate-and-print function passed to xsmith-command-line needs to parameterize this.

4.8 Debug Logging

procedure
(xd-printf format-string args ...) → any/c
format-string : string?
args : (listof any/c)

Like printf, but it prints to a buffer that is output when an exception is raised during program generation.

4.9 Turning Your Grammar Specification Into a Program

procedure
(xsmith-command-line generate-and-print-func
#:comment-wrap comment-wrap
#:features features
#:default-max-depth default-max-depth)
→ any/c
  generate-and-print-func : (-> any/c)
  comment-wrap : (-> (listof string?) string?)
   features :
(listof (or/c (list/c symbol? boolean?)
              (list/c symbol? boolean? string?)))
  default-max-depth : number?

This function parses the current command-line arguments for xsmith fuzzers. It is basically to be used in the main function of a fuzzer. Based on options supplied, it may print a help message and terminate the program, generate a single program, or start a web server to generate many programs.

generate-and-print-func must be a function that generates and prints a single program. It is called within xsmith-command-line with the random seed parameterized according to command-line options (and for the web server reset and incremented for each call), and with all xsmith-options parameterized according to the command line. The generate-and-print-func needs to parameterize current-type-thunks-for-concretization if your language is to support variable definitions and references.

comment-wrap takes a list of strings which contain info about the generated program, such as the command line used to generate it and the random seed number. It should return a string representing those lines commented out. Such as the following, assuming the "#" character is the line-comment character in your language:

(λ (lines)
  (string-join
   (map (λ (x) (format "# ~a" x)) lines)
   "\n"))

features takes a list of lists containing a feature name (as a symbol) and a default value (as a boolean), and optionally a list of documentation strings. Each feature will be included in the command-line options as --with-<feature-name>. Documentation strings will be displayed in the --help text, one per line. The values of these features is available via xsmith-feature-enabled?.

The command-line options given by xsmith-command-line are:

--help – see all command-line options. The --help list will automatically stay up to date, unlike this documentation.
--server <true-or-false> – whether to run the web server. Defaults to false.
--server-port <port-number> – Port to use when running server. Defaults to 8080.
--server-ip <ip-address> – Listen for connections to <ip-address>. Use false to accept connections from all the machine’s addresses. Defaults to 127.0.0.1.
--seed – Random seed for program generation. Defaults to whatever Racket does to initialize randomness normally.
--output-file <filename> – Outputs to <filename> instead of standard output when not used with --server.
--max-depth <n> – Maximum depth of the generated program tree.
--version – prints the version info of xsmith and exits.
--with-<feature> <bool> – enables or disables generator-specific features, where <feature> is replaced with concrete feature names.

procedure
(xsmith-feature-enabled? feature) → boolean?
feature : symbol?

Returns the value set by the user via xsmith-command-line for the given feature. The feature name must have been supplied to the #:features argument of xsmith-command-line, or an error will be raised.

procedure
(xsmith-max-depth) → number?

Returns the maximum tree generation depth as set by the user via xsmith-command-line.

4.10 RACR Convenience Functions

(require xsmith/racr-convenience)

package: xsmith

These are a group of convenience functions around RACR. They are not necessary, and some may not be the best choices. But I have found them a little more convenient than using certain RACR APIs directly.

syntax
(expr->ast-list length-expression field-expression)

Creates an ast-list-node? containing a list of length length-expression. For each element of the list, field-expression is evaluated again.

procedure
(node-type n) → any/c
n : any/c

Returns the symbol of the type of n, or #f if n is not a proper non-bud, non-list ast-node?.

Wrapper for ast-node-type that returns false rather than erroring when it gets bud nodes or list nodes...

procedure
(parent-node n) → any/c
n : any/c

Wrapper for ast-parent that returns #f rather than erroring when the given node doesn’t have a parent.

procedure
(top-ancestor-node n) → any/c
n : any/c

Calls parent-node until it reaches the last parent, and returns it.

procedure
(node-subtype? n) → any/c
n : any/c

Wrapper for ast-subtype? that returns #f rather than erroring when the given node is a bud, list, or non-node.

5 Xsmith’s Bundled Generators (And How to Run Them)

When xsmith is installed as a Racket package, executables for the bundled generators are placed in your Racket package bin directory. Usually this directory is $HOME/.racket/racket-<version>/bin on Linux, maybe $HOME/Library/Racket/<version>/bin on normal MacOS installs, and maybe /usr/local/bin for MacOS Homebrew installs.

These fuzzers can be run on the command line to generate a single program or as an http server that generates one program per request.

Command-line options for bundled Xsmith generators are all the same, and provided by xsmith-command-line.

5.1 Cish

Cish is a C program generator made with the Xsmith library. It has co-evolved with Xsmith, and is essentially the reference Xsmith program generator.

The executable for Cish is called xsmith-cish. Additionally, Cish can be run with the command racket -l xsmith/cish -- (the final -- causes further flags to be parsed by cish and not by Racket).

The command-line options available in Cish are:

--help – see all command-line options. The --help list will automatically stay up to date, unlike this documentation.
--server <true-or-false> – whether to run the web server. Defaults to false.
--server-port <port-number> – Port to use when running server. Defaults to 8080.
--server-ip <ip-address> – Listen for connections to <ip-address>. Use false to accept connections from all the machine’s addresses. Defaults to 127.0.0.1.
--seed – Random seed for program generation. Defaults to whatever Racket does to initialize randomness normally.
--output-file <filename> – Outputs to <filename> instead of standard output when not used with --server.
--max-depth <n> – Maximum depth of the generated program tree.
--version – prints the version info of xsmith and exits.
--with-<feature> <bool> – enables or disables generator-specific features, where <feature> is replaced with concrete feature names.

Cish supports the following features for the --with-<feature> flags:

int – defaults to #t
float – defaults to #t
addition – defaults to #t
subtraction – defaults to #t
multiplication – defaults to #t
division – defaults to #t
modulus – defaults to #t
comparisons – defaults to #t
if-expression – defaults to #t
if-statement – defaults to #t
loop-statement – defaults to #t
null-statement – defaults to #t
structs – defaults to #t
volatile – defaults to #t
unsafe-math/range – defaults to #f

To compile cish output, you need to include Csmith’s runtime directory in your header path to get safe_math.h.

eg.

xsmith-cish > cish-output.c && gcc -I $CSMITH_DIR/runtime -o cish-output cish-output.c

5.2 Schemely

The executable for Schemely is called xsmith-schemely. Additionally, Schemely can be run with the command racket -l xsmith/schemely -- (the final -- causes further flags to be parsed by cish and not by Racket).

The command-line options available in Schemely are:

--help – see all command-line options. The --help list will automatically stay up to date, unlike this documentation.
--server <true-or-false> – whether to run the web server. Defaults to false.
--server-port <port-number> – Port to use when running server. Defaults to 8080.
--server-ip <ip-address> – Listen for connections to <ip-address>. Use false to accept connections from all the machine’s addresses. Defaults to 127.0.0.1.
--seed – Random seed for program generation. Defaults to whatever Racket does to initialize randomness normally.
--output-file <filename> – Outputs to <filename> instead of standard output when not used with --server.
--max-depth <n> – Maximum depth of the generated program tree.
--version – prints the version info of xsmith and exits.
--with-<feature> <bool> – enables or disables generator-specific features, where <feature> is replaced with concrete feature names.

Schemely currently has no features for the --with-<feature> flags.

At the time of writing, Schemely really just supports Racket. At some future point it should generate portable Scheme code.

6 Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant Number 1527638. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

7 Code and License

The code is available at the University of Utah Flux Research Group’s GitLab server.

Xsmith is distributed under the following license, which is often called the “BSD License.”

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

top ← prev up next →

1	Introduction
2	How to Install Xsmith
3	Xsmith Guide
4	Xsmith Reference
5	Xsmith’s Bundled Generators (And How to Run Them)
6	Acknowledgments
7	Code and License