Tesurell: A Self-hosting Melting Pot of Languages

Sage Gerard

 #lang tesurell
 package: tesurell

Tesurell is a markup language that supports inline use of other #langs, including itself. When used as a module, Tesurell helps you use #langs via input ports, and helps you define other languages that support inline #langs.

1 Motivation

When I write Racket programs using different languages I end up with a bunch of files. That makes sense when those files represent modular components in a sufficiently large system. Thing is, I don’t always want to bounce between files to express one composite idea.

Other libraries like multi-lang and polyglot address this problem by writing Racket modules to disk for later processing. But sometimes disk activity and the filesystem are interruptions. Tesurell aims to minimize that.

Dancing between notations can also be really fun and productive for creative types. Tesurell gives Racket the LaTeX-like ability to swap out notations to some desired effect when writing.

2 Guide

If you can write Scribble, you can write Tesurell markup. They both use scribble/reader, and use (provide doc) to share content. If you run a Tesurell module directly using DrRacket or the racket launcher, it will evaluate (write doc). Each Tesurell module provides all bindings from racket/base, plus those in the Reference section.

The differences are more interesting. Tesurell documents do not prescribe any document semantics because other languages already do that. It is up to you to assemble notations to your preference.

2.1 Example: Trivial Case

You can embed a #lang and require the module in the same document.

#lang tesurell
 
@embed['my-module]|{
#lang racket/base
(provide out)
(define out 1)
}|
 
@require['my-module]
@out

The following interaction holds:

> doc

'(1)

Here you can see that doc reflects the content.

2.2 Example: Defining your own Document

If you want to define doc yourself, then define a make-doc procedure to create it. You do not need to provide the procedure.

#lang tesurell
 
Doesn't matter what gets written here.
 
@(define (make-doc elements)
  (printf "Normally: ~v~n" elements)
  "Overridden")

The following interaction holds:

> (require "markup.rkt")

Normally: '("\n" "\n" "Doesn't matter what gets written here." "\n" "\n" |#<void>| "\n")

> doc

"Overridden"

Here you can see the body before it gets cleaned up. The void value is what the (define (make-doc elements) ...) form evaluated to within the document, and the newlines come from the Scribble reader.

This feature is useful as a simple way for documents to define their own layout, namely without needing a templating system.
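As a rough illustration of a document defining its own layout, the sketch below wraps each cleaned-up paragraph in a p element. It assumes reformat-doc is available inside the document, as the Guide says of the bindings in the Reference section. The article and p tags are arbitrary names chosen for this example.

```racket
#lang tesurell

First paragraph.

Second paragraph.

@(define (make-doc elements)
   ;; Clean up the raw elements with the default procedure, then wrap
   ;; every string in a (p ...) list. Non-string values pass through.
   (cons 'article
         (for/list ([part (in-list (reformat-doc elements))])
           (if (string? part)
               (list 'p part)
               part))))
```

Because make-doc is an ordinary procedure, the layout rules stay next to the content they apply to.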

2.3 Example: Using Other Language Features

You can borrow more established languages and compose their output.

#lang tesurell
 
@embed['other-a]|{
#lang scribble/manual
@title{Manual A}
}|
@embed['other-b]|{
#lang scribble/manual
@title{Manual B}
}|
 
@require[@rename-in['other-a [doc a]]]
@require[@rename-in['other-b [doc b]]]
@(define (make-doc . _) (list a b))

2.4 Example: Self-hosting

Tesurell can self-host, but be warned that a Tesurell subdocument cannot see anything in the containing Tesurell document.

You could get around that by interpolating code within a subdocument, but using string interpolation to build code can be dangerous. It’s better to use Tesurell subdocuments to perform mechanical adjustments, or use make-tesurell-lang.

Here’s an example of a subdocument that overrides doc, while the parent document uses the default representation of doc.

#lang tesurell
 
Gonna get meta.
 
@embed['other 'doc]|{
#lang tesurell @require[racket/string]
 
@(define (make-doc raw)
   (list 'pre
         (string-trim (string-join (filter string? raw) ""))))
 
Preformatted
      text
   document
}|

The following interaction holds:

> doc

'("Gonna get meta." (pre "Preformatted\n      text\n   document"))

2.5 Example: Inline Language Demo

Since Tesurell supports inline Racket modules, you can also use it to define new languages for immediate demonstration. Despite my earlier warning, this example leverages string interpolation to provide input to the example sum language.

#lang tesurell
@require[racket/list racket/format]
 
@define[N 100]
 
@embed['sum-lang]|{
#lang racket
(require syntax/strip-context)
 
(provide (rename-out [seq-read read]
                     [seq-read-syntax read-syntax]))
 
(define (seq-read in)
  (syntax->datum (seq-read-syntax #f in)))
 
(define (seq-read-syntax src in)
  (with-syntax ([operands (read in)])
    (strip-context
     #'(module container racket
         (provide message)
         (define message (foldl + 0 operands))))))
}|
 
Welcome to the most offensively contrived way to sum
the first @N positive integers to
@embed['show-off 'message]{
#lang reader 'sum-lang @(~v (range 1 (+ N 1)))}

The following interaction holds:

> doc

'("Welcome to the most offensively contrived way to sum the first" 100 "positive integers to" 5050)

3 Reference

procedure

(module/port id autorequire in [ns]) → any/c

  id : symbol?
  autorequire : symbol?
  in : input-port?
  ns : namespace? = (current-namespace)
Reads the module as #lang-prefixed source code from in, such that (require id) will work in the given namespace.

If autorequire is a symbol, then module/port will return the value bound to autorequire by the input module. Otherwise, module/port will return (void).

BEWARE: Like racket/load, the modules defined here are evaluated dynamically and are therefore not compiled. Two modules defined by this procedure cannot require each other via this form. Unlike racket/load, however, the modules can provide bindings. For best results, only use this for small expressions of code that are not shared by other documents.
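To make the idea concrete, here is a minimal sketch of declaring a module from a port using only racket/base. It is not Tesurell's implementation. The names declare-module-from-port and module-from-port are hypothetical, and the sketch assumes a namespace created with make-base-namespace.

```racket
#lang racket/base

;; Hypothetical helper: read #lang-prefixed source from `in` and declare
;; it as a module named `id` inside the namespace `ns`.
(define (declare-module-from-port id in ns)
  (define stx
    (parameterize ([current-namespace ns]
                   ;; Both parameters are set so that #lang is accepted.
                   [read-accept-reader #t]
                   [read-accept-lang #t])
      (read-syntax 'inline in)))
  (parameterize ([current-namespace ns]
                 ;; Declare the module under `id`, whatever name it carries.
                 [current-module-declare-name (make-resolved-module-path id)])
    (eval stx)))

;; Hypothetical analogue of the autorequire argument: return the value
;; bound to `autorequire` when it is a symbol, otherwise (void).
(define (module-from-port id autorequire in [ns (current-namespace)])
  (declare-module-from-port id in ns)
  (if (symbol? autorequire)
      (parameterize ([current-namespace ns])
        (dynamic-require (list 'quote id) autorequire))
      (void)))

(define ns (make-base-namespace))
(define source
  "#lang racket/base\n(provide message)\n(define message \"hi\")\n")
(define message
  (module-from-port 'greeting 'message (open-input-string source) ns))
```

After this runs, (require 'greeting) also works for code evaluated in ns, since the module is declared there under that name.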

procedure

(embed id [autorequire] str ...) → any/c

  id : symbol?
  autorequire : (or/c symbol? #f) = #f
  str : string?
This is a markup-friendly form of module/port.

#lang tesurell
 
@embed['my-module 'data]|{
#lang racket/base
(provide data)
(define data "I am from an inline module.")
}|

procedure

(make-tesurell-lang [wrap]) → (-> input-port?)
                              (-> (or/c #f) input-port?)
  wrap : (-> syntax? syntax?) = default-doc-module

(make-tesurell-lang default-doc-module) implements #lang tesurell.

Returns a read and read-syntax procedure, in that order.

The read procedures use read-syntax-inside from scribble/reader to parse content, then generate code that runs the instructions in the markup. Each returns the appropriate variant of (wrap body), where body is code that evaluates the markup language, and wrap is a syntax transformer that returns an enclosing module form. body consists entirely of top-level expressions and depends on any assumptions made by wrap.
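The parsing step can be seen in isolation with scribble/reader alone. This sketch only shows what read-syntax-inside yields for a short piece of markup. It does not show the code Tesurell generates from that result, and the source name 'demo is arbitrary.

```racket
#lang racket/base
(require scribble/reader)

;; Parse a whole port in Scribble text mode. The result is a syntax
;; object wrapping a list of strings and S-expressions.
(define parsed
  (read-syntax-inside 'demo (open-input-string "Total: @(+ 1 2) items")))

;; Strip the syntax wrapper to inspect the plain list.
(define body (syntax->datum parsed))
;; body is '("Total: " (+ 1 2) " items")
```

The embedded expression arrives unevaluated, which is why the reader must still generate code to run it.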

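body introduces some bindings of interest. The default-doc-module listing below relies on two of them: $raw, which it passes to make-doc, and $module-namespace, the namespace in which it looks up make-doc.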

procedure

(default-doc-module body) → syntax?

  body : syntax?
This is the default wrap procedure for make-tesurell-lang. It implements rules re: doc as shown below.

(define (default-doc-module body)
  #`(module content racket/base
      (provide doc)
      #,body
      (define post
        (namespace-variable-value
         'make-doc
         #t
         (λ () reformat-doc)
         $module-namespace))
      (define doc (post $raw))
      (module+ main
        (writeln doc))))
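The lookup with a fallback can be tried on its own. The sketch below uses a fresh top-level namespace in place of $module-namespace, which is a simplification, and find-make-doc is a hypothetical helper name.

```racket
#lang racket/base

;; Hypothetical helper mirroring the lookup above: return the value of
;; make-doc in ns, or default when make-doc is not defined there.
(define (find-make-doc ns default)
  (namespace-variable-value 'make-doc #t (λ () default) ns))

(define ns (make-base-namespace))

;; make-doc is not defined yet, so the fallback (here, values) is used.
(define before ((find-make-doc ns values) '("a" "b")))

;; Define make-doc at the top level of ns, as a document might.
(eval '(define (make-doc elements) "Overridden") ns)

;; The same lookup now finds the user-defined procedure.
(define after ((find-make-doc ns values) '("a" "b")))
```

Here before is the untouched list '("a" "b") and after is "Overridden", which matches the behavior described in the Guide for a document that defines make-doc.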

procedure

(reformat-doc doc) → (listof any/c)

  doc : (listof any/c)
This procedure acts as the default make-doc implementation if one is not provided by a tesurell module.

The default implementation reduces noise from the Scribble reader by doing the following in order:

  1. Filters out all void values

  2. Combines strings like so:
    (filter-map (λ (x) (and (not (equal? "" x))
                            (regexp-replace* #px"\\s\\s*" x " ")))
      (regexp-split #px"\n\n+"
        (string-trim (string-join strings ""))))

In English, this combines the strings into one big string, and then trims the excess whitespace off the ends. It will then split the big string at each sequence of 2+ consecutive newlines. Each resulting substring then has every run of one or more whitespace characters, including single newlines, replaced with a single blank space.
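The quoted expression can be run on its own against one run of strings. This sketch wraps it in a hypothetical combine-strings helper. It covers only the string-combining step as written above, and does not model how reformat-doc treats whitespace next to non-string values.

```racket
#lang racket/base
(require racket/list racket/string)

;; Hypothetical helper: the expression from step 2, applied to one
;; contiguous run of strings.
(define (combine-strings strings)
  (filter-map (λ (x) (and (not (equal? "" x))
                          (regexp-replace* #px"\\s\\s*" x " ")))
              (regexp-split #px"\n\n+"
                            (string-trim (string-join strings "")))))

;; Two paragraphs separated by a blank line, with uneven spacing.
(define paragraphs
  (combine-strings
   '("Line one\ncontinues   here." "\n\n" "Second   paragraph.\n")))
;; paragraphs is '("Line one continues here." "Second paragraph.")
```

The single newline inside the first paragraph collapses to a space, while the blank line between the two paragraphs becomes the split point.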

The following interaction holds:

> (reformat-doc '("\n" "\n"  "Welcome   to the " "\nThunderdome"
                  "\n" "\n" "\n" "\n\nOver " 1000 " masters blasted."))
'("Welcome to the Thunderdome" "Over " 1000 " masters blasted.")

Under this interpretation, a paragraph is terminated by either the end of the list or a contiguous string element. If you wish to preserve the formatting of some part of a document, then you will need to wrap it in some container to prevent reformat-doc from changing it.

Also, zero or one space may appear after non-string values depending on how that string was formatted in the markup.

> (reformat-doc '(1000 "km")) ; @|1000|km

'(1000 "km")

> (reformat-doc '(1000 "   meters")) ; @1000 meters

'(1000 " meters")