MessagePack
MessagePack is a binary data serialization format. This means that in-memory objects like numbers, strings, dictionaries or ordered sequences can be serialized (packed) to raw bytes, and raw bytes can be deserialized (unpacked) back to in-memory Racket objects.
Source code: https://gitlab.com/HiPhish/MsgPack.rkt
1 MessagePack guide
From the MessagePack website:
MessagePack is an efficient binary serialization format. It lets you exchange data among multiple languages like JSON. But it’s faster and smaller. Small integers are encoded into a single byte, and typical short strings require only one extra byte in addition to the strings themselves.
What this means is that we can turn in-memory objects of a process (such as a Racket REPL instance) into raw bytes, and turn raw bytes into in-memory objects again. These actions are referred to as "packing" and "unpacking" respectively, although the technical terms are "serialization" and "deserialization". Since the format is standardised we can use it to exchange data between processes written in different programming languages running on different architectures without issue.
MessagePack is focused on optimising speed and size, which is why it uses a binary format instead of a text-based one. Serialization formats like JSON and YAML are easy for humans to read and edit by hand, but parsing them is a non-trivial task. Human readability is not always a priority: for instance, if we want to exchange data between two processes for remote procedure calls, that data will never be read by a human.
1.1 Motivation
Suppose we have two processes and we wish to exchange data between them. Consider the following crude illustration:
+-------------+          Outgoing          +-------------+
|             | -------------------------> |             |
|  Process 1  |                            |  Process 2  |
|             | <------------------------- |             |
+-------------+          Incoming          +-------------+
The first process wants to send some "object" to the second process. The object can be anything: a number, a string, a boolean value, a list of objects, a hash table, a time stamp, or any additional type you want. The two processes do not necessarily have to be the same program, or run on the same machine or even run at the same time.
The easiest thing to do would be to just dump the in-memory representation of the object from inside the first process, and feed those raw bytes into the second process. However, there is no guarantee that the two processes can understand each other’s "dumped format", which is why we need some format to agree on.
The act of turning an in-memory object into raw bytes is generally called serialization, and the act of turning raw bytes into an in-memory object is called deserialization. In MessagePack terminology it is customary to refer to them as packing and unpacking respectively, and that is what we will call them from now on.
MessagePack defines a data serialization format which favours speed and small size; this makes it particularly well suited for inter-process communication. We could in theory also write the packed object to a file and pass the file around for other people to look at, but the binary format is hard for humans to read.
1.2 A guided tour
The easiest way to get acquainted with MessagePack is to try it out. Install this library, fire up your Racket REPL of choice, then follow along with the code. If you are reading this manual in an interactive format (like HTML) you can always click the name of a procedure to read its documentation. Let’s first import the MessagePack module.
> (require msgpack)
MessagePack is often abbreviated as "msgpack", and this name is used throughout the library. There are several modules provided, with the topmost module msgpack being an "umbrella module" which exports everything the submodules export.
> (pack 13)
#"\r"
> (pack #f)
#"\302"
> (pack "Hello world!")
#"\254Hello world!"
> (pack 13 #f "Hello world!")
#"\r\302\254Hello world!"
The pack procedure takes one or more objects and turns them into bytes, returning a byte string. We could now write those bytes to an output port to send them off. As a shorthand, the pack-to procedure takes a binary output port followed by one or more objects to pack, and sends them off through the port without allocating a byte string.
> (pack-to (current-output-port) "Hello world!")
?Hello world!
The leading ? stands in for the string's header byte #xAC, which is not printable text.
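To make the bytes above less mysterious, here is a toy packer that handles only the cases from this session. It is a hypothetical sketch written against the public MessagePack specification, not this library's implementation: toy-pack and toy-pack-one are made-up names, and only nil, booleans, integers from 0 to 127 and strings shorter than 32 UTF-8 bytes are supported.

```racket
#lang racket/base
;; Toy packer for the values used in the session above. Hypothetical code,
;; not the msgpack library's implementation: it only handles nil, booleans,
;; integers 0..127 and strings shorter than 32 UTF-8 bytes.
(define (toy-pack-one v)
  (cond
    [(void? v) (bytes #xc0)]                  ; nil
    [(eq? v #f) (bytes #xc2)]                 ; false
    [(eq? v #t) (bytes #xc3)]                 ; true
    [(and (exact-integer? v) (<= 0 v 127))    ; positive fixint: the value is the byte
     (bytes v)]
    [(string? v)
     (let* ([payload (string->bytes/utf-8 v)]
            [len (bytes-length payload)])
       (unless (< len 32)
         (raise-argument-error 'toy-pack-one "string of fewer than 32 UTF-8 bytes" v))
       ;; fixstr: high bits 101, low five bits hold the byte length
       (bytes-append (bytes (bitwise-ior #xa0 len)) payload))]
    [else (raise-argument-error 'toy-pack-one "value supported by this toy" v)]))

;; Like pack, accept one or more values; the encodings are simply concatenated.
(define (toy-pack v . more)
  (apply bytes-append (map toy-pack-one (cons v more))))

(toy-pack 13 #f "Hello world!") ; => #"\r\302\254Hello world!"
```

A small integer is its own single byte (13 is the carriage return #"\r"), false is the single byte #xC2 (#"\302"), and a short string is one header byte, #xA0 plus the length (#xAC, written #"\254", for 12 bytes), followed by its UTF-8 bytes.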
The complements to pack and pack-to are unpack and unpack-from.
> (unpack #"\r")
13
> (unpack #"\302")
#f
> (unpack #"\254Hello world!")
"Hello world!"
> (unpack #"\r\302\254Hello world!")
13
> (unpack/rest #"\r\302\254Hello world!")
13
#"\302\254Hello world!"
> (unpack-from (current-input-port))
13
As we can see, while it is possible to pack multiple values at once, unpack only returns the first object. This is because Typed Racket requires procedures to return a fixed number of values. To get at the remaining objects we can use the unpack/rest procedure instead, which returns two values: the unpacked object and the remaining bytes of the byte string.
The four procedures pack, pack-to, unpack and unpack-from are what you will be using most of the time; they are documented in detail in the Packing and Unpacking sections below.
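The two-value shape of unpack/rest is easiest to see on a decoder that consumes one object from the front of a byte string. The following is a hypothetical sketch for the same subset of types as the session above (nil, booleans, positive fixint and fixstr); toy-unpack/rest and toy-unpack are illustrative names, not the library's code.

```racket
#lang racket/base
;; Toy decoder with the same shape as unpack/rest: it returns two values, the
;; decoded object and the unconsumed bytes. Hypothetical code covering only
;; nil, booleans, positive fixint and fixstr.
(define (toy-unpack/rest bstr)
  (when (zero? (bytes-length bstr))
    (error 'toy-unpack/rest "unexpected end of input"))
  (define tag (bytes-ref bstr 0))
  (define (after n) (subbytes bstr n))          ; bytes left once n are consumed
  (cond
    [(= tag #xc0) (values (void) (after 1))]    ; nil
    [(= tag #xc2) (values #f (after 1))]        ; false
    [(= tag #xc3) (values #t (after 1))]        ; true
    [(< tag #x80) (values tag (after 1))]       ; positive fixint
    [(= (bitwise-and tag #xe0) #xa0)            ; fixstr: 101xxxxx
     (let ([end (+ 1 (bitwise-and tag #x1f))])  ; header byte + string length
       (when (< (bytes-length bstr) end)
         (error 'toy-unpack/rest "unexpected end of input"))
       (values (bytes->string/utf-8 (subbytes bstr 1 end)) (after end)))]
    [else (error 'toy-unpack/rest "unsupported type byte: ~a" tag)]))

;; Like unpack: keep the first object and discard the remaining bytes.
(define (toy-unpack bstr)
  (let-values ([(obj _remaining) (toy-unpack/rest bstr)])
    obj))

(toy-unpack #"\r\302\254Hello world!")      ; => 13
(toy-unpack/rest #"\r\302\254Hello world!") ; two values: 13 and the rest
```

Because each call reports exactly where the first object ends, a caller can keep feeding the remaining bytes back in to walk through a stream of concatenated objects.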
1.3 Type system
For the most part, packable types have a clear 1:1 mapping to Racket types: arrays unpack to vectors, maps to hash tables, symbols pack to strings, and vectors and lists pack to arrays. See the relevant sections for the exact details.
While the MessagePack type system does define the most common types, there is always room for extension. This is what the Ext type is for: every instance is a pair of an integer and a byte string. The integer is a tag which specifies the type of the extension, and the byte string carries the actual data.
MessagePack reserves negative type values for its own extensions, while non-negative type values are for your custom extensions. Let us assume you want to pack exact rational numbers without losing the exactness. None of the default types are suitable, so we will pick extension type zero.
> (require msgpack)
> (define (rational->ext q)
    (unless (and (rational? q) (exact? q))
      (raise-argument-error 'rational->ext "(and/c rational? exact?)" q))
    (ext 0 (pack (numerator q) (denominator q))))
> (rational->ext 2/3)
(ext 0 #"\2\3")
> (define (ext->rational e)
    (let*-values ([(n rest) (unpack/rest (ext-data e))]
                  [(d) (unpack rest)])
      (/ n d)))
> (ext->rational (ext 0 #"\2\3"))
2/3
Of course our recipient must also be aware of our extension. In the above example we pack the two components of our rational number as two integers and unpack by unpacking the data twice. The first time around we use unpack/rest, which returns two values: the unpacked object and the remaining packed data. Then we call unpack, which returns only the unpacked object, because we only care about that object, not the remaining data (not that there would be any remaining data in this case anyway).
2 MessagePack API
All of the API is exposed via the msgpack module, which only serves to re-export the bindings from the more specific sub-modules. We can divide the library into three tasks: packing data, unpacking data, and the auxiliary data types provided for dealing with MessagePack data. The following sections discuss these tasks and the associated modules.
2.1 MessagePack types
The MessagePack specification defines which types the format supports. Most of these map neatly onto the types Racket provides out of the box; only the extension type Ext needs to be defined explicitly.
2.1.1 Packable types
(require msgpack/packable)    package: msgpack
syntax
2.1.2 MessagePack extension type
(require msgpack/ext)    package: msgpack
MessagePack allows for custom types to be defined via the ext type. An extension is a tagged byte sequence: the tag is a signed 8-bit integer and the data is an ordered sequence of bytes. Non-negative tag values are free to be used for any purpose, but negative tag values are reserved for future extension by MessagePack.
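On the wire an extension consists of a header byte, an optional length field, the tag as a single two's-complement byte, and then the data. The sketch below follows the layout from the MessagePack specification, choosing a fixext header when the data length is exactly 1, 2, 4, 8 or 16 bytes and an ext 8/16/32 header otherwise. encode-ext is a hypothetical helper for illustration; it is not part of msgpack/ext.

```racket
#lang racket/base
;; Sketch of the extension wire layout from the MessagePack specification:
;; header byte, optional big-endian length, signed 8-bit type tag, data.
;; `encode-ext` is hypothetical, not the msgpack library's packer.
(define (encode-ext type data)
  (unless (and (exact-integer? type) (<= -128 type 127))
    (raise-argument-error 'encode-ext "signed 8-bit integer" type))
  (define len (bytes-length data))
  (define tag-byte (bytes (bitwise-and type #xff)))  ; two's complement of the tag
  (define header
    (cond
      [(= len 1) (bytes #xd4)]                        ; fixext 1
      [(= len 2) (bytes #xd5)]                        ; fixext 2
      [(= len 4) (bytes #xd6)]                        ; fixext 4
      [(= len 8) (bytes #xd7)]                        ; fixext 8
      [(= len 16) (bytes #xd8)]                       ; fixext 16
      [(< len 256) (bytes #xc7 len)]                  ; ext 8: 1-byte length
      [(< len 65536)                                  ; ext 16: 2-byte length
       (bytes-append (bytes #xc8) (integer->integer-bytes len 2 #f #t))]
      [(< len 4294967296)                             ; ext 32: 4-byte length
       (bytes-append (bytes #xc9) (integer->integer-bytes len 4 #f #t))]
      [else (raise-argument-error 'encode-ext "byte string shorter than 2^32 bytes" data)]))
  (bytes-append header tag-byte data))

(encode-ext 0 #"\2\3")  ; the bytes #xd5 #x00 #x02 #x03
(encode-ext -1 #"abc")  ; the bytes #xc7 #x03 #xff, then "abc"
```

Assuming the library applies the fewest-bytes rule described under Packing, the rational-number extension (ext 0 #"\2\3") from the guide would take four bytes: the fixext 2 header, the tag 0 and the two data bytes.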
syntax
2.2 Packing
(require msgpack/pack)    package: msgpack
Packing is the act of serialising Racket objects to raw bytes. Racket objects are packed according to the following rules:
The void object gets packed as nil.
Both booleans #t and #f get packed as boolean values.
Real exact integers get packed as integers. They must be in the range from $-2^{63}$ to $2^{64} - 1$ (both inclusive).
Real numbers (including rationals) get packed as floating-point numbers. The default precision is double, but if the number is a single-flonum? it is packed with single precision as well.
Strings get packed as text strings, and byte strings get packed as binary strings.
Vectors and lists get packed as arrays. This will get stuck in an infinite loop if the vector or list contains cycles.
Hash tables get packed as maps.
Ext objects get packed as extension objects.
If an object matches more than one of the rules above, the first applicable rule in the listed order is used. If there is more than one way of packing an object, the format with the fewest bytes is chosen.
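For numbers, both rules can be seen at work in the following sketch: exact integers are tried before other reals, and among the integer formats of the MessagePack specification the smallest one that can hold the value is picked. This is a hypothetical illustration (pack-number-sketch is not a library procedure), and it always uses double precision for floats, ignoring the single-flonum? case.

```racket
#lang racket/base
;; Hypothetical sketch of the number rules: exact integers first, using the
;; smallest MessagePack integer format that fits; any other real becomes a
;; 64-bit float. Not the msgpack library's code; single precision is ignored.
(define (be n size signed?)                  ; big-endian fixed-width integer
  (integer->integer-bytes n size signed? #t))

(define (pack-number-sketch x)
  (cond
    [(exact-integer? x)
     (cond
       [(<= 0 x 127) (bytes x)]                                    ; positive fixint
       [(<= -32 x -1) (bytes (+ 256 x))]                           ; negative fixint
       [(<= 0 x 255) (bytes #xcc x)]                               ; uint 8
       [(<= 0 x 65535) (bytes-append (bytes #xcd) (be x 2 #f))]    ; uint 16
       [(<= 0 x 4294967295) (bytes-append (bytes #xce) (be x 4 #f))] ; uint 32
       [(<= 0 x (sub1 (expt 2 64))) (bytes-append (bytes #xcf) (be x 8 #f))] ; uint 64
       [(<= -128 x -1) (bytes #xd0 (+ 256 x))]                     ; int 8
       [(<= -32768 x -1) (bytes-append (bytes #xd1) (be x 2 #t))]  ; int 16
       [(<= (- (expt 2 31)) x -1) (bytes-append (bytes #xd2) (be x 4 #t))] ; int 32
       [(<= (- (expt 2 63)) x -1) (bytes-append (bytes #xd3) (be x 8 #t))] ; int 64
       [else (raise-argument-error 'pack-number-sketch
                                   "integer between -2^63 and 2^64 - 1" x)])]
    [(real? x)                                                     ; float 64
     (bytes-append (bytes #xcb)
                   (real->floating-point-bytes (real->double-flonum x) 8 #t))]
    [else (raise-argument-error 'pack-number-sketch "real?" x)]))

(pack-number-sketch 13)    ; => #"\r"
(pack-number-sketch 300)   ; uint 16: three bytes
(pack-number-sketch 2/3)   ; not an integer: nine-byte float
```

Note that 2.0 is an integer but not an exact one, so it falls through to the floating-point rule and takes nine bytes, while 2 takes one.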
procedure
out : (and/c output-port? (not/c port-closed?))
datum : packable?
If a datum cannot be packed an exception will be raised. If the datum can only be packed partially (for example if a vector contains an unpackable object) an exception will be raised as well, but bytes might already have been written to out.
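If partial output is unacceptable, for example because the port is a socket shared with other writers, one option is to pack into an in-memory port first and only copy the bytes over once packing has succeeded. call-with-staged-output below is a hypothetical helper sketching that idea; it is not part of msgpack.

```racket
#lang racket/base
;; Hypothetical helper: stage output in memory and copy it to `out` only if
;; `write-proc` returns normally, so an exception leaves `out` untouched.
;; Intended use, assuming msgpack is loaded:
;;   (call-with-staged-output out (lambda (p) (pack-to p datum)))
(define (call-with-staged-output out write-proc)
  (define staging (open-output-bytes))
  (write-proc staging)                 ; if this raises, nothing reaches out
  (write-bytes (get-output-bytes staging) out)
  (void))

;; Demo with an in-memory "real" port.
(define sink (open-output-bytes))
(call-with-staged-output sink (lambda (p) (write-bytes #"\254Hello world!" p)))
(get-output-bytes sink)              ; => #"\254Hello world!"
```

The trade-off is that the whole encoding is held in memory, which gives up the main advantage pack-to has over pack; at that point (write-bytes (pack datum) out) achieves the same effect.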
2.3 Unpacking
(require msgpack/unpack)    package: msgpack
Unpacking is the act of deserialising raw bytes into Racket objects. MessagePack objects are unpacked according to the following rules:
The nil object gets unpacked to an instance of void.
Boolean objects get unpacked to #t or #f.
Integers get unpacked to exact real integer numbers.
Floating point numbers get unpacked to inexact real numbers.
Strings get unpacked to strings, and binary strings get unpacked to byte strings.
Arrays get unpacked to vectors.
Maps get unpacked to hash tables using equal? for key comparison.
Extension objects get unpacked to Ext objects.
I tried to choose mappings which preserve semantic meaning and use existing data types, but of course no such mapping is perfect. If you disagree with a mapping, I recommend wrapping the unpack-from, unpack and unpack/rest procedures like this:
> (require msgpack)
> (let ([datum (unpack #"\300")])
    (cond [(void? datum) '()]
          [else datum]))
'()
This wrapping can also be used to map "Ext" objects onto whatever they are meant to actually represent in your application.
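When the unpacked data is nested, the same idea can be applied recursively. The sketch below post-processes the result of unpack, replacing void with '() and vectors with lists at every level, including inside hash tables. remap is a hypothetical name, and the replacement choices are only examples.

```racket
#lang racket/base
;; Hypothetical post-processing step for unpacked data, assuming the mappings
;; listed above (nil -> void, arrays -> vectors, maps -> hash tables).
;; Here nil becomes '() and arrays become lists, recursively.
(define (remap datum)
  (cond
    [(void? datum) '()]
    [(vector? datum) (for/list ([x (in-vector datum)]) (remap x))]
    [(hash? datum)                       ; rebuild as an immutable equal?-based hash
     (for/hash ([(k v) (in-hash datum)]) (values (remap k) (remap v)))]
    [else datum]))

(remap (vector 1 (void) (vector "a")))  ; => '(1 () ("a"))
```

With such a helper, (remap (unpack bstr)) takes the place of a plain (unpack bstr).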
procedure
(unpack-from in) → packable?
in : (and/c input-port? (not/c port-closed?))
Unpacks a single object from the open binary input port in and returns the unpacked object. If the object is a collection this procedure is called recursively.
If the port is empty, meaning that not a single byte could be read, or if not enough bytes could be read to form a complete object, or if objects are missing from a collection (such as an array or a map), an unexpected EOF exception will be raised.
procedure
(unpack/rest bstr) → packable? bytes?
bstr : bytes?
> (require msgpack)
> (define packed (pack "Hello" 23 #f))
> (let loop ([objects '()]
             [remaining packed])
    (let-values ([(obj rest) (unpack/rest remaining)])
      (if (zero? (bytes-length rest))
          (reverse (cons obj objects))
          (loop (cons obj objects) rest))))
'("Hello" 23 #f)
The condition for termination is whether there are any remaining bytes to unpack.