MessagePack

Alejandro Sanchez <hiphish@openmailbox.org>

 (require msgpack) package: msgpack

MessagePack is a binary data serialization format. This means that in-memory objects like numbers, strings, dictionaries or ordered sequences can be serialized (packed) to raw bytes, and raw bytes can be deserialized (unpacked) back into in-memory Racket objects.

Source code: https://gitlab.com/HiPhish/MsgPack.rkt

    1 MessagePack guide

      1.1 Motivation

      1.2 A guided tour

      1.3 Type system

    2 MessagePack API

      2.1 MessagePack types

        2.1.1 Packable types

        2.1.2 MessagePack extension type

      2.2 Packing

      2.3 Unpacking

1 MessagePack guide

From the MessagePack website:

MessagePack is an efficient binary serialization format. It lets you exchange data among multiple languages like JSON. But it’s faster and smaller. Small integers are encoded into a single byte, and typical short strings require only one extra byte in addition to the strings themselves.

What this means is that we can turn in-memory objects of a process (such as a Racket REPL instance) into raw bytes, and turn raw bytes into in-memory objects again. These actions are referred to as "packing" and "unpacking" respectively, although the technical terms are "serialization" and "deserialization". Since the format is standardised we can use it to exchange data between processes written in different programming languages running on different architectures without issue.

MessagePack is focused on optimising speed and size, which is why it uses a binary format instead of a text-based one. Serialization formats like JSON and YAML are easy for humans to read and edit by hand, but parsing them is a non-trivial task. Human readability is not always a priority: for instance, if we want to exchange data between two processes for remote procedure calls, that data will never be read by a human.

1.1 Motivation

Suppose we have two processes and we wish to exchange data between them. Consider the following crude illustration:

+-------------+          Outgoing          +-------------+
|             | -------------------------> |             |
|  Process 1  |                            |  Process 2  |
|             | <------------------------- |             |
+-------------+          Incoming          +-------------+

The first process wants to send some "object" to the second process. The object can be anything: a number, a string, a boolean value, a list of objects, a hash table, a time stamp, or any additional type you want. The two processes do not necessarily have to be the same program, or run on the same machine or even run at the same time.

The easiest thing to do would be to just dump the in-memory representation of the object from inside the first process, and feed those raw bytes into the second process. However, there is no guarantee that the two processes can understand each other’s "dumped format", which is why we need some format to agree on.

The act of turning an in-memory object into raw bytes is generally called serialization, and the act of turning raw bytes into an in-memory object is called deserialization. In MessagePack terminology it is customary to refer to them as packing and unpacking respectively, and that is what we will call them from now on.

MessagePack defines a data serialization format that favours speed and small size, which makes it particularly well suited for inter-process communication. In theory we could also write the packed object to a file and pass the file around to other people to look at, but the binary format makes it hard for humans to read.

1.2 A guided tour

The easiest way to get acquainted with MessagePack is to try it out. Install this library, fire up your Racket REPL of choice, then follow along with the code. If you are reading this manual in an interactive format (like HTML) you can always click the name of a procedure to read its documentation. Let’s first import the MessagePack module.

> (require msgpack)

MessagePack is often abbreviated as "msgpack", and this name is used throughout the library. The library provides several modules; the topmost module, msgpack, is an "umbrella module" that re-exports everything the submodules export.

Let us now pack some objects.
> (require msgpack)
> (pack 13)

#"\r"

> (pack #f)

#"\302"

> (pack "Hello world!")

#"\254Hello world!"

> (pack 13 #f "Hello world!")

#"\r\302\254Hello world!"
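The bytes in these results follow from the MessagePack format specification. The sketch below is a hypothetical, self-contained encoder (encode-small is not part of msgpack and is not how the library is written) that handles only the kinds of values used above, to show where the bytes come from: 13 fits in a single positive fixint byte, #f is the fixed byte 0xc2 (octal 302), and a string of at most 31 UTF-8 bytes gets a one-byte fixstr header, 0xa0 plus its length, here 0xa0 + 12 = 0xac (octal 254).

```racket
#lang racket/base
;; Hypothetical sketch, not the msgpack implementation: encodes only
;; positive fixints, booleans and short strings.
(define (encode-small x)
  (cond
    ;; positive fixint 0xxxxxxx: the value itself in one byte
    [(and (exact-integer? x) (<= 0 x 127)) (bytes x)]
    ;; false and true are the single bytes 0xc2 and 0xc3
    [(eq? x #f) (bytes #xc2)]
    [(eq? x #t) (bytes #xc3)]
    ;; fixstr 101xxxxx: xxxxx is the UTF-8 byte length, at most 31
    [(string? x)
     (let ([utf8 (string->bytes/utf-8 x)])
       (unless (<= (bytes-length utf8) 31)
         (raise-argument-error 'encode-small "string of at most 31 UTF-8 bytes" x))
       (bytes-append (bytes (bitwise-ior #xa0 (bytes-length utf8))) utf8))]
    [else (raise-argument-error 'encode-small "small packable value" x)]))

(encode-small 13)             ; => #"\r"
(encode-small #f)             ; => #"\302"
(encode-small "Hello world!") ; => #"\254Hello world!"
```

Larger integers, longer strings and the other types use further header formats, which is why the library has to choose among several encodings when packing.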

The pack procedure takes one or more objects and turns them into bytes, returning a byte string. We could now write those bytes to an output port to send them off. As a shorthand, the pack-to procedure takes a binary output port followed by one or more objects to pack, and sends them off through the port without allocating a byte string.

> (require msgpack)
> (pack-to (current-output-port) "Hello world!")

?Hello world!

The complements to pack and pack-to are unpack and unpack-from.

> (require msgpack)
> (unpack #"\r")

13

> (unpack #"\302")

#f

> (unpack #"\254Hello world!")

"Hello world!"

> (unpack #"\r\302\254Hello world!")

13

> (unpack/rest #"\r\302\254Hello world!")

13

#"\302\254Hello world!"

> (unpack-from (current-input-port))

13

As we can see, while it is possible to pack multiple objects at once, unpack only returns the first object of a byte string that contains several. This is because Typed Racket requires procedures to return a fixed number of values. Instead we can use the unpack/rest procedure, which returns two values: the unpacked object and the remaining bytes of the byte string.

These procedures are the ones you will use most of the time; they are documented in detail in the Packing and Unpacking sections below.
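Why unpack/rest hands back leftover bytes becomes clearer when writing a decoder by hand: the end of an object is only known after its tag byte has been read, so a byte-string decoder naturally returns whatever it did not consume. The following is a hypothetical sketch (decode-small/rest is not part of msgpack) covering only positive fixints, booleans and fixstrs; like unpack/rest, it returns the object and the remaining bytes as two values.

```racket
#lang racket/base
;; Hypothetical decoder, not part of msgpack: understands only positive
;; fixints, booleans and fixstrs. Like unpack/rest it returns two values,
;; the decoded object and the bytes that follow it.
(define (decode-small/rest bstr)
  (when (zero? (bytes-length bstr))
    (error 'decode-small/rest "no bytes to decode"))
  ;; the first byte (the tag) determines how many bytes the object spans
  (define tag (bytes-ref bstr 0))
  (cond
    [(<= tag #x7f) (values tag (subbytes bstr 1))]       ; positive fixint
    [(= tag #xc2) (values #f (subbytes bstr 1))]         ; false
    [(= tag #xc3) (values #t (subbytes bstr 1))]         ; true
    [(= (bitwise-and tag #xe0) #xa0)                     ; fixstr 101xxxxx
     (let ([end (+ 1 (bitwise-and tag #x1f))])
       (when (< (bytes-length bstr) end)
         (error 'decode-small/rest "truncated string"))
       (values (bytes->string/utf-8 (subbytes bstr 1 end))
               (subbytes bstr end)))]
    [else (error 'decode-small/rest "unsupported tag: ~a" tag)]))

;; returns 13 and #"\302\254Hello world!"
(decode-small/rest #"\r\302\254Hello world!")
```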

1.3 Type system

For the most part, packable types map one-to-one onto Racket types: arrays unpack to vectors and dictionaries to hash tables, while symbols pack to strings and both vectors and lists pack to arrays. See the relevant sections for the exact details.

While the MessagePack type system does define the most common types, there is always room for extension. This is what the Ext type is for: every instance is a pair of an integer and a byte string. The integer is a tag which specifies the type of the extension, and the byte string carries the actual data.

MessagePack reserves negative type values for its own extensions, while non-negative type values are for your custom extensions. Let us assume you want to pack exact rational numbers without losing the exactness. None of the default types are suitable, so we will pick extension type zero.

Encoding exact rationals as an extension
> (require msgpack)
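> (define (rational->ext q)
    ; only exact rationals have exact integer numerators and denominators
    (unless (and (rational? q) (exact? q))
      (raise-argument-error 'rational->ext "(and/c rational? exact?)" q))
    (ext 0
         (pack (numerator   q)
               (denominator q))))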
> (rational->ext 2/3)

(ext 0 #"\2\3")

> (define (ext->rational e)
    (let*-values ([(n rest) (unpack/rest (ext-data e))])
      (let ([d (unpack rest)])
        (/ n d))))
> (ext->rational (ext 0 #"\2\3"))

2/3

Of course our recipient must also be aware of our extension. In the above example we pack the two components of our rational number as two integers and unpack by unpacking the data twice. The first time around we use unpack/rest, which returns two values: the unpacked object and the remaining packed data. Then we call unpack, which returns only the unpacked object, because we only care about that object, not the remaining data (not that there would be any remaining data in this case anyway).

2 MessagePack API

All of the API is exposed via the msgpack module, which only serves to re-export the bindings of the more specific submodules. The library covers three areas: packing data, unpacking data, and the auxiliary data types provided for dealing with MessagePack data. The following sections discuss these areas and the associated modules.

2.1 MessagePack types

The MessagePack specification defines which types the format supports. Most of these map neatly onto the types Racket provides out of the box; only the extension type Ext needs to be defined explicitly.

2.1.1 Packable types

 (require msgpack/packable) package: msgpack

syntax

Packable

Union of all packable types for use with Typed Racket. Use this as the most general type for objects you can send to pack and pack-to, or receive from unpack, unpack/rest and unpack-from.

procedure

(packable? x) → boolean?

  x : Any
True if x can be packed by MessagePack.

2.1.2 MessagePack extension type

 (require msgpack/ext) package: msgpack

MessagePack allows for custom types to be defined via the ext type. An extension is a tagged byte sequence: the tag is a signed 8-bit integer and the data is an ordered sequence of bytes. Non-negative tag values are free to be used for any purpose, but negative tag values are reserved for future extension by MessagePack.

syntax

Ext

The type of an ext structure for use with Typed Racket. The constructor for extension objects is ext.

struct

(struct ext (type data))

  type : (and/c integer? (integer-in -128 127))
  data : bytes?
Represents a MessagePack extension type, a pair of a signed 8-bit type integer and a data byte string of less than 4GiB in length. The type name for Typed Racket is Ext.
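On the wire an ext value is a header, the type byte and then the data. The sketch below is a hypothetical helper (ext-wire-bytes is not part of msgpack) that follows the ext formats of the MessagePack specification: the fixext formats for data of exactly 1, 2, 4, 8 or 16 bytes, otherwise ext 8, ext 16 or ext 32 with a big-endian length. It illustrates why the data must be shorter than 4GiB and how a negative type is stored as a two's-complement byte; it is not a description of how the library itself is written.

```racket
#lang racket/base
;; Hypothetical helper, not part of msgpack: builds the wire bytes of an
;; ext value according to the MessagePack specification.
(define (ext-wire-bytes type data)
  (unless (and (exact-integer? type) (<= -128 type 127))
    (raise-argument-error 'ext-wire-bytes "(integer-in -128 127)" type))
  (unless (bytes? data)
    (raise-argument-error 'ext-wire-bytes "bytes?" data))
  (define len (bytes-length data))
  (define header
    (case len
      ;; fixext formats: the length is implied by the tag byte
      [(1) (bytes #xd4)]
      [(2) (bytes #xd5)]
      [(4) (bytes #xd6)]
      [(8) (bytes #xd7)]
      [(16) (bytes #xd8)]
      ;; ext 8/16/32: tag byte followed by a big-endian length
      [else
       (cond
         [(< len #x100) (bytes #xc7 len)]
         [(< len #x10000)
          (bytes-append (bytes #xc8) (integer->integer-bytes len 2 #f #t))]
         [(< len #x100000000)
          (bytes-append (bytes #xc9) (integer->integer-bytes len 4 #f #t))]
         [else (raise-argument-error 'ext-wire-bytes "bytes shorter than 4GiB" data)])]))
  ;; the signed type is stored as its two's-complement byte
  (bytes-append header (bytes (bitwise-and type #xff)) data))

(ext-wire-bytes 0 #"\2\3") ; equal to (bytes #xd5 #x00 #x02 #x03)
```

For example, the rational extension (ext 0 #"\2\3") from the guide occupies four bytes under this scheme: the fixext 2 tag, the type 0 and the two data bytes.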

2.2 Packing

 (require msgpack/pack) package: msgpack

Packing is the act of serialising Racket objects to raw bytes. Racket objects are packed according to the following rules:

If there is more than one possibility, the first applicable one in the above order is used. If there is more than one way of packing an object, the format with the fewest bytes will be chosen.
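For integers, the "fewest bytes" rule amounts to checking the ranges of the MessagePack integer formats from smallest to largest. The following hypothetical sketch (encode-int is not part of msgpack, and the library may organise this differently) applies that rule to the integer formats of the MessagePack specification, with multi-byte payloads in big-endian order:

```racket
#lang racket/base
;; Hypothetical sketch of the "fewest bytes" rule for integers, using the
;; integer formats of the MessagePack specification (big-endian payloads).
(define (encode-int n)
  (unless (exact-integer? n)
    (raise-argument-error 'encode-int "exact-integer?" n))
  (cond
    [(<= 0 n 127) (bytes n)]                             ; positive fixint
    [(<= -32 n -1) (bytes (bitwise-and n #xff))]         ; negative fixint
    [(<= 0 n #xff) (bytes #xcc n)]                       ; uint 8
    [(<= 0 n #xffff)                                     ; uint 16
     (bytes-append (bytes #xcd) (integer->integer-bytes n 2 #f #t))]
    [(<= 0 n #xffffffff)                                 ; uint 32
     (bytes-append (bytes #xce) (integer->integer-bytes n 4 #f #t))]
    [(<= 0 n #xffffffffffffffff)                         ; uint 64
     (bytes-append (bytes #xcf) (integer->integer-bytes n 8 #f #t))]
    [(<= -128 n -1) (bytes #xd0 (bitwise-and n #xff))]   ; int 8
    [(<= -32768 n -1)                                    ; int 16
     (bytes-append (bytes #xd1) (integer->integer-bytes n 2 #t #t))]
    [(<= -2147483648 n -1)                               ; int 32
     (bytes-append (bytes #xd2) (integer->integer-bytes n 4 #t #t))]
    [(<= -9223372036854775808 n -1)                      ; int 64
     (bytes-append (bytes #xd3) (integer->integer-bytes n 8 #t #t))]
    [else (raise-argument-error 'encode-int "integer that fits in 64 bits" n)]))

(encode-int 200) ; equal to (bytes #xcc #xc8), uint 8
(encode-int -33) ; equal to (bytes #xd0 #xdf), int 8
```

Neither 200 nor -33 fits in a fixint, so each takes two bytes; 200 uses uint 8 rather than a signed 16-bit format because that is one byte shorter.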

procedure

(pack-to out datum ...) → any

  out : (and/c output-port? (not/c port-closed?))
  datum : packable?
Pack each datum into the open binary output port out. If there are multiple ways of packing a datum the smallest possible format will be preferred. If the datum is a collection the packing procedure is called recursively.

If a datum cannot be packed an exception will be raised. If the datum can only be packed partially (for example if a vector contains an unpackable object) an exception will be raised as well, but bytes might already have been written to out.

procedure

(pack datum ...) → bytes?

  datum : packable?
Similar to pack-to, but instead of packing to a port, returns the packed data as a byte string.
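A byte-string packer can be layered on top of a port-based one by packing into a byte-string port and collecting its contents. The sketch below assumes nothing about how msgpack actually implements pack; write-fixint, pack-to/sketch and pack/sketch are hypothetical stand-ins that only handle positive fixints (0 to 127). It also reproduces the partial-write caveat of pack-to: when a later datum is rejected, the bytes of the earlier ones are already in the port.

```racket
#lang racket/base
;; Hypothetical stand-ins, not part of msgpack. write-fixint only accepts
;; positive fixints (0 to 127), which are encoded as a single byte.
(define (write-fixint out n)
  (unless (and (exact-integer? n) (<= 0 n 127))
    (raise-argument-error 'write-fixint "(integer-in 0 127)" n))
  (write-byte n out))

;; port-based packer: writes each datum in turn
(define (pack-to/sketch out . data)
  (for ([datum (in-list data)])
    (write-fixint out datum)))

;; byte-string packer: pack into a byte-string port, then collect its contents
(define (pack/sketch . data)
  (define out (open-output-bytes))
  (apply pack-to/sketch out data)
  (get-output-bytes out))

(pack/sketch 1 2 3) ; equal to (bytes 1 2 3)

;; partial write: 1 is already in the port when 200 is rejected
(let ([out (open-output-bytes)])
  (with-handlers ([exn:fail:contract? void])
    (pack-to/sketch out 1 200))
  (get-output-bytes out)) ; equal to (bytes 1)
```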

2.3 Unpacking

 (require msgpack/unpack) package: msgpack

Unpacking is the act of deserialising raw bytes to Racket objects. Objects are unpacked according to the following rules:

I tried to choose mappings that preserve semantic meaning and use existing data types, but of course no such mapping is perfect. If you disagree, I recommend wrapping the unpack-from, unpack and unpack/rest procedures like this:

> (require msgpack)
> (let ((datum (unpack #"\300")))
    (cond
      ((void? datum) '())
      (else datum)))

'()

This wrapping can also be used to map "Ext" objects onto whatever they are meant to actually represent in your application.
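The wrapper above only inspects the top-level object, so a nil nested inside an array or map would still come back as void. A recursive variant handles nested data too. This is a hypothetical helper, not part of msgpack; it assumes, as the example above and the Type system section indicate, that nil unpacks to void, arrays to vectors and maps to hash tables. Note that the hash tables it rebuilds are immutable.

```racket
#lang racket/base
;; Hypothetical helper, not part of msgpack: replaces every void (the value
;; nil unpacks to in the example above) with '(), including inside vectors
;; (unpacked arrays) and hash tables (unpacked maps).
(define (void->null datum)
  (cond
    [(void? datum) '()]
    [(vector? datum)
     (for/vector #:length (vector-length datum) ([x (in-vector datum)])
       (void->null x))]
    [(hash? datum)
     ;; for/hash always builds an immutable equal?-based hash table
     (for/hash ([(k v) (in-hash datum)])
       (values (void->null k) (void->null v)))]
    [else datum]))

(void->null (vector 1 (void) (hash "key" (void))))
; equal to (vector 1 '() (hash "key" '()))
```

The same recursive walk is a natural place to turn Ext objects into application-specific values.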

procedure

(unpack-from in) → packable?

  in : (and/c input-port? (not/c port-closed?))
Unpack a datum from in. At least one byte is consumed to read the tag; further bytes are consumed as required by the type of the data.

Unpacks a single object from the open binary input port in and returns the unpacked object. If the object is a collection this procedure is called recursively.

An unexpected EOF exception will be raised if the port is empty (not a single byte could be read), if not enough bytes could be read to form a complete object, or if objects are missing from a collection (such as an array or a map).
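The first two EOF situations can be seen in a hand-written port reader. This is a hypothetical sketch (read-small and read-all-small are not part of msgpack) that handles only positive fixints and fixstrs and raises a plain exn:fail rather than the library's own exception: it fails when the tag byte is missing or when a string body is cut short. Its read-all-small loop stops once peek-byte reports the end of the port, a port-based counterpart to the unpack/rest loop shown under unpack/rest.

```racket
#lang racket/base
;; Hypothetical reader, not part of msgpack: handles only positive
;; fixints and fixstrs, to show where a port-based unpacker can hit EOF.
(define (read-exactly n in)
  (define bs (read-bytes n in))
  ;; read-bytes yields eof, or fewer than n bytes, if the port runs dry
  (if (or (eof-object? bs) (< (bytes-length bs) n))
      (error 'read-small "unexpected EOF")
      bs))

(define (read-small in)
  ;; reading the tag byte from an empty port yields eof
  (define tag (read-byte in))
  (cond
    [(eof-object? tag) (error 'read-small "unexpected EOF")]
    [(<= tag #x7f) tag]                                   ; positive fixint
    [(= (bitwise-and tag #xe0) #xa0)                      ; fixstr 101xxxxx
     (bytes->string/utf-8 (read-exactly (bitwise-and tag #x1f) in))]
    [else (error 'read-small "unsupported tag: ~a" tag)]))

;; read objects until the port is exhausted
(define (read-all-small in)
  (if (eof-object? (peek-byte in))
      '()
      (let ([obj (read-small in)])
        (cons obj (read-all-small in)))))

(read-all-small (open-input-bytes #"\r\245Hello")) ; => '(13 "Hello")
```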

procedure

(unpack bstr) → packable?

  bstr : bytes?
Similar to unpack-from, except that it reads from the byte string bstr and returns the unpacked object.

procedure

(unpack/rest bstr) → (values packable? bytes?)
  bstr : bytes?
Similar to unpack, except that it returns two values: the unpacked object and the remaining packed bytes. Since bstr can contain multiple objects we need a way of unpacking them all one by one. The following example will recursively unpack all objects into a list:

> (require msgpack)
> (define packed (pack "Hello" 23 #f))
> (let loop ([objects      '()]
             [remaining packed])
    (let-values ([(obj rest) (unpack/rest remaining)])
      (if (zero? (bytes-length rest))
        (reverse (cons obj objects))
        (loop (cons obj objects) rest))))

'("Hello" 23 #f)

The condition for termination is whether there are any remaining bytes to unpack.