mboxrd-read

This package parses mboxrd and mboxcl2 files, also known as "normal UNIX mbox files", into lazy lists of messages. It now also handles maildir directories.

procedure

(mboxrd-parse path) → (stream/c (list/c bytes? bytes?))

  path : path?
Given a path to an mbox file, returns a stream of the messages in the file. Each message is represented as a list containing a byte string for the header and a byte string for the body. Appending these byte strings yields the original message, except that every \n in the original is replaced by \r\n to match the RFC 2822 format.

You are responsible for any locking needed to protect the file from modification while it’s being read. Different MUAs and MDAs have different locking protocols. Good luck.
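
As a rough illustration of consuming the stream, the sketch below walks every message and records the header and body sizes. It assumes the package is installed and can be required under the module name in this page's title; message-sizes and the file name in the usage comment are hypothetical.

```racket
#lang racket
;; Assumption: the package is installed and exports mboxrd-parse
;; from a module named after this page's title.
(require mboxrd-read)

;; message-sizes : path? -> (listof (cons/c integer? integer?))
;; Hypothetical helper: for each message, pair the header length
;; with the body length, both in bytes.
(define (message-sizes mbox-path)
  (for/list ([msg (in-stream (mboxrd-parse mbox-path))])
    ;; Each stream element is a two-element list: header, body.
    (match-define (list header body) msg)
    (cons (bytes-length header) (bytes-length body))))

;; Hypothetical usage; "inbox.mbox" is a placeholder file name.
;; (message-sizes (string->path "inbox.mbox"))
```

Because the stream is lazy, messages are only parsed as the loop reaches them, so any locking you apply should stay in place until the traversal is finished.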

procedure

(mboxrd-parse/port port) → (stream/c (list/c bytes? bytes?))

  port : input-port?
Given an input port, returns a stream (a lazy list) of the messages read from the port.

NB: this procedure assumes that it’s the only one reading the port. Bad stuff will happen if it’s not: after reading a message, it doesn’t leave the "From " of the next message in the port.

EFFECT: reads from the port, and closes it when peek-char returns eof.
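
A minimal sketch of the port-based entry point, under the same assumption about the module name. The caller opens the port and then leaves it alone; count-messages is a hypothetical helper, not part of the package.

```racket
#lang racket
;; Assumption: the package exports mboxrd-parse/port from a module
;; named after this page's title.
(require mboxrd-read)

;; count-messages : path-string? -> exact-nonnegative-integer?
;; Hypothetical helper: opens the file, hands the port to the parser,
;; and counts the messages by walking the whole stream.
(define (count-messages mbox-path)
  (define in (open-input-file mbox-path))
  ;; We never read from `in` ourselves and never close it: per the
  ;; EFFECT note, the parser closes the port once it reaches eof.
  (for/sum ([msg (in-stream (mboxrd-parse/port in))])
    1))
```

If the traversal is abandoned partway through, the port presumably stays open, so closing it would then fall to the caller.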

1 mboxcl2

Well, it turns out that Dovecot actually uses mboxcl2. Ah well. In fact, mboxcl2 looks like a bit of a win: because it uses Content-Length to locate the next header, it should be possible to parse faster, by setting the file position rather than scanning those hideously long base64 body strings for the next line starting with "From ". The downside is that, because the body strings aren’t read eagerly, closing the file port is a separate operation that you’re responsible for.

procedure

(mboxcl2-parse path [#:fallback fallback?])

  → (-> void?)
    (stream/c (list/c bytes? (-> bytes?)))
  path : path-string?
  fallback? : boolean? = #f
Given a path to an mbox file, returns two values: a closer function that closes the input port associated with the file, and a stream of lists, each containing a header byte string and a thunk that returns the body bytes.

When fallback? is true, a message whose header lacks a Content-Length field will instead be processed by searching forward for a line beginning with "From ".

Please note that the header gets RFC 822-style newlines (\r\n), but the body does not.

Note that after the closer function is called, it’s not possible to extend the lazy list or to extract bodies.

Again, you are responsible for any locking required to protect this file from modification while it’s being read.
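
The ordering constraint above (extract bodies first, close afterwards) can be sketched as follows. This assumes the module name matches this page's title; first-bodies is a hypothetical helper.

```racket
#lang racket
;; Assumption: the package exports mboxcl2-parse from a module
;; named after this page's title.
(require mboxrd-read)

;; first-bodies : path-string? exact-nonnegative-integer? -> (listof bytes?)
;; Hypothetical helper: returns the bodies of at most the first n messages.
(define (first-bodies mbox-path n)
  ;; mboxcl2-parse returns two values: the closer and the stream.
  (define-values (close-mbox messages)
    (mboxcl2-parse mbox-path #:fallback #t))
  ;; Call each body thunk while the port is still open.
  (define bodies
    (for/list ([msg (in-stream messages)]
               [i (in-range n)])
      ((second msg))))
  ;; Only now is it safe to close: nothing below touches the stream.
  (close-mbox)
  bodies)
```

Passing #:fallback #t here is just a choice for the sketch; leave it at its default if every message is expected to carry a Content-Length field.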

2 maildir

Despite its increasingly narrow name, this package now also parses maildir directories, with a similar interface.

procedure

(maildir-parse path)

  → (stream/c (list/c (or/c bytes? false?) (-> (or/c input-port? false?))))
  path : path-string?
Given a path to a directory containing both "new" and "cur" subdirectories, returns a stream of results, one per message file. Each result is a list containing the header as a byte string (as in the procedures above) and a thunk that returns an input port positioned at the beginning of the body text of that message.

Maildir was designed to avoid the need for locking, and indeed, it’s less problematic in this regard than other formats. The most likely problem is that the list of files will change while you’re looking at the directory. This may mean that the lazily constructed stream will contain references to files that aren’t there any more by the time you ask for their headers or their bodies, which means that both headers and body-ports can wind up being false. Beyond this, though, you don’t need to worry about locking.
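
One way to cope with files that vanish mid-traversal is simply to skip any result whose header or body port is false, as in this sketch. It assumes the module name matches this page's title; maildir-bodies is a hypothetical helper, and closing the body port ourselves is an assumption, since the text above does not say who owns it.

```racket
#lang racket
;; Assumption: the package exports maildir-parse from a module
;; named after this page's title.
(require mboxrd-read)

;; maildir-bodies : path-string? -> (listof bytes?)
;; Hypothetical helper: collects the body of every message that is
;; still present when we get to it.
(define (maildir-bodies dir)
  (for*/list ([msg (in-stream (maildir-parse dir))]
              ;; The header is false if the file has disappeared.
              #:when (first msg)
              ;; Calling the thunk may also yield false.
              [body-port (in-value ((second msg)))]
              #:when body-port)
    ;; Read the whole body, then close the port (our assumption).
    (begin0 (port->bytes body-port)
            (close-input-port body-port))))
```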

3 leftovers

Additionally, you can use the utilities in net/head (e.g. extract-field and validate-header) to process the header.
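
For instance, a field can be pulled out of a header byte string with extract-field. The sketch below uses a hand-written header with \r\n line endings in place of one produced by the parsers; sample-header and subject-of are hypothetical names.

```racket
#lang racket
(require net/head)

;; A hand-written stand-in for a parsed header. Real headers come
;; from the parsing procedures described above.
(define sample-header
  #"From: alice@example.com\r\nSubject: hello\r\n\r\n")

;; subject-of : bytes? -> (or/c bytes? #f)
;; Hypothetical helper: returns the Subject field, or #f if absent.
;; With a byte-string header, the field name is given as bytes too.
(define (subject-of header)
  (extract-field #"Subject" header))

(subject-of sample-header)
```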

Let me know of any bugs.

John Clements