Portaudio: Bindings for the Portaudio portable sound library
1 Using Windows, Choosing Host APIs
Using Portaudio on Windows raises a few extra challenges. In particular, Windows machines generally support a number of different "Host API"s that Portaudio can use to interact with the machine. In addition, these Host APIs may also target multiple different devices.
The default Host API for Windows is MME. My observations suggest that this API is limited; it can open only a small number of simultaneous streams, and the latency for playing sounds is extremely high.
The WASAPI API (if that’s not redundant) has its own issues; in particular, it seems to be necessary to manually set the playback device to the right sample rate (often 44100Hz) before starting DrRacket. Failing to do so simply results in an "invalid device" error from Portaudio.
To address these issues, Portaudio includes a number of functions used to control the selection of the host API.
procedure
(pa-maybe-initialize) → void?
procedure
(pa-terminate-completely) → void?
procedure
(display-device-table) → void?
procedure
(default-host-api) → symbol?
procedure
→ exact-nonnegative-integer? desired-latency : number?
procedure
(device-low-output-latency device-number) → number?
device-number : exact-nonnegative-integer?
2 Playing Sounds
The first high-level interface involves copying the entire sound into a malloc’ed buffer, and then playing it. This is relatively low-latency. On the other hand, copying the sound involves doubling the memory required for the sound itself, so it’s a bad idea to call this for sounds that are really big (> 100MB?).
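As a rough check on that size limit, the sketch below estimates how many bytes a sound occupies. It assumes 16-bit samples in two interleaved channels, which is the layout the example later in this section appears to use. The helpers `sound-bytes` and `seconds-in-budget` are made up for illustration and are not part of the library.

```racket
#lang racket

;; An s16 sample occupies 2 bytes.
(define bytes-per-sample 2)

;; Hypothetical helper: bytes needed for `frames` frames of interleaved
;; audio with `channels` channels of s16 samples.
(define (sound-bytes frames channels)
  (* frames channels bytes-per-sample))

;; Hypothetical helper: whole seconds of audio that fit in `budget` bytes.
(define (seconds-in-budget budget sample-rate channels)
  (quotient budget (* sample-rate channels bytes-per-sample)))

;; A 2-second stereo sound at 44100 Hz is 88200 frames.
(define two-second-bytes (sound-bytes 88200 2))          ; 352800
;; While the copy exists, the original and the copy are both in memory.
(define while-copied (* 2 two-second-bytes))             ; 705600
;; Stereo 44100 Hz audio that fits in 100 MB (taken as 100 * 10^6 bytes).
(define limit-seconds
  (seconds-in-budget (* 100 1000 1000) 44100 2))         ; 566

(printf "~a bytes, ~a while copied, ~a s per 100 MB\n"
        two-second-bytes while-copied limit-seconds)
```

Under these assumptions, 100MB corresponds to a little over nine minutes of stereo audio.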
procedure
(s16vec-play s16vec start-frame end-frame sample-rate) → (-> void?)
s16vec : s16vector?
start-frame : nat?
end-frame : nat?
sample-rate : nonnegative-real?
This function signals an error if start and end frames are not ordered and legal.
Here’s an example of a short program that plays a sine wave at 426 Hz for 2 seconds:
#lang racket
(require portaudio
         ffi/vector)

(define pitch 426)
(define sample-rate 44100.0)
(define tpisr (* 2 pi (/ 1.0 sample-rate)))

;; Scale a real-valued sample to a signed 16-bit integer.
(define (real->s16 x)
  (inexact->exact (round (* 32767 x))))

;; 88200 frames (2 seconds at 44100 Hz), two s16 slots per frame.
(define vec (make-s16vector (* 88200 2)))
(for ([t (in-range 88200)])
  (define sample (real->s16 (* 0.2 (sin (* tpisr t pitch)))))
  ;; Write the same sample into both slots of frame t.
  (s16vector-set! vec (* 2 t) sample)
  (s16vector-set! vec (add1 (* 2 t)) sample))

(s16vec-play vec 0 88200 sample-rate)
3 Playing Streams
procedure
(stream-play buffer-filler buffer-time sample-rate) → (list/c (-> real?) (-> (list-of (list/c symbol? number?))) (-> void?))
buffer-filler : (-> buffer-setter? nat? nat? void?)
buffer-time : nonnegative-real?
sample-rate : nonnegative-real?
Note that the buffer length may be longer than the specified length, if the provided length is too short for the chosen device.
The function returns a list containing three functions: one that queries the stream for a time in seconds, one that returns statistics about the stream, and a third that stops the stream.
This function is believed to be safe; it should not be possible to crash DrRacket by using this function badly (unless you exhaust memory by choosing an enormous buffer size).
Here’s an example of a program that uses stream-play to play a constant pitch of 426 Hz forever:
#lang racket
(require portaudio)

(define pitch 426)
(define base-frames 0) ; frames generated so far
(define sample-rate 44100.0)
(define tpisr (* 2 pi (/ 1.0 sample-rate)))

;; Scale a real-valued sample to a signed 16-bit integer.
(define (real->s16 x)
  (inexact->exact (round (* 32767 x))))

;; Fill `frames` frames, continuing the sine wave where the last call ended.
(define (buffer-filler setter frames)
  (for ([i (in-range frames)]
        [f (in-range base-frames (+ base-frames frames))])
    (define sample (real->s16 (* 0.2 (sin (* tpisr f pitch)))))
    (setter (* i 2) sample)
    (setter (+ 1 (* i 2)) sample))
  (set! base-frames (+ base-frames frames)))

(match-define (list timer stats stopper)
  (stream-play buffer-filler 0.2 sample-rate))
Note that this example uses a long buffer of 0.2 seconds (= 200 milliseconds) so that most GC pauses won’t interrupt it.
However, a latency of 200ms is pretty terrible for an interactive system. I usually use 50ms, and just put up with the occasional miss in return for lower latency.
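To put numbers on this trade-off, the sketch below converts a buffer time into a frame count and checks whether a full buffer of that length could cover a pause of a given duration. It uses the 100ms upper end of the typical GC pause range quoted later in this document. `ms->frames` and `survives-pause?` are illustrative helpers, not library functions, and the library may choose a longer buffer than requested.

```racket
#lang racket

;; Hypothetical helper: frames needed to hold `ms` milliseconds of audio.
;; Exact arithmetic avoids floating-point rounding surprises.
(define (ms->frames ms sample-rate)
  (ceiling (* (/ ms 1000) sample-rate)))

;; Hypothetical helper: a full buffer of `buffer-ms` can ride out a pause
;; of `pause-ms` only if it holds at least that much audio.
(define (survives-pause? buffer-ms pause-ms)
  (>= buffer-ms pause-ms))

(for ([buffer-ms (in-list '(200 50))])
  (printf "~a ms buffer: ~a frames, covers a 100 ms pause? ~a\n"
          buffer-ms
          (ms->frames buffer-ms 44100)
          (survives-pause? buffer-ms 100)))
```

At 44100 Hz, the 200ms buffer holds 8820 frames and the 50ms buffer holds 2205. Only the longer one can absorb a 100ms pause.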
procedure
(stream-play/unsafe buffer-filler buffer-time sample-rate) → (list/c (-> real?) (-> void?))
buffer-filler : (-> cpointer? int? void?)
buffer-time : nonnegative-real?
sample-rate : nonnegative-real?
This function differs from stream-play in that its callback is called with a cpointer, rather than a set!-proxy. This saves the overhead of a function call and several checks, but perhaps more importantly allows the use of functions like memcpy and vector-add that can operate at much higher speeds (currently ~5x) than the current vector operations.
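As an illustration, here is a minimal sketch of a filler with the `(-> cpointer? int? void?)` shape. It is exercised against a locally allocated buffer rather than a real stream. It assumes that the integer argument is a frame count and that the buffer holds interleaved stereo s16 samples; the text above confirms neither. The names `source` and `copy-filler` are hypothetical; `memcpy`, `malloc`, `free`, and `ptr-ref` come from `ffi/unsafe`.

```racket
#lang racket
(require ffi/unsafe
         ffi/vector)

;; One pre-rendered block of interleaved stereo s16 samples (4 frames).
(define source (s16vector 1 1 2 2 3 3 4 4))

;; A filler with the (-> cpointer? int? void?) shape.
;; ASSUMPTION: `frames` is a frame count and `buf` has room for
;; 2 * frames samples of type _sint16.
(define (copy-filler buf frames)
  (define samples (min (* 2 frames) (s16vector-length source)))
  ;; One bulk copy of `samples` _sint16 elements, no per-sample calls.
  (memcpy buf (s16vector->cpointer source) samples _sint16))

;; Stand-in for the buffer that the library would pass to the callback.
(define frames 4)
(define buf (malloc (* 2 frames) _sint16 'raw))
(copy-filler buf frames)

;; Read the samples back to see what the filler wrote.
(define copied
  (for/list ([i (in-range (* 2 frames))])
    (ptr-ref buf _sint16 i)))
(free buf)
copied
```

A per-sample loop using `ptr-set!` would also work; the bulk copy is shown because it is the kind of operation the setter-based interface rules out.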
4 Recording Sounds
This library also provides a high-level interface for recording sounds of a fixed length.
procedure
(s16vec-record frame frame-rate num-channels) → s16vector?
frame : frame?
frame-rate : integer?
num-channels : channels?
5 A Note on Memory, Synchronization, and Concurrency
Note: the following is not organized to the high standards of a technical paper. The Management would like to apologize in advance, and humbly requests your forgiveness.
Interacting with sound libraries is tricky. The basic framework for this library is what’s called a "pull" architecture; the OS makes a call to a callback every 5-50ms[*], asking for new data to be shoveled into a given buffer. This callback runs on a separate OS thread, which means that Racket must somehow synchronize with this thread to provide data when needed.
One difficulty here is that Racket is garbage-collected, with GC pauses that typically run from 50ms to 100ms. This means that when a program is generating garbage, there are simply bound to be hiccoughs in a stream-based program. In general, these don’t seem to be too awful, and it’s often possible to write programs that generate very little garbage.
After trying several architectures, the model that seems to work the best is a shared-memory design, where the callback is written entirely in C, and takes its data from a buffer shared with Racket. If Racket has written the data into the buffer, then this routine copies it into the OS’s buffer. If not, then it just zeros out the buffer to play silence.
5.1 Copying Vs. Streaming
This package supports two different play interfaces: a "copying" interface and a "streaming" interface.
The copying interface is simple: Racket stuffs an entire sound into a buffer, then opens a new stream, providing a callback that pulls samples out of the buffer until it’s done. This means that the sound is not affected by GC pauses or Racket’s speed. On the other hand, it means duplicating the entire sound (expensive, for large sounds), and it requires a platform that can support multiple streams simultaneously. (OS X, yes. Windows, usually no.) Also, it tends to have higher startup latency (especially on Windows), because there’s time required to start a new stream. Finally, it requires pre-rendering of the entire sound, meaning that interactivity is out.
The streaming interface solves these problems, but exposes more of the grotty stuff to the programmer. Rather than providing sound data, the user provides a Racket callback that can generate sound data on demand. If the given callback can’t keep up with the demand, the stream starts to hiccough.
More specifically, this package uses a ring buffer, whose length can be specified independently of the underlying machine latency. The Portaudio engine calls the user’s Racket callback quite frequently (on the order of every 1-5ms) to top up this ring buffer. When GC pauses occur, the C callback will drink up everything left in the ring buffer, and then just play silence.
Choosing the length of this ring buffer is therefore difficult: too short, and you’ll hear frequent hiccoughs as the C callback runs out of data. Too long, and you get high-latency, sluggish response. Times on the order of 50ms seem to be an acceptable compromise.
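The ring-buffer behaviour described above can be modelled in a few lines of Racket. This is a toy simulation only: the real consumer is a C callback on another OS thread, and the struct and function names here (`ring`, `ring-top-up!`, `ring-drain!`) are invented for the sketch and are not the package's internals.

```racket
#lang racket

;; Toy model: `write-pos` and `read-pos` count samples since the start.
(struct ring (data [write-pos #:mutable] [read-pos #:mutable]))

(define (make-ring size) (ring (make-vector size 0) 0 0))

;; Samples written but not yet played.
(define (ring-available r) (- (ring-write-pos r) (ring-read-pos r)))

;; Producer (the Racket side): write as many samples from `next-sample`
;; as currently fit in the free space.
(define (ring-top-up! r next-sample)
  (define size (vector-length (ring-data r)))
  (for ([i (in-range (- size (ring-available r)))])
    (define pos (ring-write-pos r))
    (vector-set! (ring-data r) (modulo pos size) (next-sample pos))
    (set-ring-write-pos! r (add1 pos))))

;; Consumer (stand-in for the C callback): always returns `n` samples,
;; padding with zeros (silence) once the buffer runs dry.
(define (ring-drain! r n)
  (define size (vector-length (ring-data r)))
  (for/list ([i (in-range n)])
    (cond
      [(zero? (ring-available r)) 0]
      [else
       (define pos (ring-read-pos r))
       (set-ring-read-pos! r (add1 pos))
       (vector-ref (ring-data r) (modulo pos size))])))

(define r (make-ring 4))
(ring-top-up! r add1)                  ; writes samples 1 2 3 4
(define first-pull (ring-drain! r 3))  ; '(1 2 3)
;; Simulated GC pause: the producer misses its turn before the next pull.
(define second-pull (ring-drain! r 3)) ; '(4 0 0): one sample, then silence
(ring-top-up! r add1)                  ; producer catches up: 5 6 7 8
(define third-pull (ring-drain! r 2))  ; '(5 6)
```

A larger ring would let the second pull succeed without silence, at the cost of more samples queued ahead of playback, which is the latency trade-off described above.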
5.2 Memory
Shared memory management is a big pain. Racket is garbage-collected, but it’s interacting with an audio library that is not. It’s nearly impossible to avoid all possible race conditions related to the freeing of memory.
The first and largest issue is the block of memory shared between the Racket engine and the C callback. The current setup is that the memory is freed by a close-stream callback associated with the stream on the Portaudio side. The sequence is therefore this: Racket calls CloseStream. Portaudio then stops calling the callback, and closes the stream. Then, it calls the provided "all-done" callback, which frees the memory. One note here is that Racket should probably wrap the pointer in a mutable object so that it can be severed on the Racket side when the stream is closed. Actually, that’s true of the stream, as well.
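The "mutable wrapper" idea mentioned above might look something like the following. This is a sketch of the suggestion, not something the package is known to implement; `severable`, `severable-ref`, and `sever!` are hypothetical names, and a symbol stands in for the real pointer.

```racket
#lang racket

;; Hypothetical wrapper around a resource (such as a pointer) that can be
;; severed once the stream is closed.
(struct severable ([value #:mutable]))

;; Return the wrapped value, or raise an error if it has been severed.
(define (severable-ref s)
  (or (severable-value s)
      (error 'severable-ref "resource used after the stream was closed")))

;; Drop the reference, so later uses fail with a Racket error instead of
;; touching memory that may already have been freed.
(define (sever! s)
  (set-severable-value! s #f))

(define shared (severable 'pretend-pointer))
(define before-close (severable-ref shared))   ; 'pretend-pointer
(sever! shared)                                ; done when closing the stream
(define after-close
  (with-handlers ([exn:fail? (lambda (e) 'severed)])
    (severable-ref shared)))                   ; 'severed
```

This closes only the Racket-side window; it does nothing about the ordering of the C-side free, which still relies on the close-stream callback sequence described above.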
[*] Different platforms are different; currently, this package insists on a latency of at most 50ms, or it just refuses to run. It appears that all modern platforms can provide this, though it’s sometimes a bit tricky to decide which output device to use.