Open Sound Control: A New Protocol for Communicating with Sound Synthesizers

Matthew Wright and Adrian Freed
International Computer Music Conference (ICMC) 1997, Thessaloniki, Hellas, pp. 101-104

1 Introduction

A better integration of computers, controllers, and sound synthesizers will lead to lower costs, increased reliability, greater user convenience, and more reactive musical control. The prevailing technologies for interconnecting these elements are a bus (motherboard or PCI), an operating system interface (software synthesis), or a serial LAN (Firewire, USB, Ethernet, Fast Ethernet). It is easy to adapt MIDI streams to these new communication substrates [11], but doing so needlessly discards new potential and perpetuates MIDI's well-documented flaws. Instead, we have designed a new protocol optimized for modern transport technologies.

Open SoundControl is an open, efficient, transport-independent, message-based protocol developed for communication among computers, sound synthesizers, and other multimedia devices. It is machine- and operating-system-neutral and readily implementable on constrained, embedded systems.

We begin by examining various networking technologies suitable for carrying Open SoundControl data and discuss shared features of these technologies that impact the design of our protocol. Next we discuss the encoding and formatting of data. We describe our novel addressing scheme based on a URL-style symbolic syntax. In our last section we specify a number of query messages that request information from an Open SoundControl system.

2 Transport Layer Assumptions

Open SoundControl is a transport-independent protocol, meaning that it is a format for data that can be carried across a variety of networking technologies. Networking technologies currently becoming widely available and economical include high speed busses such as PCI [8] and medium speed serial LANs such as USB [10], IEEE-1394 ("Firewire") [3], Ethernet, and Fast Ethernet [4]. Although Open SoundControl was not designed with a particular transport layer in mind, our design reflects features shared by modern networking technologies.

We assume that Open SoundControl will be transmitted on a system with a bandwidth in the 10+ megabit/sec range. MIDI's bandwidth, by contrast, is only 31.25 kilobit/sec, roughly 300 times slower. Therefore, our design is not preoccupied with squeezing musical information into the minimum number of bytes. We encode numeric data in 32-bit or 64-bit quantities, provide symbolic addressing, time-tag messages, and in general are much more liberal about using bandwidth for important features.

We assume that data will be delivered in packets (a.k.a. "datagrams") rather than as a stream of data traveling along an established connection. (Although many LANs transmit data serially, they deliver the data as a single block.) This leads to a protocol that is as stateless as possible: rather than assuming that the receiver holds some state from previous communications (e.g., MIDI's running status), we send information in larger, self-contained chunks that include all the relevant data in one place. This packet-based delivery model provides a mechanism for synchronicity: messages in the same packet (e.g., messages to start each of the notes in a chord) can be specified to have their effects occur at the same time as each other. We assume that the network services will tell us the length of each packet that we receive.

All modern networking technologies have the notion of multiple devices connected together in a LAN, with each device able to send a packet to any other device. We therefore assume that any number of clients might send Open SoundControl messages to a particular device. We also assume that the transport layer provides a return-address mechanism that allows a device to send a response back to the device that sent it a message.

3 Data Representation

All Open SoundControl data is aligned on 4-byte boundaries. Numeric data are encoded in binary as 32-bit or 64-bit big-endian two's-complement integers and IEEE 754 floating-point numbers. Strings are represented as a sequence of non-null ASCII characters followed by a null, padded with enough extra null characters to make the total length a multiple of 4 bytes. These representations facilitate real-time performance by eliminating the need to reformat received data, except for the unavoidable big-endian/little-endian conversion; we note that most little-endian machines provide special instructions for this conversion.
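
To make the encoding concrete, the following C fragment sketches how a sender might write a null-padded string and a big-endian 32-bit integer into a packet buffer. It is a minimal illustration rather than part of the protocol; the function names osc_write_string and osc_write_int32 are our own.

    #include <stdint.h>
    #include <string.h>

    /* Write s as a null-terminated ASCII string, padded with extra null
       characters so that the total length is a multiple of 4.
       Returns the number of bytes written. */
    size_t osc_write_string(uint8_t *buf, const char *s)
    {
        size_t len = strlen(s) + 1;              /* include terminating null */
        size_t padded = (len + 3) & ~(size_t)3;  /* round up to multiple of 4 */
        memcpy(buf, s, len);
        memset(buf + len, 0, padded - len);      /* pad with nulls */
        return padded;
    }

    /* Write a 32-bit two's-complement integer in big-endian byte order. */
    size_t osc_write_int32(uint8_t *buf, int32_t v)
    {
        buf[0] = (uint8_t)((uint32_t)v >> 24);
        buf[1] = (uint8_t)((uint32_t)v >> 16);
        buf[2] = (uint8_t)((uint32_t)v >> 8);
        buf[3] = (uint8_t)(uint32_t)v;
        return 4;
    }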

The basic unit of Open SoundControl data is a message, which consists of the following:

  • A symbolic address and message name (a string whose meaning is described below)
  • Any amount of binary data up to the end of the message, which represent the arguments to the message.

An Open SoundControl packet can contain either a single message or a bundle. A bundle consists of the following:

  • The special string "#bundle" (which is illegal as a message address)
  • A 64-bit fixed-point time tag
  • Any number of messages or bundles, each preceded by a 4-byte integer byte count

Note that bundles are recursively defined; bundles can contain other bundles.
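
As a sketch of this layout (again illustrative rather than normative, and reusing the hypothetical helpers above), a bundle containing a single element could be assembled as follows; a real implementation would repeat the count-plus-element step for each message or nested bundle.

    #include <stdint.h>
    #include <string.h>

    size_t osc_write_string(uint8_t *buf, const char *s);  /* from the sketch above */
    size_t osc_write_int32(uint8_t *buf, int32_t v);

    /* Write a 64-bit quantity (here, a time tag) in big-endian byte order. */
    size_t osc_write_uint64(uint8_t *buf, uint64_t v)
    {
        for (int i = 0; i < 8; i++)
            buf[i] = (uint8_t)(v >> (56 - 8 * i));
        return 8;
    }

    /* Lay out a bundle holding one element: the marker string "#bundle",
       a 64-bit time tag, then the element preceded by its byte count. */
    size_t osc_write_bundle(uint8_t *buf, uint64_t timetag,
                            const uint8_t *elem, size_t elem_len)
    {
        size_t n = 0;
        n += osc_write_string(buf + n, "#bundle");
        n += osc_write_uint64(buf + n, timetag);
        n += osc_write_int32(buf + n, (int32_t)elem_len);
        memcpy(buf + n, elem, elem_len);
        return n + elem_len;
    }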

Messages in the same bundle are atomic; the receiver should make their effects take place simultaneously. This provides the important service in multimedia applications of specifying values that must be set at the same instant.

Time tags allow receiving synthesizers to eliminate jitter introduced during packet transport by resynchronizing with bounded latency. They specify the desired time at which the messages in the bundle should take effect, a powerful mechanism for precise specification of rhythm and scheduling of events in the future. To implement time tags properly, message recipients need a real-time scheduler like those described by Dannenberg [1].

Time tags are represented by a 64-bit fixed-point number. The first 32 bits specify the number of seconds since midnight on January 1, 1900, and the last 32 bits specify the fractional part of a second to a precision of about 200 picoseconds. This is the representation used by Internet NTP timestamps [6]. The Open SoundControl protocol does not provide a mechanism for clock synchronization; we assume either that the underlying network will provide a synchronization service or that the communicating systems will use a protocol such as NTP [6] or SNTP [7].
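
As an illustration of this format (our own sketch, assuming a POSIX struct timeval as the local clock source), a Unix time can be converted to such a time tag as follows; 2208988800 is the number of seconds between the 1900 epoch used here and the 1970 Unix epoch.

    #include <stdint.h>
    #include <sys/time.h>

    /* Convert a Unix timeval into the 64-bit fixed-point time tag described
       above: seconds since 1900 in the high 32 bits, binary fraction of a
       second in the low 32 bits. */
    uint64_t osc_timetag_from_timeval(const struct timeval *tv)
    {
        uint64_t seconds  = (uint64_t)tv->tv_sec + 2208988800ULL;
        uint64_t fraction = ((uint64_t)tv->tv_usec << 32) / 1000000ULL;
        return (seconds << 32) | (fraction & 0xffffffffULL);
    }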

4 Addressing Scheme

The model for Open SoundControl message addressing is a hierarchical set of dynamic objects that include, for example, synthesis voices, output channels, filters, and a memory manager. Messages are addressed to a feature of a particular object or set of objects through a hierarchical namespace similar to URL notation, e.g., /voices/drone-b/resonators/3/set-Q. This protocol does not prescribe which objects should be in this hierarchy or how they should be organized; each system that can be controlled by Open SoundControl defines its own address hierarchy. This open-ended mechanism avoids the addressing limitations inherent in protocols such as MIDI and ZIPI [5] that rely on short, fixed-length bit fields.

To allow efficient addressing of multiple destination objects for parameter updates, we define a pattern-matching syntax similar to regular expressions. When a message's address is a pattern, the receiving device expands the pattern into a list of all the addresses in the current hierarchy that match the pattern, similar to the way UNIX shell "globbing" interprets special characters in filenames. Each address in the list then receives a message with the given arguments.

We reserve the following special characters for pattern matching and forbid their use in object addresses: ?, *, [, ], {, and }. The # character is also forbidden in object address names so that the special string "#bundle" can never be mistaken for a message address. Here are the rules for pattern matching (a matcher sketch in C follows the list):

  • ? matches any single character except /
  • * matches zero or more characters except /
  • A string of characters in square brackets (e.g., [string]) matches any character in the string. Inside square brackets, the minus sign (-) and exclamation point (!) have special meanings: two characters separated by a minus sign indicate the range of characters between the given two in ASCII collating sequence. (A minus sign at the end of the string has no special meaning.) An exclamation point at the beginning of a bracketed string negates the sense of the list, meaning that the list matches any character not in the list. (An exclamation point anywhere besides the first character after the open bracket has no special meaning.)
  • A comma-separated list of strings enclosed in curly braces (e.g., {foo,bar}) matches any of the strings in the list.
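
The following C sketch shows one way to implement these rules for a single address; it handles ?, *, and bracketed character classes, omits the curly-brace alternatives for brevity, and is our own illustration rather than a reference implementation.

    #include <stdbool.h>

    /* Match one bracketed character class, e.g. [a-z] or [!abc], against c.
       On entry *pp points at '['; on exit it points just past ']'. */
    static bool osc_match_bracket(const char **pp, char c)
    {
        const char *p = *pp + 1;
        bool negate = (*p == '!');
        if (negate) p++;
        bool found = false;
        while (*p && *p != ']') {
            if (p[1] == '-' && p[2] && p[2] != ']') {  /* character range */
                if (c >= p[0] && c <= p[2]) found = true;
                p += 3;
            } else {
                if (c == *p) found = true;
                p++;
            }
        }
        if (*p == ']') p++;
        *pp = p;
        return negate ? !found : found;
    }

    /* Match a pattern against an address; neither ? nor * crosses a '/'. */
    bool osc_pattern_match(const char *pat, const char *addr)
    {
        while (*pat) {
            switch (*pat) {
            case '?':                /* any single character except / */
                if (!*addr || *addr == '/') return false;
                pat++; addr++;
                break;
            case '*':                /* zero or more characters except / */
                pat++;
                for (;;) {
                    if (osc_pattern_match(pat, addr)) return true;
                    if (!*addr || *addr == '/') return false;
                    addr++;
                }
            case '[':
                if (!*addr || *addr == '/' ||
                    !osc_match_bracket(&pat, *addr)) return false;
                addr++;
                break;
            default:                 /* literal character */
                if (*pat != *addr) return false;
                pat++; addr++;
                break;
            }
        }
        return *addr == '\0';        /* pattern must consume the whole address */
    }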

Our experience is that with modern transport technologies and careful programming, this addressing scheme incurs no significant performance penalty either in network bandwidth utilization or in message processing.

5 Requests for Information

Some Open SoundControl messages are requests for information; the receiving device (which we'll call the Responder in this section) constructs a response to the request and sends it back to the requesting device (which we'll call the Questioner). We assume that the underlying transport layer will provide a mechanism for sending a reply to the sender of a message.

Return messages are formatted according to this same Open SoundControl protocol. There is always enough information in a return message to unambiguously identify what question is being answered; this allows Questioners to send multiple queries without waiting for a response to each before sending the next. The time tag in a return message indicates the time that the Responder processed the message, i.e., the time at which the information in the response was true.

Exploring the Address Space

Any message address that ends with a trailing slash is a query asking for the list of addresses underneath the given node. These messages do not take arguments. This kind of query allows the user of an Open SoundControl system to map out the entire address space of possible messages as the system is running.

Message Type Signatures

Because Open SoundControl messages do not have any explicit type tags in their arguments, it is necessary for the sender of a message to know how to lay out the argument values so that they will be interpreted correctly.

An address that ends with the reserved name "/type-signature" is a query for the "type signature" of a message, i.e., the list of types of the arguments it expects. We have a simple syntax for representing type signatures as strings; see our WWW site for details.

Requests For Documentation

An address that ends with the reserved name "/documentation" is a request for human-readable documentation about the object or feature specified by the prefix of the address. The return message will have the same address as the request, and a single string argument. The argument will either be the URL of a WWW page documenting the given object or feature, or a human-readable string giving the documentation directly. This allows for "hot-plugging" extensible synthesis resources and internetworked multimedia applications.

Parameter Value Queries

An address that ends with the reserved name "/current-value" is a query of the current value of the parameter specified by the prefix of the address. Presumably this parameter could be set to its current value by a message with the appropriate arguments; these arguments are returned as the arguments to the message from the Responder. The address of the return message should be the same as the address of the request.
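
A small sketch of how a Responder might recognize such reserved suffixes is shown below; the helper name and its use are our own illustration, assuming the Responder keeps addresses as ordinary C strings. For example, applied to the address /voices/drone-b/resonators/3/current-value with suffix "/current-value", it yields the prefix /voices/drone-b/resonators/3.

    #include <stdbool.h>
    #include <string.h>

    /* If address ends in the reserved suffix (e.g. "/current-value"),
       copy the remaining prefix, which names the queried object or
       parameter, into prefix and return true. */
    bool osc_strip_query_suffix(const char *address, const char *suffix,
                                char *prefix, size_t prefix_size)
    {
        size_t alen = strlen(address), slen = strlen(suffix);
        if (alen <= slen || strcmp(address + alen - slen, suffix) != 0)
            return false;                      /* not this kind of query */
        if (alen - slen >= prefix_size)
            return false;                      /* prefix would not fit */
        memcpy(prefix, address, alen - slen);
        prefix[alen - slen] = '\0';            /* prefix now names the parameter */
        return true;
    }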

6 Conclusion

We have had successful early results transmitting this protocol over UDP and Ethernet to control real-time sound synthesis on SGI workstations from MAX [9] programs running on Macintosh computers. Composers appreciate being freed from the limited addressing model and limited numeric precision of MIDI, and find it much easier to work with symbolic names of objects than to remember an arbitrary mapping involving channel numbers, program change numbers, and controller numbers. This protocol affords satisfying reactive real-time performance.

7 References

[1] Dannenberg, R., 1989. "Real-Time Scheduling and Computer Accompaniment," in M. Mathews and J. Pierce, Editors, Current Directions in Computer Music Research, Cambridge, Massachusetts: MIT Press, pp. 225-261.

[2] Freed, A. 1996. "Firewires, LANs and Buses," from SIGGRAPH Course Notes Creating and Manipulating Sound to Enhance Computer Graphics, New Orleans, Louisiana, pp. 84-87.

[3] IEEE 1995. "1394-1995: IEEE Standard for a High Performance Serial Bus," New York: The Institute of Electrical and Electronics Engineers.

[4] IEEE 1995b. "802.3u-1995 IEEE Standards for Local and Metropolitan Area Networks: Supplement to Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications: Media Access Control (MAC) Parameters, Physical Layer, Medium Attachment Units, and Repeater for 100 Mb/s Operation," New York: The Institute of Electrical and Electronics Engineers.

[5] McMillen, K., D. Wessel, and M. Wright, 1994. "The ZIPI Music Parameter Description Language," Computer Music Journal, Volume 18, Number 4, pp. 52-73.

[6] Mills, D., 1992. "Network Time Protocol (Version 3) Specification, Implementation, and Analysis," Internet RFC 1305. (http://sunsite.auc.dk/RFC/rfc1305.html)

[7] Mills, D., 1996. "Simple Network Time Protocol (SNTP) Version 4 for IPv4, IPv6 and OSI," Internet RFC 2030. (http://sunsite.auc.dk/RFC/rfc2030.html)

[8] PCI 1993. PCI Local Bus Specification, Revision 2.0, Hillsboro, Oregon: PCI Special Interest Group.

[9] Puckette, M., 1991. "Combining Event and Signal Processing in the MAX Graphical Programming Environment," Computer Music Journal, Volume 15, Number 3, pp. 58-67.

[10] The USB web site is www.usb.org

[11] Yamaha 1996. "Preliminary Proposal for Audio and Music Protocol, Draft Version 0.32," Tokyo, Japan: Yamaha Corporation.
