Machine Learning and AI at CNMAT

This is the first of a set of thematic overviews of CNMAT’s work that I am writing to cover the period from our inception to the closing of the research sector in 2017, after David Wessel’s death. The activities themselves are best understood through the concurrent perspectives of many themes, so this serialization reflects the cursed sequentiality of language rather than any suggestion that one theme is more important than another. Future themes may include Spatial Audio, Musical Programming Tools, Musical Control Structuring, Instrument Building, etc.

When David Wessel and I first met and started working together in Paris in the early 1980s, we quickly discovered a shared interest and background in many subjects, notably Cybernetics and AI. We also found that we were both stimulated by a challenging question of the day: what were the limits and the potential of computation?

One of the most stimulating texts we had both studied, and one that influenced many people in those days, was "Laws of Form" by George Spencer-Brown. It is an odd, eccentric, charismatic text that carried much of the energy and enthusiasm of early British Cybernetic thinking. David encountered this cybernetic work directly when he took a class with Ross Ashby at Stanford. This cemented David’s lifelong explorations of computation and psychology, which went well beyond the usual use of computers in psychology (producing statistical analyses of experiments).

I first encountered cybernetics as a teenager when I discovered an old thesis from the 1950s on an electronic simulation of neurones, a primitive neural network simulation. Through my father’s experience obtaining his engineering degree from the University of Manchester, I also learned that pattern recognition was considered one of the most promising applications of the earliest stored-program digital computers.

As an undergraduate at UNSW in Australia in the late 1970s, I was fortunate to be invited by Andrew Hume, a graduate student, to join his seminar on LISP. This was my entry into symbolic computing and AI as MIT flavored it. Around the same time, David Wessel, Pierre Boulez and others at IRCAM were learning the same material from Patrick Greussay. I discovered this when I found David’s meticulous notes in his IRCAM-era notebooks.

At university I exercised my AI knowledge by experimenting with symbolic computation to represent a “creature” and its “habitat”, and I struggled with stability and temporal dynamics (the “creature” kept dying). This sort of challenge was named artificial life a few years later, and it anticipated genetic programming and neuromorphic computing.

I also dabbled in Natural Language Processing (NLP) by creating a parser that attempted to build transformational grammar trees from as much online text as I could find. I got a B grade for the paper on that because my teacher had never used a computer and doubted this could ever work. I doubted that transformational grammar could ever work and thought that an empirical route might prove this. Being right wasn’t very helpful to my engagement with the field of linguistics at that time. It is hard not to be impressed these days by how much grammatical and semantic structure can be inferred statistically. This was my first encounter with technophobic fears of computers replacing humans, yet more applied linguists are now employed working on NLP than ever before.

Meanwhile at IRCAM, Boulez encouraged the development of symbolic computation in the direction of pitch-set representations and tools for traditional composition (resulting in a long series of tools now under the rubric of OpenMusic). David worked with the famous jazz musician, critic and scholar André Hodeir, trying to build an automated accompaniment system. David was interested in pitch and harmony, but he was just as interested in rhythm and musical control structures. This led to MIDI-Lisp and to hosting Miller Puckette’s Patcher development, the precursor to Pd and Max.

One interesting experiment from this time (when digital editing on large mainframes was just starting) was to rearrange scat singing phrases from a Louis Armstrong solo into new, plausible solos. David discovered that the essential swing was lost because good models of perceived onset time were unavailable. This was an important moment of tension in which two models from AI and cognitive psychology revealed themselves: 1) the discrete, symbolic one that we see these days under the rubrics of recognition and labeling, and 2) statistical estimation and optimization using perceptual models. David and I had good backgrounds to study the former: we found out that we were the only ones to have checked out “Text, Discourse, and Process” and “Syntactic Pattern Recognition” from the IRCAM library. David was trained in the latter during his graduate work on visual perception and recognized that optimization theory underlay what became known as Machine Learning. Juggling these two viewpoints is common to most ML work in music and audio. At CNMAT we pursued a conviction that a focus on continuous, temporally and perceptually oriented machine learning would yield the most interesting applications, and that symbolic processes would “fly on top” of that as needed. You can see the synergy between these models exercised well in commercial products these days, such as iZotope’s Ozone and Neutron.

In my work in David’s group at IRCAM in the mid-1980s, I developed MacMix, the precursor to the digital audio workstation (DAW). Although MacMix was by necessity implemented in C, I had learned from LISP the value of property lists as a core data structure. I used them in MacMix for the multichannel file format (later adopted as the Sound Designer II format in Pro Tools). I also used a textual representation of keys and values for the communication encoding between the Macintosh and IRCAM’s DEC VAX mainframe, which did the mixing and signal processing. This was one of the precursors of Open Sound Control (OSC), which was eventually codified by Matt Wright and shaped by his knowledge of Scheme and by an early application of OSC: SuperCollider. The key catalyst for OSC was a practical problem faced by Roberto Morales, a pioneer of making music with AI and machine learning systems. During his first visit to CNMAT, he needed a way to communicate between his Max patch running on a Macintosh and the Prolog interpreter that we could only run on a Sun workstation. His compositional strategy is still very interesting to pursue: he translated his musical phrases in real time into Prolog propositions and synthesized a sonic response from the interpreter’s attempts to prove or disprove each proposition.

When David Wessel and I started at CNMAT, we were keen to take up research on Machine Learning and were fortunate that Michael Lee, a student from a neighboring campus (UC Davis), wanted to work on it. Michael Lee wrote the MaxNET neural network plug-in for Max, which was probably the first real-time feed-forward neural network implementation tailored for musical applications. Many of us at CNMAT in those days had a hand in concocting applications of this object, and we summarized what we learned in an influential early paper. My takeaways were that we would need to do a lot more training, would need a better way to handle time (BRNNs had not yet been developed) and would need a lot more computing power to take on big problems. This was informed by Michael’s exploration of a wide range of optimization algorithms, of which neural network simulations (MLPs) were just one. In 1993 I wrote a visionary paper on how I thought the field was going to tackle this computational need. Building on our conviction of the importance of continuous computation, I claimed the solution was going to be analog VLSI. I built a small programmable array of discrete OTAs (operational transconductance amplifiers) for analog neural network computations to confirm this. This has not come to pass for computer music in general, but on-chip analog neuromorphic processing is a key approach for portable interactive and biomedical systems, an area in which I am still active.

Since CNMAT’s primary mission was live musical performance, as opposed to studio work, we found ourselves building a lot of what David called in 2006 “software listening assistants”. This cemented a philosophical distinction that is also well represented these days in commercial applications of AI and ML to audio and music. Some tools are said to “do the work for you”; others are assistive “enhancers of creativity”. These evoke a common binary that has been with us at least since the first automata: technophobia (because of job loss and the trivialization of hard-earned practice) and technophilia (an optimistic conception of the cultural resonance of the cyborg). As research director I encouraged a third way: to be technocritical. We built, shared and actively used technologies in significant musical projects to determine their value in particular contexts. This important issue is often better considered by critical organologists, sociologists and anthropologists than by practitioners. David and I encouraged such scholars at IRCAM and CNMAT, and various accounts of our work have been published through their particular lenses.

As David reached the traditional retirement age, he shifted CNMAT’s primary research activities toward transferring what we had learned in the rigorous context of real-time music and audio applications to broader vistas, through two 5-year collaborations with the UC Berkeley EECS department and many affiliated universities. Machine learning made a brief appearance in the first, the Par Lab, as an optimization technique for computer resource scheduling problems. In the second, TerraSwarm, ML was our primary contribution. David’s final work, “Control Improvisation with Probabilistic Temporal Specifications”, may in the long term have the greatest impact of his many contributions outside his core fields of computer music and perceptual psychology.

CNMAT was above everything else a supportive, safe place for young scholars to experiment and rehearse their life’s work. Following what these amazing folk are doing now may be a faster way to understand this important theme than reading our older papers.

Here is my first attempt to identify those colleagues and direct you to their current activities. Please contact me with updates, especially so that I can correct any accidental omissions.

  • Jean Ahn (Cool Jam)
  • Ritwik Banerji (University of Cincinnati)
  • Eric Battenberg (Google)
  • Jeff Bilmes (University of Washington)
  • Georgina Born (University of Oxford)
  • Arshia Cont (Antescofo)
  • Johanna Devaney (CUNY)
  • Cyril Drame
  • Aaron Einbond (City University of London)
  • Kelly Fitz (Earlens)
  • Guy Garnett (University of Illinois)
  • Vijay Iyer (Columbia)
  • Tristan Jehan (The Echo Nest, Spotify)
  • Peter Kassakian (Twitter)
  • Michael Lee (Sennheiser, Creative Labs)
  • Deirdre Loughridge (Northeastern University)
  • Psyche Loui (Northeastern University)
  • Yotam Mann (NYU, Google)
  • Ali Momeni (CMU, Shield AI)
  • Roberto Morales (Universidad de Guanajuato)
  • Nils Peters (University of Erlangen-Nuremberg)
  • Andy Schmeder (EnChroma)
  • Rafael Valle (NVIDIA)
  • Brian Vogel (Preferred Networks)
  • Matt Wright (Stanford University)
  • Michael Zbyszyński (Goldsmiths, University of London)
  • David Zicarelli (Cycling '74)

    Valle R. Data Hallucination, Falsification and Validation using Generative Models and Formal Methods. Computational and Data Science and Engineering. 2018. p. 84.

    Banerji R. Balancing Defiance and Cooperation: The Design and Human Critique of a Virtual Free Improviser. In ICMC 2016. Utrecht: ICMA; 2016.

    Valle R, Fremont DJ, Akkaya I, Donzé A, Freed A, Seshia SA. Learning and Visualizing Music Specifications Using Pattern Graphs. In International Society for Music Information Retrieval Conference. New York; 2016.

    Valle R. ABROA: Audio-based Room-occupancy Analysis using Gaussian Mixtures and Hidden Markov Models. In Future Technologies Conference (FTC) 2016. San Francisco; 2016.

    Valle R, Donzé A, Fremont D, Akkaya I, Seshia S, Freed A, et al. Specification Mining for Machine Improvisation with Formal Specifications. ACM Computers in Entertainment (Musical Metacreation). 2016.

    Valle R, Freed A. Symbolic Music Similarity using Neuronal Periodicity and Dynamic Programming. In Mathematics and Computation in Music. London, UK: Springer; 2015.

    Valle R, Freed A. Batera: Drummer Agent with Style Learning and Interpolation. In Study Day on Computer Simulation of Musical Creativity. University of Huddersfield, UK; 2015.

    Wessel D, Battenberg E, Schmeder A, Fitz K, Edwards B. Hearing aid fitting procedure and processing based on subjective space representation. USPTO. UC Berkeley; 2015.

    Donzé A, Akkaya I, Seshia SA, Libkind S, Valle R, Wessel D. Machine Improvisation with Formal Specifications. In International Computer Music Conference. Athens, Greece; 2014.

    Lalor E, Mesgarani N, Rajaram S, O’Donovan A, Wright J, Choi I, et al. Decoding Auditory Attention (in Real Time) with EEG. 37th ARO MidWinter Meeting. Baltimore, US: Association for Research in Otolaryngology; 2013.

    Santos JF, Peters N, Falk TH. Towards blind reverberation time estimation for non-speech signals. 21st International Congress on Acoustics (ICA). Montreal; 2013.

    Peters N, Choi J, Lei H. Matching Artificial Reverb Settings to Unknown Room Recordings: a Recommendation System for Reverb Plugins. 133rd AES Convention. San Francisco, US; 2012.

    Devaney J, Mandel MI, Fujinaga I. A Study of Intonation in Three-Part Singing using the Automatic Music Performance Analysis and Comparison Toolkit (AMPACT). In International Society for Music Information Retrieval Conference (ISMIR); 2012.

    Mann, Y., Freed, A. Pervasive Cameras: Making Sense of Many Angles using Radial Basis Function Interpolation and Salience Analysis. Pervasive Computing Conference, 2011.

    Schmeder A, Freed A. Support Vector Machine Learning for Gesture Signal Estimation with a Piezo Resistive Fabric Touch Surface. In NIME. Sydney, Australia; 2010.

    Battenberg E. Improvements to percussive component extraction using non-negative matrix factorization and support vector machines. EECS. [Berkeley, CA]: University of California, Berkeley; 2008. p. 48.

    Loui P, Wessel D, Wu EH, Knight RT. Perceiving new music recruits flexible neural mechanisms. In Cognitive Neuroscience Society 2007. p. 48. http://www.cnsmeeting.org/CNS_2007_Program.pdf

    Contemporary Music Review Vol. 25, Nos. 5/6, October/December 2006, pp. 425 – 428

    Cont A, Dubnov S, Wessel D. Realtime Multiple-Pitch and Multiple-Instrument Recognition for Music Signals Using Sparse Non-Negative Constraints. In 10th International Conference on Digital Audio Effects (DAFx-07). 2007. pp. 85-92.

    Kassakian, P., Convex Approximation with Applications in Magnitude Filter Design and Beamforming Ph. D. Thesis, EECS, University of California Berkeley, Berkeley, 2006.

    MacCallum J, Einbond A. Real-time Analysis of Sensory Dissonance. In International Computer Music Conference. Copenhagen, Denmark; 2007.

    Wessel D, Fitz K, Battenberg E, Schmeder A, Edwards B. Optimizing Hearing Aids for Music Listening. In 19th International Congress on Acoustics. Madrid, Spain; 2007.

    John MacCallum and Aaron Einbond, Timbre as a Psychoacoustic Parameter for Harmonic Analysis, Rocky Mountain Society for Music Theory, March 31-April 1, 2006

    Loui, P., Wessel, D., Kam, C. Acquiring New Musical Grammars: a Statistical Learning Approach. Proceedings of the 28th Annual Conference of the Cognitive Science Society.

    Loui, P., Wessel, D. Acquiring New Musical Grammars: a Statistical Learning Approach. Proceedings of the 9th International Conference on Music Perception and Cognition.

    John MacCallum, Jeremy Hunt, and Aaron Einbond, Timbre as a Psychoacoustic Parameter for Harmonic Analysis and Composition, ICMC 2005, Barcelona.

    Peter Kassakian and David Wessel. Optimal Positioning in Low-Dimensional Control Spaces using Convex Optimization, ICMC 2005, Barcelona.

    Kassakian, P. Magnitude Least-Squares Fitting via Semidefinite Programming with Applications to Beamforming and Multidimensional Filter Design, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2005.

    Vogel, B., Jordan, M. I., and Wessel, D. Multi-Instrument Musical Transcription Using a Dynamic Graphical Model, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2005.

    Roberto Morales-Mazanares, Eduardo F. Morales and David Wessel, Combining Audio and Gestures for a Real-Time Improviser. ICMC 2005.

    Andrew W. Schmeder, Mapping Spectral Frames to Pitch with the Support Vector Machine. Proceedings of ICMC 2004, Miami.

    Wessel, D. and M. Wright. 2001. Problems and Prospects for Intimate Musical Control of Computers. Proceedings of the 2001 ACM Computer-Human Interaction (CHI) Workshop on New Interfaces for Musical Expression (NIME'01), Seattle, WA.


    Wessel, D., C. Drame, et al. Removing the Time Axis from Spectral Model Analysis-Based Additive Synthesis: Neural Networks versus Memory-Based Machine Learning. International Computer Music Conference, Ann Arbor, Michigan, ICMA, 1998.

    Iyer, V., Bilmes, J., Wright, M., and Wessel, D., A Novel Representation for Rhythmic Structure, ICMC, Thessaloniki, Greece, 1997.

    Brecht, B., and Garnett, G. Conductor Follower, Proceedings of the 21st International Computer Music Conference, Banff Centre for the Arts, Banff, Canada, 1995.

    Freed, A. The Rebirth of Computer Music by Analog Signal Processing, Proceedings of the 20th International Computer Music Conference, Aarhus, Denmark, 1994

    Lee, M. and Wessel, D. Real-Time Neuro-Fuzzy Systems for Adaptive Control of Musical Processes. Proceedings of the 19th International Computer Music Conference, Waseda University Center for Scholarly Information, 1993, International Computer Music Association.

    Lee, M., Freed, A., Wessel, D. Neural Networks for Simultaneous Classification and Parameter Estimation in Musical Instrument Control, International Society for Optical Engineering Conference, 1992.

    Lee, M., Garnett, G.E. and Wessel, D. An Adaptive Conductor Follower. Proceedings of the 18th International Computer Music Conference, San Jose State University, 1992, International Computer Music Association.

    Lee, M. and Wessel, D. Connectionist Models for Real-Time Control of Synthesis and Compositional Algorithms. Proceedings of the 18th International Computer Music Conference, San Jose State University, 1992, International Computer Music Association.

    Sandell, G.J. and Martens, W.L. Prototyping and Interpolation of Multiple Music Timbres Using Principal Component-Based Synthesis, Proceedings of the 18th International Computer Music Conference, San Jose State University, 1992, International Computer Music Association.

    Wessel, D. Connectionist Models for Musical Control of Nonlinear Dynamical Systems, The Journal of the Acoustical Society of America, Vol. 92, No. 4, Pt. 2, October 1992.

    Lee, M., Freed, A., Wessel, D. Real-Time Neural Network Processing of Gestural and Acoustic Signals, Proceedings of the 17th International Computer Music Conference, Montreal, 1991, Computer Music Association.

    Wessel, D. Improvisation with Highly Interactive Real-Time Performance Systems, Proceedings of the 17th International Computer Music Conference, Montreal, 1991, International Computer Music Association.

    Wessel, D. Let's Develop a Common Language for Synth Programming, Electronic Musician, August 1991.

    Wessel, D. Instruments That Learn, Refined Controllers, and Source Model Loudspeakers, Computer Music Journal, Vol. 15, No.4, Winter 1991.