Thursday, September 24, 2009

Some Initial Thoughts on Visualizing Music

The variety of possible relationships between sound and image is a vast and intriguing subject. In terms of visualizing music, it can include music notation, spectrographic displays, paintings inspired by music, son et lumière, software that algorithmically generates animation based on sound, music videos, Schenkerian analyses, and so on. In the other direction, "sonifying" images includes music based on paintings, film music, music generated by drawing directly on a film soundtrack, and a wide range of applications in which sound is used to display numerical data.

Traditional Western music notation attempts to visualize music by means of a symbolic language. Fluent readers of music notation can use it to recreate -- mentally and/or on an instrument -- the music described by that notation. However, such notation does not really try to give a visual analog of the music. Music notation systems are usually intended to a) give a performer a symbolic representation of some aspects of the music's structure and b) give an instructional tablature of how to produce the desired music on an instrument. The distinction between a and b is perhaps best illustrated by music for lute and guitar, which can be written as notes in standard notation or as fingerings in tablature. Notes are better for revealing the musical structure; fingerings are better for revealing the actions required to produce the sound on the instrument. To a greater or lesser degree, each of those two methods of notation serves both functions of description and instruction. But notation is in some ways incomplete and imprecise, and relies on a good deal of unstated cultural knowledge possessed by the reader -- particularly in its treatment of subtleties of timbre, dynamics, rubato, ornamental inflections, etc. Western notation is relatively precise in describing the pitch and rhythm of note events, but is less precise in visualizing the other parameters of musical structure or sonic content.

Digital audio workstation software provides a static amplitude-over-time graph of the sound wave itself. If one zooms in to look closely at a few milliseconds of the sound, one might be able to tell from the waveform whether the sound is periodic, approximately how rich in harmonics it is, or approximately how noisy it is, but since one is only viewing a tiny time segment of the sound, it's hard to know much of interest about its musical content. If one zooms out to look at several seconds of sound, one can see its overall amplitude envelope and may thus be able to discern something about its loudness and rhythm, but details of its content are not apparent. It's an adequate visualization for recognizing significant events in the sound and for editing the sound accordingly, but it is otherwise not very rich in information.

Likewise, a spectrographic display, even an animated one that updates continually in real time, provides information about a sound's frequency content, harmonicity, inharmonicity, and noisiness, but it still requires expert and painstaking analysis and interpretation to derive information about the musical structure.

The following discussion and example will focus on one particular type of visualization: using information derived from the sound or musical structure itself to generate an animation that displays and elucidates one or more aspects of the sonic/musical structure. In other words, visualization that is concerned not with creating an aesthetic object inspired by the music, but with displaying useful information for "understanding" or analyzing the sound.

To visualize music (as distinguished from visualizing sound), we need to focus on the musical attributes that are of particular interest for our purpose. The parameters most important for analyzing music can be many and varied, may be a subject of disagreement, and may change as the music proceeds. There are the standard fundamental musical parameters such as pitch, loudness, and duration; more complex or derived parameters of musical sounds such as timbre, accent, and articulation; and more global attributes of a musical structure such as rhythmic patterns, tempo, event density, dissonance, spatial distribution, and so on. One way to think about music is as an evolving multidimensional entity, in which each dimension corresponds to a parameter or trait or attribute of interest.

How can you depict multi-dimensionality in a two-dimensional space? In addition to the obvious two dimensions, horizontal and vertical, we might consider the color, size, brightness, and shape of objects to be additional dimensions that can carry information. Three-dimensional graphics can give the impression of a third spatial dimension. In a realtime depiction, the passage of time in the visualization will generally correspond to musical time, but other relationships, such as slowed-down time or frozen time, might also be useful in some circumstances.

The designer/programmer of the visualization must decide what visual dimensions and characteristics most effectively convey musical attributes. For example, we say that pitches are "high" or "low", so we often find it best to depict pitch height in the vertical dimension. On the other hand, we know that in fact there's nothing inherently spatially higher or lower about one pitch relative to another. In some cases we might just as usefully map pitches to the horizontal dimension, as they appear on a piano keyboard, or to some other continuum. For some synesthetes with perfect pitch, pitch classes might correspond to particular hues. In a particular tuning system or scale system, pitches may be thought of as residing on a virtual grid. There might also be cases in which higher pitches are deemed more intense or brighter, or lower pitches darker or weightier. In short, mappings of sonic and musical attributes to visual dimensions and characteristics are quite subjective and case-dependent, even though many standard mappings exist.
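As a concrete illustration of how subjective these choices are, here is a minimal sketch in Python (purely hypothetical, not part of any of the programs discussed here) of two different ways one might map a MIDI pitch to a visual characteristic: to a vertical pixel position, or to a hue on the color wheel.

    def pitch_to_y(midi_pitch, height=128):
        # Map MIDI pitch 0-127 to a vertical pixel position, with high pitches near the top.
        return (height - 1) - int(round(midi_pitch / 127.0 * (height - 1)))

    def pitch_class_to_hue(midi_pitch):
        # Map the twelve pitch classes around the color wheel (hue expressed as 0.0-1.0).
        return (midi_pitch % 12) / 12.0

    print(pitch_to_y(60))          # middle C lands near the middle of a 128-pixel display
    print(pitch_class_to_hue(61))  # C-sharp gets a hue one twelfth of the way around the wheel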

In the program demonstrating counting through a list, one part of the program reads repeatedly through a list of sixteen MIDI pitch numbers. Another part of the program reads through a list of sixteen x,y locations for the display of a black dot. The mapping between the pitches and the dot locations is imprecise, but is still very suggestive of a correlation because the vertical height of the dot corresponds to the pitch height, so the dot seems to depict the pitch contour. The horizontal locations of the dot do not correspond to any specific parameter in the music, but they are chosen so as to suggest a generally circular pattern, thus enhancing the cyclic feeling of the pitch movement when the program counts repeatedly in a loop.
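Here is a rough Python analogue of that idea (the pitch and location values below are made up for illustration; they are not the actual contents of the example program): two parallel sixteen-element lists are read in a loop, one supplying the pitches and the other the dot locations.

    pitches = [60, 62, 64, 67, 69, 72, 69, 67, 64, 62, 60, 57, 55, 53, 55, 57]   # illustrative values
    locations = [(i * 8, 127 - p) for i, p in enumerate(pitches)]                # illustrative x,y pairs

    for step in range(32):              # go around the cycle twice
        i = step % 16                   # the counter wraps around the sixteen-element lists
        pitch = pitches[i]
        x, y = locations[i]
        print(f"step {i}: play pitch {pitch}, draw dot at ({x}, {y})")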

In the programs that demonstrate limiting and moving ranges of random values, pitch is again displayed in the vertical dimension as individual slider positions in a multislider object, and the image is scrolled continuously in the horizontal dimension to depict the passage of time and show the most recent several seconds' worth of note choices.
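The underlying idea can be sketched in a few lines of Python (hypothetical code, not the actual patch): pitches are chosen at random within a range whose limits gradually move, and only the most recent choices are kept for display, which is what gives the scrolling effect.

    import random
    from collections import deque

    recent = deque(maxlen=32)          # keep only the last 32 notes, like a scrolling display
    low, high = 48, 60                 # starting pitch range (illustrative)

    for step in range(64):
        recent.append(random.randint(low, high))   # choose a pitch within the current range
        low += 1                                    # gradually move the range upward
        high += 1

    print(list(recent))                # the most recent choices, ready to be drawn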

It's worth noting that in those examples the tempo and rhythm are constant, and are thus not of particular musical interest, so they are not displayed. The dynamics (the note velocities) are of some interest, but arguably of less interest than the pitches, so for the sake of simplicity there's no attempt to display that parameter either.

In the programs that show the first and second steps in modulating the modulator, the pitch of the sound changes in a continuous glissando instead of in discrete steps of the equal-tempered scale, so the pitch value is displayed in a scope~ object, which shows the pitch contour of the most recently played second of sound.

The program called "modulating the modulators" produces a sound in which a few different musical attributes are constantly and gradually changing. In that example it's pretty clear what's worth depicting in the sound, because we know what's being modified: the pitch, the volume and the panning (the left-right location). We know this because the computer itself is generating the sound, and we have access to the information that controls those particular parameters. These three parameters can be mapped to three dimensions or characteristics of an animated image.

Here is an example of one possible visualization of the pitch, volume, and panning of the sound generated in the "modulating the modulators" program. (The drawing objects are hidden in the program, but they're shown here in the graphic below.)




This program displays each musical note in terms of its most important attributes: its pitch contour, loudness, and panning. Pitch is shown in the vertical dimension, panning is shown in the horizontal dimension, and volume is shown in terms of the brightness of the note. Since the sound has only a single timbre, it seems reasonable to use a single color to draw the notes. These attributes change during the course of a single note, so the graphic display ends up being a line drawing that depicts the shape of those attributes during the note.

The fragment of the program shown in the picture above has three main focal points: the part that draws the image, the part that stores the image to be drawn, and the part that erases the previously drawn image at the start of each new note.

The drawing part consists of a metro object, a jit.matrix object named notes, and a jit.pwindow object. The dimensions of the jit.pwindow are 128x128 pixels, the same as the dimensions of the jit.matrix. The time interval of the metro is defined such that it will cause the contents of the jit.matrix to be drawn in the jit.pwindow 30 times per second, which is sufficiently often to give a sense of continuity to the animation. The metro gets turned on and off automatically at the same time as the audio is turned on and off, so it's always drawing whenever the sound is playing.
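In non-realtime pseudo-Python terms (a sketch of the timing only, not of the actual Jitter objects), the drawing part behaves something like this:

    import time

    FRAME_INTERVAL = 1.0 / 30                      # the metro's interval: about 33.3 ms
    matrix = [[0.0] * 128 for _ in range(128)]     # stands in for the 128x128 jit.matrix

    def draw(m):
        pass                                       # stands in for displaying the matrix in the jit.pwindow

    audio_is_on = True                             # the metro runs only while the sound is playing
    frames = 0
    while audio_is_on and frames < 90:             # about three seconds of drawing in this sketch
        draw(matrix)
        frames += 1
        time.sleep(FRAME_INTERVAL)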

The jit.matrix object is being filled by a jit.poke~ object that refers to the same space in memory. The arguments of jit.poke~ indicate the name of the jit.matrix into which to poke signal values, the number of dimensions of the matrix that we want to access, and the plane number we want to access. (Plane 2 is the green color plane in a 4-plane char ARGB matrix; we're only using the color green.) The purpose of jit.poke~ is to use MSP signals to specify the values and locations to be drawn into the specified plane of the specified matrix. The inlets are, in order from left to right, for the value that will be stored (the brightness of green), the cell location in dimension 0 (the horizontal cell location), and the cell location in dimension 1 (the vertical cell location). The inlets are fed by signals that control volume, panning, and pitch -- the three musical attributes we want to depict. So volume is mapped to brightness, panning is mapped to horizontal location, and pitch is mapped to vertical location.

The 128 vertical pixels are mapped to the 0-127 range of MIDI pitch. The pitch signal is first passed through a !-~ 127 object to create an inverse mapping of pitch to pixel number; that's because MIDI pitch values increase from 0 to 127 going up, whereas row indices in the matrix increase from 0 to 127 going down (row 0 is at the top of the image). The panning value, which is originally expressed in the range 0 to 1, gets passed through a *~ 127 object to expand it to the horizontal range of the matrix, a range chosen for no other reason than to make it the same size as the vertical range. The volume value is originally in the range from 0 to -30 dB, so before it gets to jit.poke~ it gets multiplied by 0.03 to reduce its range to 0 to -0.9, and then has 1.0 added to it to yield brightness values that vary from 1.0 down to 0.1. The result of all of this is that the brightness value that corresponds to volume gets placed into plane 2 of every cell of the matrix that corresponds to a pitch and panning combination that occurs during a given note.
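Here is a minimal sketch of those three mappings in Python (assumed equivalents written for clarity, not the actual MSP signal processing), showing how one instant of a note's pitch, panning, and volume would determine which cell gets written and how bright it is:

    def note_sample_to_cell(midi_pitch, pan, volume_db):
        # pitch 0-127 -> row, inverted so that high pitches are near the top (like !-~ 127)
        row = 127 - int(round(midi_pitch))
        # panning 0-1 -> column 0-127 (like *~ 127)
        col = int(round(pan * 127))
        # volume 0 to -30 dB -> brightness 1.0 down to 0.1 (multiply by 0.03, then add 1.0)
        brightness = volume_db * 0.03 + 1.0
        return row, col, brightness

    # a 4-plane, 128x128 stand-in for the jit.matrix; plane 2 is green in ARGB
    matrix = [[[0.0] * 4 for _ in range(128)] for _ in range(128)]
    row, col, green = note_sample_to_cell(midi_pitch=60, pan=0.5, volume_db=-15.0)
    matrix[row][col][2] = green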

What I'm calling a "note" in this context is each amplitude envelope window generated by the triangle function that modulates the amplitude of the signal coming directly out of the tri~ object in the synthesis patch. Those amplitude envelopes are controlled by the phasor~ that reads through just the first half of the triangle wave function stored in the buffer~. In the drawing portion of the patch you can see the use of the delta~ object to monitor the change in the signal coming from the phasor~. Since the phasor~ generates an upward ramp that goes cyclically from 0 to (almost) 1 and then leaps back to 0, there is a very specific moment, which occurs only at the beginning of each cycle, when the change in the phasor~, instead of being a slight increase, is suddenly a decrease (on the sample when it leaps back to 0). So you can use the <~ 0. object to detect that moment, namely the instant when delta~ outputs a negative sample value. At that moment the <~ 0. object will output a 1 (the only time the test succeeds), and the edge~ object will report that it occurred. That report is used to trigger a clear message to jit.matrix, causing it to set all its values to 0, effectively erasing the previous note from the matrix.
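The wrap detection can be sketched in Python as follows (a rough, non-signal-rate approximation of what delta~, <~ 0., and edge~ do in the patch; the phasor data here is a toy example):

    def wrap_points(phasor_samples):
        # Yield the indices where the upward ramp leaps back toward 0,
        # i.e. where the sample-to-sample change (delta) is negative.
        previous = phasor_samples[0]
        for i, current in enumerate(phasor_samples[1:], start=1):
            if current - previous < 0:      # like delta~ followed by <~ 0.
                yield i                     # like edge~ reporting the transition
            previous = current

    ramp = [(i % 100) / 100.0 for i in range(300)]   # a toy phasor~: three cycles from 0 toward 1
    print(list(wrap_points(ramp)))                   # -> [100, 200]: clear the matrix at these moments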

So in this example, the drawing rate or "refresh" rate of the visual display is 30 times per second, and the content of the display matrix is erased (reset to all 0) once per note. The passage of time is not displayed as a dimension in its own right, but rather by the constant updating of the display. The display changes as the drawn line takes shape, determined by the changing pitch and panning over the course of each note. This is a non-traditional way to display information about the musical structure, but it directly addresses the three main musical features of this particular sound.
