Algorithmic Composition: motive

A control function with a particular shape can serve a role similar to a traditional musical motive. Even when it is modified in duration, rhythm, or shape, its relation to the original remains evident and it serves to unify a larger composition. A motive or shape might be recognizable at different structural/temporal levels, in which case the form may take on a "fractal" or "self-similar" character.

In other chapters we've taken some first and intermediate steps to progressively increase the complexity of examples in which a control function modulates another modulator of the same shape, such that a single shaped is used at different formal levels in a somewhat self-similar manner.

Here's a more full blown example of a single control function used to modulate a sound at many formal levels, with modulators modulating other modulators, in this case including the parameters of pitch, note rate, volume, and panning (location).

The carrier sound is a triangle wave oscillator, in the tri~ object. The volume of that oscillator is continually modulated in a way that actually separates it into individual notes; it is windowed by a repeating triangular shape going from 0 to 1 and back to 0--the first half of the triangle function stored in the wavetable in the buffer~. The rate of those notes is itself modulated by a triangle function, varying from 1 to 15 notes per second every 25 seconds (the rate is varied up and down + and - 7 from a central rate of 8 Hz, by a triangle oscillator with a rate of 0.04 Hz).

The volume of the sound is further modulated by another triangular LFO that creates a swell and dip of + and - 15 dB in the overall volume every ten seconds, to give a periodic crescendo and diminuendo spanning 30 decibels, which is about as much as most instrumentalists do in practice, even though their instruments are often technically capable of a wider range of intensities.

The pitch of the sound is modulated in almost exactly the same way as was demonstrated in another article. The pitch glides in a triangular shape around a central pitch that is itself slowly changing in a triangular shape over a span of every 50 seconds. The rate of the glissandi varies from 1 to 15 Hz, varying triangularly in a 20-second cycle. The depth of the glissandi varies from + and - 0 to 12 semitones, controlled by a 15-second cycle (perceptually a 7.5-second cycle).

The perceived location of the sound pans back and forth between left and right controlled by a triangular function at a rate that varies from 1/16 Hz to 16 Hz -- quite gradually to quite quickly -- with the rate itself determined by a triangular cycle that repeats every 30 seconds, using the most common panning technique, known as "intensity panning". This takes advantage of the fact that one of the main indicators of the location of a sound's source is inter-aural intensity difference (IID), the balance of the sound's intensity in our two ears. The more the intensity of sound in one ear exceeds the intensity in the other ear, the more we are inclined to think the sound comes from that direction. Thus, varying the sound's intensity from 0 to 1 for one ear (or one speaker) as we vary the intensity from 1 to 0 in the other ear (or the other speaker) gives the impression of the sound being at different locations between the two ears (speakers). So a triangle wave with an amplitude of + and - 0.5, centered around 0.5 is used to vary the gain of the right audio channel, and 1 minus that value is used to determine the gan of the left audio channel. As one channel fades from 0 to 1, the other channel fades from 1 to 0, and vice versa.

Our sense of the distance of sound sources is complicated, but in general it's roughly proportional to the amplitude of the sound. So the same sound at half the amplitude would -- all other things being the same -- tend to sound half as close to us (that is, twice as distant). The perceived overall intensity of the sound will depend on the sum of the two audio channels. Perceived intensity is proportional to the square of the amplitude, and the perceived overall intensity is thus proportional to the sum of the squares of the amplitudes of the two channels. So if we want to keep the sound seeming to be the same distance from the listener as we pan from left to right, we need to keep the sum of the squares of their amplitudes the same. So, as a final step before output, we take the square root of the desired intensity for each channel, and use that as the gain control value for the channel. The picture below shows the gain values for the two channels as they are initially calculated by the triangle function (on the left) and then shows the actual gain values that will be used -- the square roots (on the right). The first is the desired intensity of the two channels, and the second is the actual amplitude for the two channels that's required to deliver that intensity as the virtual sound location moves between left and right.

In order to make the rate of panning span the desired range from 1/16 Hz to 16 Hz, we used the triangle function as the exponent of the base 2, using the pow~ object. As the triangle function (the exponent) varies from 0 to 4 to -4 to 0, the result will vary from 1 to 16 to 1/16 to 1. When the rate is less than about 1 Hz, the duration of each panning cycle is greater than 1 second, and we can follow the panning as simulated movement; when the rate is greater than 1 Hz, the complete left-right cycle of panning takes places in less than a second, up to as little as 1/16 of a second (62.5 ms), so we perceive it more as a sort of "location tremolo" sound effect.

So in this example program the triangle wave function was used in nine different ways:
1) as the carrier waveform
2) as a window (amplitude envelope) to make individual "note" events
3) to modulate the rate and duration of the notes
4) to create 10-second volume swells
5) to vary the central pitch of the oscillator
6) to make pitch glissandi around that central pitch
7) to vary the depth of those glissandi
8) to vary the rate of those glissandi
9) to vary the panning of the sound

At the end of the article that defined a control function, I remarked that "One can think of a distinctive control function shape as being analogous to a musical motive or formal structure. The same motivic shape can occur over a long period of time (phrase level), or a short period (note level), or an extremely short period (for timbral effect)."

The idea is that when a control function has a shape that is in some way distinctive and memorable, and that shape is used to control one or more parameters of a sound, the shape is exemplified by the sound, and the resulting sound will be recognizable when it recurs. In this way, it can function like a musical motive. And like a musical motive, it can remain recognizable as related to its original occurrence even if it undergoes certain modifications or transformations. Some typical types of transformations of a musical motive are rhythmic augmentation or diminution (multiplying the duration by a given amount to increase or decrease its total duration, and thus decrease or increase its speed), intervallic augmentation or diminution (keeping the contour the same but changing the size of its vertical range), transposition (moving it vertically by adding a certain amount to it), or delay (this is implicit in anything that recurs, because music takes place in time; delay is crucial to musical ideas such as imitation, canon, etc.). All of these can be crudely emulated mathematically with the simple operations of multiplication (to change the size of the shape horizontally and/or vertically) and addition (to offset the shape vertically or in time). Even as a motive is distorted by these manipulations, it retains aspects of its original shape, and its relationship to its original state can remain recognizable, serving an aesthetically unifying role.

Using the same shapes as were used in the example on line segment control functions, let's see how the shape of a control function can be distorted horizontally and vertically with multiplication and addition (i.e., by changing its range, transposition, duration, and delay). These multiple versions of the same shape can be assembled algorithmically into a musical passage. We'll use the same shapes as control functions, and apply them in the same way as before, to the pitch and amplitude of a simple synthesized tone. This program shows one way to compose and synthesize a sequence of related sounds using variations of a single control function.

The shapes stay the same, but their transformations are chosen pseudo-randomly within specific ranges. The durations, which are also used as the delay time before starting the next shape, are chosen from among four possibilities: 100 ms, 500 ms, 2 sec, or 5 sec. The first duration will cause very short notes in which the envelope functions just have a timbral effect. The second duration is also a short note length (1/2 sec), but is just enough time for us to make out the shape within the note. The next two durations are definitely long enough for us to track the shape of the control functions as they affect pitch glissando and volume over the duration of a short "phrase".

The durations are chosen using a distribution of probabilities. An important thing to notice about probabilistic choices of durations is that long durations take up much more time than short durations. So if we used an equal distribution of probabilities, the long notes would take up way more of the total time than the short notes, and it would give the effect of the long notes predominating. The proportion of the given durations is 1:5:20:50, so if each duration were chosen with equal likelihood, the 2-second and 5-second notes would take up 70/76 (92.1%) of the total time. To counter that effect, we can use a distribution that is the inverse of the proportions of the note durations. (See inside the table object.)

As in the earlier cited example, we use function objects to draw the function shapes, for each note we set the range and domain to the desired values, and then we bang the functions to cause them to send their information to the line~ objects.

It's important to note that in this example, instead of the line segment functions controlling frequency and amplitude of the synthesized sound directly, they control the subjective parameters pitch and volume, specified as MIDI note names and decibels, respectively. Much has been written on the logarithmic nature of subjective perception relative to empirical measurement. This is particularly demonstrable in the case of musical pitch and loudness. We perceive these subjective traits as the logarithm of the empirical sonic attributes called fundamental frequency and amplitude. Fortunately, that logarithmic relationship makes it fairly easy to do a mathematical translation from a linear pitch difference or loudness difference into the appropriate exponential curve in the physical attribute of frequency or amplitude. The formula for translating MIDI pitch m into frequency f, according to the 12-notes-per-octave equal-tempered tuning system is f=440(2^((69-m)/12)). Using the reference frequency of A=440 Hz, which is equivalent to MIDI note number 69, multiply that times the (69-m)/12th root of 2. Max provides objects called mtof (for individual Max messages) and mtof~ (for signals) that do this math for you. The formula for translating volume in decibels d into amplitude a is a=10^(d/20). The amplitude a is calculated as 10 to the power of (d/20). This makes a reference of 0 dB equal to an amplitude of 1 (the greatest possible amplitude playable by Max signal processing), and all negative decibel values will result in an amplitude between 1 and 0. Max provides objects called dbtoa and dbtoa~ to do this math for you.

The arguments in the line~ objects might look a little strange, and they're not very important; they were chosen just to keep the program from making any sound when it's first opened. There's not really a MIDI pitch number that corresponds to a frequency of 0 (which would stop the rect~ oscillator)--MIDI pitch number 0 is a super-low C that would actually have a fundamental frequency of 8.1758 Hz--but -300 is a hypothetical MIDI number that would slow the oscillator down to about 0 Hz. Similarly, there is no decibel value that would truly give an amplitude of 0 (it would have to be negative infinity!), but -100 dB is equal to an amplitude of 0.00001, which is for all practical purposes inaudible.

Now let's look at the actual numbers being used for the pitches and volumes. The volume (peak amplitude) of each note will be one of nine possible decibel values, ranging from 0 dB (fff) to -48 dB (ppp) in increments of 6 dB. That's a pretty wide dynamic range for the peak amplitudes; it's at least as wide as most instrumentalists habitually use. Of course, recording systems need a much wider range than this to record all the subtleties of sounds in their quietest moments, but I'm just referring to the dynamic range of the peak amplitudes (the attacks) of the notes. The low end of the pitch range for each note will be one of 10 possible pitches, ranging from E just below the bass staff (MIDI 40) to E at the top of the treble staff (MIDI 76), in increments of a major third (4 semitones). The size of the range (the difference between Pitchmin and Pitchmax) will range from 0.5 to 8 semitones (from 1/4 tone to a minor 6th), as measured in 16 quarter-tone steps. This gives a possibility of some relatively wide glissandi and some very subtle small ones.

To test this program, just start the audio and click on the go button. Notice that most of the notes are very short (100 ms or 500 ms), with occasional longer notes (2 sec or 5 sec). The dynamics are arbitrarily and equally distributed from ppp to fff, and the pitches are arbitrarily and equally distributed from low to high, with a variety of glissando ranges.

My contention is that even though the compositional decisions made by the computer are arbitrary and undistinguished in their own right, there is a sense of unity and consistency to the sounds being played, due to the constant use of single shapes for pitch glissando and amplitude envelope, and that this helps to give an impression of coherency and intentionality. That is presumably the main utility of motive in composition: it gives a sense of repetition, recognizable variation, and consistency to a variety of different sounds.

Algorithmic Composition

Blog Archive

About Me

Friday, September 18, 2009

Modulating the modulators

Sunday, April 12, 2009

Control function as a recognizable shape

Labels