Gerhard Eckel
IRCAM
Paris, France
Introduction
The technical generation and transformation of sound (sound synthesis) has
strongly influenced musical thought in the twentieth century. For about
four decades composers have used sound synthesis as a compositional device
in various ways. Although the techniques for synthesizing and processing
sound have improved significantly in the recent past, the musical results
obtained so far have, with some significant exceptions, not been
convincing. At the same time, sound synthesis has had a strong impact on contemporary
music, especially in traditional orchestral or ensemble writing, which has
been inspired by composers' experience with synthetic sound in the studio.
The direct incorporation of synthetic sound material in composition can
still be quite difficult due to a lack of adequate means of control. It
is the flexibility of synthesis control which determines the real compositional
potential of sound synthesis. In this paper we would like to show how the
manipulation of sound signals by means of graphical representations can
improve synthesis control and thereby enhance the musical potential of sound
synthesis. As an example we will discuss the prototype implementation of
a spectrogram editor called SpecDraw.
Musical intentions and constraints
Composers use sound synthesis to construct sound material according to their
compositional needs (i.e., to adequately articulate their musical ideas).
Sound synthesis allows composers to circumvent the constraints imposed by
the mechanical and acoustical limitations of the traditional musical instruments
and their performance. Inevitably it also introduces a new set of constraints:
the limitations of current synthesis tools and audio equipment. In order
to master these new techniques, composers need a working knowledge of the
mechanisms of auditory perception as well as experience with synthesis methods.
Sound imagination and control of sound synthesis
A central problem involves the translation of a sound imagined by a composer
into a set of parameter trajectories used to drive a specific synthesis
method. The composer needs a set of tools that can cope with the complexity
inherent in this translation process, resulting from an interplay between
the idiosyncrasies of the synthesis methods used, the relevant phenomena
of perception and cognition, the language used for formalization, the sound
imagined by the composer, and related compositional intentions (e.g. the
development of a type of sound in a certain context, or the combination
of sound materials into composite entities). This translation typically
proceeds by trial and error, with the added complication that
each attempt to formalize and realize an imagined sound might lead to a
mutation of the mental sound image. The tools used to aid the translation
of a mental sound image might thus have a serious impact on the composer's
imagination itself. Hence the conceptual and technological choices made
during the design of these tools are crucial from a compositional point
of view. The entire software environment used for sound synthesis affects
how synthetic sound material is imagined, realized, and incorporated into
a composition. In order to keep the creative process alive and under control,
analysis tools are required that can help the composer to understand the
differences between intended and achieved results. Besides acute hearing,
the composer is in need of specialized tools for analytical and comparative
listening, as well as tools for signal analysis and graphical signal representation.
These tools are used to monitor sounds, to isolate and understand unexpected
side effects, and to explore the developmental potential of the constructed
sound material in a specific musical context.
Signal modification based on signal representation
Since this translation process is iterative in nature, the modifications
subsequently applied to the synthesis parameters are usually based on analysis
of the previous result. This suggests that one might connect the analysis
tools directly to the synthesis and processing tools. In the case of graphical
signal representation, this would make the representation directly editable
(i.e., interactive), converting the analysis tool into an editor. The graphically
expressed signal modifications could then be applied automatically to the
signal by means of an invertible analysis technique. The interactive
nature of such a tool would greatly enhance the efficiency of the compositional
process because the composer would have a direct handle on the aspects of
the sound that are represented in the analysis.
The difference between an editor of the type described here and the traditional
combination of analysis and synthesis tools is comparable to the difference
between a line-based text editor and a modern "wysiwyg" word processor.
The composer can directly hear and see the effects of the modifications
since their application is immediate. This would permit a better understanding
of the perceptual properties of the analysed signal because ad hoc hypotheses
about their physical correlates (such as the contribution of certain signal
components to the perception of the whole) could be tested directly. Since
the analysis parameters usually determine which signal features
can be observed in the representation, the modifications possible with the
proposed editor depend largely on the kind of parametrization and analysis
method used.
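The key requirement above is an invertible analysis. The following sketch (in Python with NumPy, not part of any system described here) illustrates the principle: a sliding STFT whose overlap-add resynthesis reconstructs the input, so that any edit applied to the analysis frames is carried directly back into the signal. All parameter values are illustrative assumptions.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Analysis: sliding STFT with a Hann window."""
    w = np.hanning(n_fft)
    return np.array([np.fft.rfft(w * x[i:i + n_fft])
                     for i in range(0, len(x) - n_fft + 1, hop)])

def istft(frames, n_fft=512, hop=128):
    """Resynthesis: inverse STFT in overlap-add form."""
    w = np.hanning(n_fft)
    y = np.zeros((len(frames) - 1) * hop + n_fft)
    norm = np.zeros_like(y)
    for k, frame in enumerate(frames):
        y[k * hop:k * hop + n_fft] += w * np.fft.irfft(frame, n_fft)
        norm[k * hop:k * hop + n_fft] += w ** 2
    return y / np.maximum(norm, 1e-12)   # normalize by the window overlap

# An unmodified round trip reproduces the signal; graphical edits are
# then simply modifications of `frames` applied before resynthesis.
x = np.sin(2 * np.pi * 440 * np.arange(4096) / 44100)
y = istft(stft(x))
```

Because the window overlap is divided out at resynthesis, the round trip is exact (up to floating-point error) wherever the windows fully cover the signal.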
SpecDraw
A spectrogram editor has been designed by the author in an attempt to realize
these aims. Based on the well-known grey-scale representation and sliding
STFT (Short-Time Fourier Transform), the editor looks much like a combination
of a sonagraph and a drawing program. In its current form as a prototype
implementation (SpecDraw) it can be used to design time-varying filter responses
on top of a graphical signal representation. The resynthesis is performed
via inverse STFT in the commonly used overlap-add form, guaranteeing accurate,
phase-linear filtering. The filter responses can be specified by any number
of arbitrary polygonal areas in time-frequency space, leaving the user considerable
freedom in the choice of regions. Furthermore the basic operations known
from drawing programs (copying, moving, grouping, sizing, etc.) are applicable
to the regions in the spectrogram. The following analysis parameters may
be specified: the size of the transform, the length of the signal window,
the time shift of the window, the dynamic range, and a coefficient for the
mapping of the logarithmic amplitude values onto the grey-scale. The filtering
operation has two modes: the regions are either rejected (band-stop) or
isolated from the rest of the signal (band-pass).
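The last two analysis parameters can be made concrete with a small sketch (Python/NumPy; the names `dyn_range` and `gamma` are our assumptions, not SpecDraw's) that maps the logarithmic amplitudes of an STFT magnitude spectrogram onto grey levels:

```python
import numpy as np

n_fft, hop, sr = 512, 128, 8000
w = np.hanning(n_fft)
t = np.arange(4096) / sr
x = np.sin(2 * np.pi * 500 * t)          # a steady 500 Hz test tone

# Magnitude spectrogram from a sliding STFT.
mag = np.abs(np.array([np.fft.rfft(w * x[i:i + n_fft])
                       for i in range(0, len(x) - n_fft + 1, hop)]))

# Map logarithmic amplitudes onto grey levels in [0, 1].  `dyn_range`
# clips everything more than 60 dB below the peak; `gamma` stands in
# for the mapping coefficient mentioned in the text.
dyn_range, gamma = 60.0, 1.0
db = 20 * np.log10(np.maximum(mag, 1e-12) / mag.max())
grey = np.clip(1 + db / dyn_range, 0.0, 1.0) ** gamma    # 1 = peak, 0 = floor
```

Varying `dyn_range` changes which weak components remain visible at all, while `gamma` redistributes the grey levels between the extremes.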
The strength of SpecDraw lies in its simplicity. Conceptually it consists
of nothing more than a linear filter, one of the most predictable and therefore
easy-to-use signal processing tools known. The interface to the filter is
graphical and interactive. The regions to be filtered can be specified from
the spectrogram, allowing for easy identification and temporal tracking
of any visible signal component. The use of SpecDraw is very intuitive
because the user interacts directly with a time-frequency representation,
a representation that is analogous to the signal analysis occurring in the
auditory periphery and so relatively easy to conceptualize.
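The band-stop mode described above can be sketched as follows (Python/NumPy; a rectangular region stands in for SpecDraw's arbitrary polygons, and all parameter values are illustrative assumptions):

```python
import numpy as np

n_fft, hop, sr = 512, 128, 8000
w = np.hanning(n_fft)
t = np.arange(8192) / sr
x = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 1500 * t)

# Analysis: sliding STFT.
frames = np.array([np.fft.rfft(w * x[i:i + n_fft])
                   for i in range(0, len(x) - n_fft + 1, hop)])

# A rectangular region in time-frequency space (SpecDraw allows
# arbitrary polygons): reject everything between 1200 and 1800 Hz.
freqs = np.fft.rfftfreq(n_fft, 1 / sr)
band = (freqs >= 1200) & (freqs <= 1800)
frames[:, band] = 0.0        # band-stop; frames[:, ~band] = 0 gives band-pass

# Resynthesis: inverse STFT in overlap-add form.
y = np.zeros(len(x))
norm = np.zeros(len(x))
for k, frame in enumerate(frames):
    y[k * hop:k * hop + n_fft] += w * np.fft.irfft(frame, n_fft)
    norm[k * hop:k * hop + n_fft] += w ** 2
y /= np.maximum(norm, 1e-12)
```

In this example the 1500 Hz component falls inside the rejected region and is removed, while the 500 Hz component passes through unchanged.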
The current abilities of SpecDraw are somewhat primitive. So far, the program
does not allow the user to specify a time-varying amplitude envelope for a
selected region. This would clearly be a useful feature, and it would also be easy
to implement. More complicated signal modifications based on selected regions
could include the controlled distortion of their shape along the time and
frequency axes. Such an operation would still be very intuitive from a musical
point of view (time and frequency being the main aspects of sound usually
organized in music), but the corresponding signal modifications would be
quite complicated to implement because of the multiplicity of possible solutions.
The quality of the result would depend on the nature of the signal and the
parametrization chosen for the resynthesis technique (which would probably
have much in common with the structure of a phase vocoder).
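The simpler of these extensions, a time-varying amplitude envelope for a selected region, might look as follows (a hypothetical Python/NumPy sketch of the missing feature, not SpecDraw code; region and envelope are illustrative assumptions):

```python
import numpy as np

n_fft, hop, sr = 512, 128, 8000
w = np.hanning(n_fft)
t = np.arange(8192) / sr
x = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 1500 * t)

frames = np.array([np.fft.rfft(w * x[i:i + n_fft])
                   for i in range(0, len(x) - n_fft + 1, hop)])

# Instead of zeroing the selected region, scale it with a gain that
# varies over time -- here a linear fade-out of the 1200-1800 Hz band.
freqs = np.fft.rfftfreq(n_fft, 1 / sr)
band = (freqs >= 1200) & (freqs <= 1800)
envelope = np.linspace(1.0, 0.0, len(frames))    # one gain value per frame
frames[:, band] *= envelope[:, None]

# Resynthesis: inverse STFT in overlap-add form.
y = np.zeros(len(x))
norm = np.zeros(len(x))
for k, frame in enumerate(frames):
    y[k * hop:k * hop + n_fft] += w * np.fft.irfft(frame, n_fft)
    norm[k * hop:k * hop + n_fft] += w ** 2
y /= np.maximum(norm, 1e-12)
```

Here the 1500 Hz component fades out over the course of the sound while the 500 Hz component is untouched; an arbitrary envelope drawn by the user would replace the linear ramp.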
These and other functions will be implemented in the context of the SignalEditor
project, currently being realized by Peter Wyngaard as part of the IRCAM Signal
Processing Workstation. The SignalEditor is a general-purpose signal editing
tool which will include spectrogram editing as one of its representation
and editing modes. It will also support multiple simultaneous representations,
enabling the user to observe the (a)synchrony of different aspects of a
sound - a powerful means of identifying hidden signal features.
Conclusion
Signal modification via editable signal representations has considerable
potential for sound synthesis control and its application to contemporary
composition. Our experience with SpecDraw confirms that editable signal
representations are especially helpful in the process of imagining sound
and the exploration of the musical potential of sound material.
The lack of any form of notation for synthetic sound (as a basis for composition
rather than for communication with instrumentalists) has hindered
the musical exploitation of this exciting compositional possibility. Current
signal representations are still too primitive to be used by composers in
a similar way to common music notation, as most such representations lack
certain abstractions related to auditory cognition (e.g. harmonic pattern
recognition). Nevertheless we think that the concept of editable signal
representations can significantly contribute to the research on representation
and notation tools for sound synthesis.