Manipulation of Sound Signals Based on Graphical Representation
A Musical Point of View

Gerhard Eckel
IRCAM
Paris, France

Introduction

The technical generation and transformation of sound (sound synthesis) has strongly influenced musical thought in the twentieth century. For about four decades composers have used sound synthesis as a compositional device in various ways. Although the techniques for synthesizing and processing sound have improved significantly in the recent past, the musical results obtained so far have, with some significant exceptions, not always been convincing. On the other hand, sound synthesis has had a strong impact on contemporary music, especially on traditional orchestral and ensemble writing, which has been inspired by composers' experience with synthetic sound in the studio. The direct incorporation of synthetic sound material into composition can still be quite difficult due to a lack of adequate means of control. It is the flexibility of synthesis control which determines the real compositional potential of sound synthesis. In this paper we show how the manipulation of sound signals by means of graphical representations can improve synthesis control and thereby enhance the musical potential of sound synthesis. As an example we discuss the prototype implementation of a spectrogram editor called SpecDraw.

Musical intentions and constraints

Composers use sound synthesis to construct sound material according to their compositional needs (i.e., to adequately articulate their musical ideas). Sound synthesis allows composers to circumvent the constraints imposed by the mechanical and acoustical limitations of traditional musical instruments and their performance. Inevitably it also introduces a new set of constraints: the limitations of current synthesis tools and audio equipment. In order to master these new techniques, composers need a working knowledge of the mechanisms of auditory perception as well as experience with synthesis methods.

Sound imagination and control of sound synthesis

A central problem involves the translation of a sound imagined by a composer into a set of parameter trajectories used to drive a specific synthesis method. The composer needs a set of tools that can cope with the complexity inherent in this translation process, resulting from an interplay between the idiosyncrasies of the synthesis methods used, the relevant phenomena of perception and cognition, the language used for formalization, the sound imagined by the composer, and related compositional intentions (e.g. the development of a type of sound in a certain context, or the combination of sound materials into composite entities). This translation typically proceeds by trial and error, with the added complication that each attempt to formalize and realize an imagined sound might lead to a mutation of the mental sound image. The tools used to aid the translation of a mental sound image might thus have a serious impact on the composer's imagination itself. Hence the conceptual and technological choices made during the design of these tools are crucial from a compositional point of view. The entire software environment used for sound synthesis affects the imagination, realization, and incorporation into composition of synthetic sound material. In order to keep the creative process alive and under control, analysis tools are required that can help the composer to understand the differences between intended and achieved results. Besides acute hearing, the composer needs specialized tools for analytical and comparative listening, as well as tools for signal analysis and graphical signal representation. These tools are used to monitor sounds, to isolate and understand unexpected side effects, and to explore the developmental potential of the constructed sound material in a specific musical context.

Signal modification based on signal representation

Since this translation process is iterative in nature, the modifications subsequently applied to the synthesis parameters are usually based on analysis of the previous result. This suggests that one might connect the analysis tools directly to the synthesis and processing tools. In the case of graphical signal representation, this would make the representation directly editable (i.e., interactive), converting the analysis tool into an editor. The graphically expressed signal modifications could then be applied automatically to the signal by means of an invertible analysis technique. The interactive nature of such a tool would greatly enhance the efficiency of the compositional process because the composer would have a direct handle on the aspects of the sound that are represented in the analysis.
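An invertible analysis of the kind required here is provided by the STFT: with a Hann window and a hop size of half the window length, shifted copies of the window sum to one, so plain overlap-add resynthesis recovers the signal exactly. The following minimal round-trip sketch in Python/NumPy illustrates the principle only; it is not the implementation discussed in this paper.

```python
import numpy as np

def stft(x, n=512, hop=256):
    """Sliding short-time Fourier transform with a periodic Hann window."""
    w = np.hanning(n + 1)[:-1]  # periodic Hann: shifted copies sum to 1 at hop n/2
    return np.array([np.fft.rfft(w * x[i:i + n])
                     for i in range(0, len(x) - n + 1, hop)])

def istft(S, n=512, hop=256):
    """Inverse STFT by plain overlap-add (no synthesis window needed here)."""
    y = np.zeros(hop * (len(S) - 1) + n)
    for k, frame in enumerate(S):
        y[k * hop:k * hop + n] += np.fft.irfft(frame, n)
    return y

# Round trip: resynthesis recovers the signal exactly, apart from the
# first and last half-window, where the Hann windows do not yet sum to one.
x = np.random.default_rng(0).standard_normal(4096)
y = istft(stft(x))
assert np.allclose(x[512:-512], y[512:-512])
```

Any modification drawn on the spectrogram can then be expressed as a change to the frames between analysis and resynthesis, which is the basis of the editor described below.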

The difference between an editor of the type described here and the traditional combination of analysis and synthesis tools is comparable to the difference between a line-based text editor and a modern "wysiwyg" word processor. The composer can directly hear and see the effects of the modifications since their application is immediate. This would permit a better understanding of the perceptual properties of the analysed signal because ad hoc hypotheses about their physical correlates (such as the contribution of certain signal components to the perception of the whole) could be tested directly. Since the analysis parameters usually affect the type of signal features that can be observed in the representation, the modifications possible with the proposed editor depend largely on the kind of parametrization and analysis method used.

SpecDraw

A spectrogram editor has been designed by the author in an attempt to realize these aims. Based on the well-known grey-scale representation and sliding STFT (Short-Time Fourier Transform), the editor looks much like a combination of a sonagraph and a drawing program. In its current form as a prototype implementation (SpecDraw) it can be used to design time-varying filter responses on top of a graphical signal representation. The resynthesis is performed via inverse STFT in the commonly used overlap-add form, guaranteeing accurate, phase-linear filtering. The filter responses can be specified by any number of arbitrary polygonal areas in time-frequency space, leaving the user considerable freedom in the choice of regions. Furthermore the basic operations known from drawing programs (copying, moving, grouping, sizing, etc.) are applicable to the regions in the spectrogram. The following analysis parameters may be specified: the size of the transform, the length of the signal window, the time shift of the window, the dynamic range, and a coefficient for the mapping of the logarithmic amplitude values onto the grey-scale. The filtering operation has two modes: the regions are either rejected (band-stop) or isolated from the rest of the signal (band-pass).
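The filtering scheme just described can be sketched as follows, assuming a rectangular time-frequency region in place of SpecDraw's arbitrary polygons, and using SciPy's STFT pair for analysis and overlap-add resynthesis. The function and parameter names are our own illustration, not IRCAM code.

```python
import numpy as np
from scipy.signal import stft, istft

def filter_region(x, fs, t0, t1, f0, f1, mode="stop", nperseg=512):
    """Phase-linear band-stop/band-pass over a rectangular
    time-frequency region (SpecDraw allows arbitrary polygons)."""
    f, t, Z = stft(x, fs=fs, nperseg=nperseg)        # Hann window, 50% overlap
    region = np.outer((f >= f0) & (f <= f1), (t >= t0) & (t <= t1))
    mask = ~region if mode == "stop" else region     # reject or isolate the region
    _, y = istft(Z * mask, fs=fs, nperseg=nperseg)   # overlap-add resynthesis
    return y[:len(x)]

# Example: reject a 1 kHz partial from 0.2 s to the end of a test signal.
fs = 8000
time = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * time) + np.sin(2 * np.pi * 1000 * time)
y = filter_region(x, fs, 0.2, 1.5, 900, 1100, mode="stop")
```

Because the operation is linear, the band-stop and band-pass results for the same region sum back to the original signal, which makes the two modes exact complements.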

The strength of SpecDraw lies in its simplicity. Conceptually it consists of nothing more than a linear filter, one of the most predictable and therefore easy-to-use signal processing tools known. The interface to the filter is graphical and interactive. The regions to be filtered can be specified from the spectrogram, allowing for easy identification and temporal tracking of any identifiable signal component. The use of SpecDraw is very intuitive because the user interacts directly with a time-frequency representation, a representation that is analogous to the signal analysis occurring in the auditory periphery and so relatively easy to conceptualize.

The current abilities of SpecDraw are somewhat primitive. So far, the program does not allow the user to specify a time-varying amplitude envelope for a selected region. This would clearly be a useful feature and it would also be easy to implement. More complicated signal modifications based on selected regions could include the controlled distortion of their shape along the time and frequency axes. Such an operation would still be very intuitive from a musical point of view (time and frequency being the main aspects of sound usually organized in music), but the corresponding signal modifications would be quite complicated to implement because of the multiplicity of possible solutions. The quality of the result would depend on the nature of the signal and the parametrization chosen for the resynthesis technique (which would probably have much in common with the structure of a phase vocoder).
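The amplitude-envelope feature mentioned above amounts to replacing the binary region mask by continuous gain values that vary over time. The following hypothetical Python/SciPy sketch (not part of the actual SpecDraw prototype; all names are illustrative) shows one way this could look:

```python
import numpy as np
from scipy.signal import stft, istft

def envelope_region(x, fs, t0, t1, f0, f1, gain, nperseg=512):
    """Apply a time-varying amplitude envelope to a rectangular
    time-frequency region; `gain` maps a frame time to a linear factor."""
    f, t, Z = stft(x, fs=fs, nperseg=nperseg)
    in_f = (f >= f0) & (f <= f1)
    in_t = (t >= t0) & (t <= t1)
    weights = np.ones(Z.shape)                       # unity gain outside the region
    weights[np.ix_(in_f, in_t)] = [gain(tk) for tk in t[in_t]]
    _, y = istft(Z * weights, fs=fs, nperseg=nperseg)
    return y[:len(x)]

# Example: fade the 900-1100 Hz band out linearly between 0.2 s and 0.8 s.
fs = 8000
time = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * time) + np.sin(2 * np.pi * 1000 * time)

def fade(tk):
    return max(0.0, 1.0 - (tk - 0.2) / 0.6)

y = envelope_region(x, fs, 0.2, 1.5, 900, 1100, fade)
```

The band-stop and band-pass modes of the prototype are the special cases of gain zero and gain one; the continuous version would give the composer dynamic control over the evolution of a selected component.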

These and other functions will be realized in the context of the SignalEditor project, currently being carried out by Peter Wyngaard as part of the IRCAM Signal Processing Workstation. The SignalEditor is a general-purpose signal editing tool which will include spectrogram editing as one of its representation and editing modes. It will also support multiple simultaneous representations, enabling the user to observe the (a)synchrony of different aspects of a sound - a powerful means of identifying hidden signal features.

Conclusion

Signal modification via editable signal representations has considerable potential for sound synthesis control and its application to contemporary composition. Our experience with SpecDraw confirms that editable signal representations are especially helpful in the process of imagining sound and the exploration of the musical potential of sound material.

The lack of any form of notation (as a basis for composition rather than for communication with instrumentalists) for synthetic sound has hindered the musical exploitation of this exciting compositional possibility. Current signal representations are still too primitive to be used by composers in a similar way to common music notation, as most such representations lack certain abstractions related to auditory cognition (e.g. harmonic pattern recognition). Nevertheless we think that the concept of editable signal representations can significantly contribute to the research on representation and notation tools for sound synthesis.