WO1999016049A1 - Interactive sound effects system and model-based sound effects - Google Patents


Info

Publication number
WO1999016049A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
sound effects
parameters
recited
synthesizer
Application number
PCT/SG1998/000074
Other languages
English (en)
Inventor
Lonce Lamar Wyse
Peter Rowan Kellock
Original Assignee
Kent Ridge Digital Labs (Krdl), National University Of Singapore
Application filed by Kent Ridge Digital Labs (KRDL), National University of Singapore
Priority to JP2000513267A (published as JP2001517814A)
Priority to AU92924/98A (published as AU9292498A)
Priority to EP98945749A (published as EP1019901A1)
Publication of WO1999016049A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 7/00: Instruments in which the tones are synthesised from a data store, e.g. computer organs
    • G10H 7/002: Instruments in which the tones are synthesised from a data store using a common processing for different operations or calculations, and a set of microinstructions (programme) to control the sequence thereof
    • G10H 7/004: Instruments in which the tones are synthesised from a data store using a common processing for different operations or calculations, and a set of microinstructions (programme) to control the sequence thereof, with one or more auxiliary processors in addition to the main processing unit
    • G10K: SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K 15/00: Acoustics not otherwise provided for
    • G10K 15/02: Synthesis of acoustic waves
    • G10H 2220/00: Input/output interfacing specifically adapted for electrophonic musical tools or instruments
    • G10H 2220/155: User input interfaces for electrophonic musical instruments
    • G10H 2220/315: User input interfaces for electrophonic musical instruments for joystick-like proportional control of musical input; videogame input devices used for musical input or control, e.g. gamepad, joysticks
    • G10H 2240/00: Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H 2240/171: Transmission of musical instrument data, control or status information; transmission, remote access or control of music data for electrophonic musical instruments
    • G10H 2240/281: Protocol or standard connector for transmission of analog or digital data to or from an electrophonic musical instrument
    • G10H 2240/311: MIDI transmission
    • G10H 2250/00: Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H 2250/315: Sound category-dependent sound synthesis processes [Gensound] for musical use; sound category-specific synthesis-controlling parameters or control means therefor
    • G10H 2250/365: Gensound applause, e.g. handclapping; cheering; booing
    • G10H 2250/371: Gensound equipment, i.e. synthesizing sounds produced by man-made devices, e.g. machines
    • G10H 2250/381: Road, i.e. sounds which are part of a road, street or urban traffic soundscape, e.g. automobiles, bikes, trucks, traffic, vehicle horns, collisions
    • G10H 2250/391: Gensound footsteps, i.e. footsteps, kicks or tap-dancing sounds
    • G10H 2250/395: Gensound nature
    • G10H 2250/411: Water, e.g. seashore, waves, brook, waterfall, dripping faucet
    • G10H 2250/415: Weather

Definitions

  • the present invention relates generally to the field of interactive media and sound production.
  • it relates to the modeling and production of interactive model-based sound effects, and to a system which can produce a wide range of sound effects.
  • Sound effects are widely used in the traditional media such as film, television, radio and theater, as well as in the interactive media such as computer games, multimedia titles, and World-Wide-Web sites. And while the effects are often produced in similar ways, how they are used may differ greatly depending on the medium. This is because the sound effects remain in a fixed form in the traditional media while the interactive media require the sound effects to vary according to a given situation.
  • a sound effect in a film remains the same from one viewing to the next; there is no interaction between the sound effects and the viewer.
  • in interactive media such as computer games, both the visual and audible elements must be constructed from an underlying model at the moment they are called upon if the user's experience is to be highly interactive.
  • the visual elements are largely constructed from models, thus providing a high degree of interactivity in the visual experience.
  • a model-based image is different from a prerecorded image in that the former is constructed from a set of parameters which are adjustable; the latter is fixed and cannot be easily manipulated.
  • the relationship between the sound and the event, i.e., the collision, is very crude; that is, the sound effect does not take into account the speed of the car, the material of the object it collides with, the angle of impact, or the reverberation characteristics of the surroundings, for instance. While it is possible to take into account different factors by playing different pre-recorded sound segments corresponding to the particular situation at hand, it is impractical or often impossible to collect a complete continuum of sounds that would precisely correspond to every possible scenario.
  • Producing sound effects by recalling pre-recorded sound segments has the limitation that the existing sound segments cannot easily be combined or manipulated in a seamless manner to produce a variation of the original sound segments which closely simulates the subtleties of real-world conditions. This is because pre-recorded sounds are not based on a model consisting of parameters of the sound-producing phenomenon. So to produce a sound effect which accurately models a real-life event for which no sound segment is available, a totally new segment of sound must be cumbersomely recorded; it cannot be produced by combining or manipulating the existing segments.
  • Pre-recording of sounds is how virtually all sound effects are currently produced in both the traditional and the interactive media.
  • the main shortcoming of this method is the lack of realism as discussed above.
  • another shortcoming is increased labor. For example, if a series of footsteps is needed in a film, a sound effect specialist would have to view a particular scene in the film, then create and record a series of actual footstep sounds (however he chooses to do so, whether by actually walking or banging physical objects together) that exactly correspond to what appears in the scene. If, for some reason, the scene is changed from a walking scene to a running scene, a new recording session would have to be conducted in order to produce footsteps which correspond to the increased speed of the steps. It would not be possible to produce realistic-sounding footsteps merely by manipulating the previously recorded footsteps from the walking scene.
  • the controller may be another networked computer or an external device such as a musical keyboard.
  • wavetable synthesis typically relies on recorded material (e.g. notes of an instrument)
  • FM and waveguide synthesis are purely synthetic.
  • Purely synthetic algorithms provide more flexibility in the sounds they produce, but the imitations of natural sounds tend not to be as realistic as wavetable methods based on recordings of the modeled sound source.
  • Wavetable synthesis usually involves some manipulation of the recorded sounds during execution such as looping sections of the sound, pitch shifting, and adjusting onset and decay times.
  • Simple unprocessed playback of prerecorded clips is only trivially "synthesis", but it is currently the method most commonly used for generating sound effects within software environments.
  • an event time scale based on the rate at which synthesis algorithms (e.g. musical notes) are started and stopped.
  • the distinction between the different time scales is somewhat arbitrary, and even inappropriate for certain synthesis paradigms, but it is useful for many.
  • the MIDI protocol is based on the distinction between controller and synthesizer and has made the most of a common bus architecture with a transfer rate of about 3K bytes/second.
  • communication bandwidth is similarly at a premium, while client computers are powerful and quite capable of handling a significant amount of synthesis. If clients share a common library of synthesis routines, then an "event and control parameter" representation of a sound is extremely efficient, puts a low demand on communications bandwidth, and is capable of generating arbitrarily high sound quality depending upon the capabilities of the client.
  • a common desktop configuration includes a soundcard with wavetable capabilities, so that many of the sound processing and mixing operations are performed on the card; the host is then burdened with sound construction only at the event control rate.
  • the wavetable event-pattern technique is more flexible and memory-efficient than audio playback, and more computationally efficient than sample-by-sample synthesis. This is significant when sounds are embedded in CPU-intensive applications.
  • the wavetable event-pattern representation falls between two extremes. On one end is a single long event, which obviously does not demonstrate the effectiveness of the representation. On the other extreme are large numbers of very short events. Granular synthesis is a general technique capable of representing any time-domain signal, but this end of the spectrum does not achieve the desirable reduction in computation, bandwidth, and memory. It is preferred that the rate of events and synthesizer commands be kept low (the sounds rarely use more than 25 per second) to keep computation and bandwidth requirements manageable, especially when several sound processes are running at once. On the other hand, the shorter the atomic event elements are, the more flexibility we can provide in controlling the sounds. Quite effective interactivity can often be achieved with event rates on the order of tens per second per sound effect.
  • the desired control may involve the rate of walking or running, the weight of the person, the type of shoe (if any) being worn and surface characteristics.
  • the event rate is manifestly low (though we actually use several events to represent a single step).
  • the number of wavetables is some function of the desired variability, but a considerable amount of the variability can also be achieved with combinations of standard wavetable parameters such as volume and pitch.
  • Another class of sound types that can be modeled with these techniques is textured sounds. In this category are waterfalls, wind, room tone, crowds, traffic, scraping, and rolling sounds. In most multimedia applications today, generating statistically stable sounds is done by creating loop points in a short recording. Unfortunately, the ear is very sensitive to the presence of such loops, the detection of which quickly destroys any sense of realism. Statistical event-generating algorithms can be effectively used in most situations where loops are currently used; they also keep the amount of stored data to a minimum, but do not produce the distracting looping effect.
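  • As an illustration of the statistical event-generation approach described in the preceding bullet, the following sketch (illustrative Python, not taken from the patent; the synth.play_wavetable call and its parameters are assumptions) scatters short wavetable events in time with randomized selection, level, and pitch, so that a texture such as a waterfall never exposes an audible loop point.

```python
import random

def textured_sound_events(synth, wavetable_ids, density_hz, duration_s):
    """Generate a statistically stable texture (e.g. waterfall, crowd, wind)
    by scattering short wavetable events in time instead of looping one clip.
    `synth` is assumed to expose play_wavetable(id, time, volume, pitch)."""
    t = 0.0
    while t < duration_s:
        # Randomly select among similar but non-identical samples so that
        # successive events never repeat exactly.
        wt = random.choice(wavetable_ids)
        volume = random.uniform(0.6, 1.0)      # small level fluctuation
        pitch = random.uniform(-2.0, 2.0)      # detune, in semitones
        synth.play_wavetable(wt, time=t, volume=volume, pitch=pitch)
        # Exponential inter-event times give a loop-free, Poisson-like texture
        # at roughly density_hz events per second, comfortably under the
        # ~25 events/second budget mentioned above.
        t += random.expovariate(density_hz)
```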
  • Sounds that are perceived as single events can also be modeled using synchronous or overlapping multiple wavetable events, with flexibility being the prime advantage over simple audio playback (memory usage could even be greater than for a single-event recording).
  • Sounds that are more difficult, but not impossible, to model are those that do not have enough texture variation to permit undetected event changes (e.g. the whir of a drill).
  • Sounds that are impractical to model this way are those that have complex characteristic temporal sequences across many attributes (e.g. speech).
  • the present sound effects system produces sound-effects which are based on parameterized models of sound-generating phenomena such as footsteps, earthquake, etc.
  • the behavioral characteristics of the phenomena are taken into consideration when creating the models to ensure realism.
  • the system allows a user greater flexibility and interactivity. By controlling a set of parameters, the user can interact with the system in real time.
  • the system is versatile in that it can produce a wide range of sound-effects, and is not limited to producing any particular sound effects or a class of sound effects.
  • the system includes a central processing unit (CPU) connected to random-access memory (RAM); one or more input devices such as a keyboard, mouse, joystick, or MIDI controller; a visual display device such as a touch screen; a sound synthesizer; an audio output system including a digital-to-analog converter (DAC), amplifier, and loudspeaker or headphones (or alternatively, a soundcard integrating some of these individual devices can be used); and a non-volatile storage device such as a hard disk.
  • stored in memory is an SFX program, which consists of a sound effects software engine (SFX engine) and optionally a graphical user interface program (GUI).
  • also stored in memory is an operating system program such as the standard operating system of any personal computer system.
  • the CPU executes the stored instructions of the programs in memory, sharing its processing power between the SFX engine, the GUI or controlling program, the operating system and possibly other programs according to a multitasking scheme such as the well-known time-slicing technique.
  • under command of the SFX engine, the system delivers a stream of events or commands to the sound synthesizer, which produces digital audio in response to these commands.
  • the output of the synthesizer is a digital audio signal which is converted to an analogue form by the digital to analogue converter (DAC), then amplified and delivered to the user by means of the amplifier and loudspeaker or headphones.
  • the digital audio signal may also be delivered back to the CPU allowing it to be further processed or stored for later retrieval.
  • the SFX engine is controlled directly by means of the GUI, or from an external controlling program such as a computer game or rarely by both at the same time.
  • the GUI When under control of the GUI, the user effectively interacts directly with the SFX program, controlling it by means of one or more input devices such as an alphanumeric computer keyboard, a pointing device, a joystick, a specialized controller such as a slider bank or music keyboard connected by means such as the Musical Instrument Digital Interface (MIDI) standard, or other physical controlling means.
  • the GUI program uses the display device to provide the user with visual information on the status of the SFX engine including which sound effects models are currently invoked, the structure of these sound models, the settings of their parameters, and other information.
  • the sound models consist of the interface functions, the parameters for external control, private data for maintaining state while the process is suspended to share CPU time, indexes into the bank of wavetables the model uses, and the event-generating code.
  • the sound models are arranged as an object-oriented class hierarchy, with many sound classes being derived directly from the base class. This structure is due to the fact that there are many attributes and methods common to all sounds (e.g. location, volume), while most other attributes are common to one model, or shared with other models that otherwise have little in common (e.g. surface characteristics of footsteps).
  • the models are based on some sound-generating phenomenon.
  • Some examples of a phenomenon are footsteps, earthquake, running air- conditioner, bouncing ball, moving car, etc.
  • the phenomenon can be virtually anything so long as there are some sounds with which it is associated. Indeed, the phenomenon need not even necessarily be a real- life phenomenon in the sense that it does not have to actually exist in the real world.
  • the sound modeling process begins by identifying the behavioral characteristics associated with the particular sound phenomenon which are relevant to the generation of sound. Behavioral characteristics can be defined as the set of properties which a naive listener would perceive as distinguishing the sound effect from other sound effects, including those which define how it changes or evolves in response to different conditions impinging upon it. For some of these conditions, it is useful to analyze the mechanics of how the sound is generated from the phenomenon. Using footsteps as an example, the sound being generated from footsteps results mainly from two separate events, the impact of the heel hitting a surface, then shortly after, the impact of the toe hitting a surface.
  • the totality of the sound generated results from the heel-toe action of one foot followed by the heel-toe action of the other foot, and so on.
  • the time interval between the sound produced from the heel-toe action of one foot and heel-toe action of the other foot decreases.
  • the sample may be obtained either by recording a sample segment of actual sound found in a real-world phenomenon (or simply taking a segment from some existing pre-recording) or by producing a segment through well-known synthesis techniques, whichever is more convenient or desirable given the particular sound effect being modeled.
  • the choice of the length of the sound samples depends on a number of factors. As a general rule, the smaller the sample, the greater the flexibility. On the flip side, the smaller the sample, the greater the labor and the harder it is to achieve realism.
  • a good rule of thumb is to have a sample which is as long as possible without loss of useful flexibility, that is, where most of the perceptual range of sonic possibilities of the equivalent sound in real life can be achieved by varying the user parameters of the model.
  • the choice of the sound samples also depends on the behavioral characteristics of the phenomenon to some extent, and also on the limitation of the parameters.
  • the parameters represent the various factors which need to be controlled in order to produce the modeled sounds.
  • although the parameters can be structured in a number of ways to effect a sound effect, for the preferred embodiment of the present invention it is useful to view the parameter structure as having three layers: top, middle, and bottom.
  • the top layer consists of the user parameters which are the interface between the user and the sound effects system.
  • the middle layer consists of the parameters employed by the SFX engine, or simply referred to as "engine parameters.”
  • the bottom layer consists of the synthesizer parameters which are well-known parameters found in any of the current sound or music synthesizers.
  • Each of the user parameters affects a combination of engine parameters and synthesizer parameters, though, in simpler cases, a user parameter may control only synthesizer parameters or engine parameters. Any combination of engine and synthesizer parameters is theoretically possible; however, the way in which they are combined will depend on how the user parameter is defined in light of the behavioral characteristics of a particular phenomenon. The user parameters are defined in terms of the desired sound effect.
  • the user parameters can be location, walking speed, walking style, limp, weight, hardness, surface type, etc. Although these parameters can be defined in virtually any manner, it is often most useful if they directly reflect the behavioral characteristics of a phenomenon and the purpose for which the sound effect is being produced.
  • the middle layer parameters, or engine parameters, and the bottom layer parameters, or synthesizer parameters, work in combination to produce the sound effects as defined by the user parameters.
  • the bottom layer parameters can include sound manipulation techniques such as volume control, pan, pitch, filter cutoff, filter Q, amplitude envelope, and many others which are well known to those skilled in the art.
  • the middle layer can be viewed as the layer which "models" the sound using the basic sound manipulation parameters provided by the bottom layer.
  • the middle layer (engine) parameters can broadly be classified as timing, selecting, and patterning.
  • the timing parameters basically control the length of the time intervals between triggering and stopping pieces of sound within a particular sound effect, and time intervals between other commands sent to the synthesizer.
  • the selecting parameters control which sound samples are selected at a given moment, including the order in which samples are selected.
  • the patterning parameters control the relationships between these factors.
  • FIG. 1 is a block diagram illustrating the preferred embodiment of the present system.
  • FIG. 2 is a functional block diagram illustrating the logical structure of the present system.
  • FIG. 3 illustrates a graphical user interface showing user parameters for the footsteps sound effect, where the user parameters are represented in the form of graphical sliders.
  • FIGS. 4A through 4E are flow diagrams illustrating the various steps the present system follows for various commands.
  • FIG. 5 illustrates the parameter structure employed by the preferred embodiment of the present system.
  • FIGS. 6 - 14 are tables charting the parameters used for actual sound models.
  • the overall system can be conceptualized as several layers, the top layer being applications that will use the sounds.
  • the graphical user interface is just one such application (though a rather special one).
  • the next layer is the collection of algorithmic sound models. These are objects that encapsulate data and describe the sound behavior.
  • the models provide the interface which applications use to control the sounds by playing, stopping, and sending messages or by updating parameters. Sound models generate commands and pass them at the proper time to the synthesizer.
  • the synthesizer is the back-end where the sample-by- sample audio waveforms are produced, mixed, and sent to the audio output device.
  • FIG. 1 is a block diagram illustrating the main elements of the preferred embodiment of the present invention.
  • a central processing unit 1 is connected to random-access memory (RAM) 2; one or more input devices such as keyboard 3, mouse 4, joystick 5, MIDI controller 6; a visual display device 7; a sound synthesizer 8; an audio output system including digital-to-analog converter (DAC) 9, amplifier 10, loudspeaker or headphone 11 (or alternatively, a soundcard integrating some of these individual devices can be used); and a non-volatile storage device such as a hard disk 12.
  • stored in memory is an SFX program which consists of a sound effects software engine (SFX engine) 13 and optionally a graphical user interface program (GUI) 14.
  • the GUI can be replaced by a controlling program 15.
  • an operating system program 16 such as the standard operating system of any personal computer system.
  • the CPU 1 executes the stored instructions of the programs in memory, sharing its processing power between the SFX engine 13, the GUI 14 or controlling program 15, the operating system 16 and possibly other programs according to a multitasking scheme such as the well-known time- slicing technique.
  • under command of the SFX engine, the system delivers a stream of commands to the sound synthesizer 8, which produces digital audio in response to these commands.
  • the output of the synthesizer is a digital audio signal which is converted to an analogue form by the digital to analogue converter (DAC) 9, then amplified and delivered to the user by means of the amplifier 10 and loudspeaker or headphones 11.
  • the digital audio signal may also be delivered back to the CPU allowing it to be further processed or stored for later retrieval.
  • the hard disk or other nonvolatile storage 12 provides means to store indefinitely the following items: the SFX program itself, including the data and instructions representing multiple sound effects models; settings of parameters and other variable elements of the SFX program; and, optionally, digital audio output from the synthesizer 8 under control of the SFX program.
  • the SFX engine 13 is controlled directly by means of the GUI 14, or from an external controlling program such as a computer game 15 or rarely by both at the same time.
  • the GUI program 14 uses the display device 7 to provide the user with visual information on the status of the SFX engine 13, including which sound effects models are currently invoked, the structure of these sound models, the settings of their parameters, and other information.
  • when control is by means of a pointing device, the display device 7 also provides feedback to the user on the logical position of the pointing device in the usual manner. By observing the display 7 and/or listening to the audio output while manipulating the input devices 3 through 6, the user is able to alter sound effects until satisfied with the results.
  • This mode of operation is designed to allow the user to create specific sound effects according to his/her needs from the generic sound effects models of the SFX system, by selecting sound effects models, initiating or triggering them to produce audio output from the system, adjusting the parameters of the models, selecting elements of models, and other actions.
  • the SFX engine 13 is under the control of an external controlling program 15, such as a computer game, the program of a network-resident information site (website), a virtual reality program, a video editing program, a multimedia authoring tool, or any other program which requires sound effects.
  • the user interacts with the controlling program 15 by means of the input devices 3 through 6 and the display device 7.
  • the SFX engine 13 acts as a slave to the controlling program 15, producing sound effects under its control. This is achieved by allowing the controlling program to send data to the SFX engine 13, this data being interpreted by the SFX engine as controlling messages.
  • the SFX engine will typically not be visible to the user on the display 7, and will be controllable by the user only indirectly via aspects of the controlling program which influence the SFX engine.
  • the manner and degree of control which the user has over the SFX engine is entirely a function of the controlling program and is decided by the designer of the controlling program.
  • the logical structure of the present system is shown in Fig 2.
  • the main elements are the SFX engine 1 which, as described above, may be under control of the GUI 2 or, in the alternative mode of operation, under control of an external controlling program 3.
  • another main element is the synthesizer 4, which leads to the audio output system.
  • the user controls the system by means of the GUI 2, which acts to accept user input (such as keystrokes of the computer keyboard or movements of a pointing device) and to inform the user both of the status of the system and of the effect of his/her actions.
  • User actions which affect the production of sound effects generate control messages which are sent from the GUI to the SFX Engine 1 in order to initiate, terminate, and control sound effects. These messages are in a format determined by the SFX Engine and known to the GUI.
  • the SFX engine 1 models the behavior of the currently-active sound effects and generates a stream of events or commands which are sent to the synthesizer 4, which in turn generates the audio output.
  • Certain information affecting the manner of display to be used by the GUI 2 is contained within the SFX engine 1: for example the manner in which the control parameters of a sound effects model should be displayed varies from one model to another, and the information about the currently-active sound effects models is held by the SFX engine.
  • provision is made for the SFX engine to send display information to the GUI, or for the GUI to elicit display information from the SFX engine.
  • the user interacts with the external controlling program 3 in a manner which is completely independent of the invention.
  • the controlling program 3 sends control messages to the SFX engine 1 in order to initiate, terminate, and control sound effects. These messages are in a format determined by the SFX Engine and known to the controlling program, and typically are similar to, or a subset of those used by the GUI in the first mode of operation described above.
  • the main purpose of the SFX engine 1 is to model the behavior of the currently-active sound effects and generate a stream of events or commands which are sent to the synthesizer 4, which in turn generates the audio output.
  • the main internal elements of the SFX engine 1 are a set of interactive sound effects models (SFX models) 5; an Application Programmer Interface (API) 7; a Message Processor 8; a Parameter Linker/Mapper 9; and a Timing and Synthesizer Command Processor (TSCP) 10.
  • each model consists of data and programmed instructions representing the sound characteristics and behavior of a sound effect, or a class of sound effects. These models may be invoked sequentially or simultaneously, so that the system is capable of producing sound effects in isolation or in combination, typically after an imperceptible or near-imperceptible delay (in so-called "real time").
  • Each SFX model is provided with one or more control parameters which may be used to alter the sound produced by the SFX model, and these control parameters may also be modified in real time to produce audible changes in the output while the system is producing sound effects.
  • compound sound effects models may be made up of other sound effects models arranged in a hierarchy consisting of any number of levels, thus enabling arbitrarily complex models to be built from a number of simpler models.
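  • A compound sound effects model of the kind just described can be pictured as a simple composite structure; the sketch below is illustrative Python under assumed class and method names, not the patent's actual interface.

```python
class SFXModel:
    """Base interface shared by simple and compound sound effects models."""
    def start(self): ...
    def stop(self): ...
    def set_parameter(self, name, value): ...

class CompoundSFXModel(SFXModel):
    """A sound model built from child models arranged in a hierarchy of any
    depth.  Top-level parameter changes (e.g. overall volume) fan out to every
    child, while each child remains individually controllable."""
    def __init__(self, children):
        self.children = children              # list of SFXModel instances

    def start(self):
        for child in self.children:
            child.start()

    def stop(self):
        for child in self.children:
            child.stop()

    def set_parameter(self, name, value):
        for child in self.children:
            child.set_parameter(name, value)

    def solo(self, index):
        # Listen to one lower-level element in isolation while adjusting it.
        for i, child in enumerate(self.children):
            (child.start if i == index else child.stop)()
```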
  • the Application Programmer Interface (API) 7 receives data which is interpreted by the SFX engine as controlling messages, these messages arriving from either the GUI 2 or the external controlling program 3.
  • the API decodes the messages in order to establish which type of message has been sent, and forwards the messages to the Message Processor 8.
  • the Message Processor 8 performs actions as directed by the controlling messages, including starting and stopping particular sound effects, loading and unloading sound effects models from RAM, applying the effect of modifications of control parameters to the SFX models, modifying settings of the SFX engine which influence its overall behavior, and otherwise controlling the SFX Engine.
  • a Parameter Linker/Mapper 9 provides a means of endowing SFX models with one or more alternative sets of control parameters or meta-parameters, where these meta-parameters are linked to the original control parameter set of the SFX model or to other meta-parameters in a hierarchy of parameters. It also provides means of applying mathematical transformations to the values of control parameters and meta-parameters.
  • the Parameter Linker/Mapper is useful because the original control parameters of a particular SFX model are not necessarily the most appropriate or useful in every case, for example when the SFX engine is being controlled by an external controlling program 3 which has its own design constraints, or when the SFX model forms part of a compound SFX model as described above.
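  • The following sketch suggests how a parameter linker/mapper of this kind might work: a meta-parameter is linked to one or more underlying parameters, each through its own mathematical transformation. The class, the method names, and the applause mappings in the comments are illustrative assumptions.

```python
class ParameterLinker:
    """Links meta-parameters to underlying control parameters, applying a
    mathematical transformation to each link (illustrative sketch only)."""
    def __init__(self, sfx_model):
        self.model = sfx_model
        self.links = {}                       # meta-parameter -> [(param, fn)]

    def link(self, meta_name, param_name, transform=lambda v: v):
        self.links.setdefault(meta_name, []).append((param_name, transform))

    def set_meta(self, meta_name, value):
        # One high-level change fans out into a set of low-level changes.
        for param_name, transform in self.links.get(meta_name, []):
            self.model.set_parameter(param_name, transform(value))

# Hypothetical usage: an "enthusiasm" meta-parameter sweeping several
# lower-level applause parameters at once.
# linker.link("enthusiasm", "num_clappers", lambda v: int(v * 200))
# linker.link("enthusiasm", "clap_rate",    lambda v: 2.0 + 6.0 * v)
# linker.link("enthusiasm", "volume",       lambda v: 0.4 + 0.6 * v)
```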
  • the Timing and Synthesizer Command Processor (TSCP) 10 provides a number of functions related to timing and to the processing of events and other commands to be sent to the synthesizer 4.
  • the invention is not restricted to any particular method of synthesis, and details of this element depend significantly on the type and design of the synthesizer.
  • the SFX engine operates by producing a stream of commands such as MIDI commands which are delivered to the synthesizer in order to produce sounds, and typically this process occurs in real time. Most synthesizers operate by producing or modifying the output sound at the moment an event or command is received. A simple implementation of the SFX engine might therefore produce synthesizer commands only at the moment they are required by the synthesizer, but this is liable to timing disruption because the CPU may be unable to process the complex command stream of multiple SFX models quickly enough to avoid audible disruption of the output sound.
  • a more sophisticated implementation can achieve greater consistency of timing by generating the commands a short interval ahead of the current time, queuing them in a mechanism such as a data buffer, and delivering them to the synthesizer at the appropriate time.
  • the TSCP provides this function in such a way that the interval by which commands are generated ahead of the current time may be adjusted to an optimum value which may also be set differently for different SFX models.
  • the optimum is a compromise between the need to avoid timing disruption and the need to make the system responsive to changes in its control parameters.
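  • A minimal sketch of this queuing mechanism, assuming a synthesizer object with a send(command) method, is shown below; the lookahead interval is the adjustable compromise described in the preceding bullets.

```python
import heapq
import itertools

class CommandQueue:
    """Time-stamped synthesizer commands are generated a short interval ahead
    of real time, queued, and released only when due (illustrative sketch)."""
    def __init__(self, synth, lookahead_s=0.2):
        self.synth = synth
        self.lookahead_s = lookahead_s        # tunable per SFX model
        self.queue = []                       # heap ordered by output time
        self._seq = itertools.count()         # tie-breaker for equal times
        self.latest_scheduled = 0.0

    def schedule(self, out_time, command):
        heapq.heappush(self.queue, (out_time, next(self._seq), command))
        self.latest_scheduled = max(self.latest_scheduled, out_time)

    def needs_events(self, now):
        # The SFX model should compute further ahead when the queue no longer
        # covers the lookahead window.
        return self.latest_scheduled < now + self.lookahead_s

    def dispatch(self, now):
        # Deliver every command whose output time has arrived.
        while self.queue and self.queue[0][0] <= now:
            _, _, command = heapq.heappop(self.queue)
            self.synth.send(command)
```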
  • each synthesizer channel is set up to create different sound elements of the sound effects.
  • these channels are a limited resource and must be managed carefully, for example allocated dynamically upon demand.
  • the TSCP acts as a synthesis channel manager.
  • one purpose of the hard disk or other nonvolatile storage (12 in Fig 1) is to provide a means to store indefinitely the settings of parameters and other variable elements of the SFX program.
  • Such parameters and other elements may be saved while the system is in the mode where it is being controlled directly by a user using the GUI, then recalled when the system is in the alternative mode under control of an external controlling program.
  • This allows a user to experiment directly with the parameters of the sound effects using the GUI, save the set of values of the parameters found to be most appropriate to the application, then recall this same set of values while the SFX engine is under control of the external controlling program in order to have the system produce an identical or near- identical sound effect.
  • Saving and recalling a sound effect in this way differs from saving and recalling a digital audio signal of the sound effect in that it is entirely based on a model of the sound effect and may therefore be altered after it has been recalled by means of changing its parameters.
  • the sound models may be modeled closely on the physics or other behavior of real-world objects so that the model produces realistic sound, and responds in realistic and/or predictable ways to parameter changes.
  • the sound effects models may be assigned a set of control parameters deemed most important or appropriate to the particular sound effect in question, these being closely related to the behavioral characteristics of the sound generating phenomenon being modeled. This set of parameters may include parameters unique to the particular model, parameters that are generic to sets of similar models, and parameters that are generic to all models of the system.
  • a model of human footsteps might have a parameter for walking style which would be unique to this model, another parameter for walking speed which would be common to all human and animal footstep models, and other parameters such as volume or reverberation depth common to all models.
  • the system can include models which are programmed with realistic simulations of naturally-occurring sound-producing entities, other sound effects which are exaggerated in character for dramatic effect, and other sound effects of a purely imaginative nature which have no counterpart in the real world.
  • the sound effects models may be modeled to any chosen degree of precision on the behavior of their naturally-occurring counterparts, so that the sound effects models will automatically provide accurate reproductions of the sounds, sound-sequences or other audible characteristics of their naturally-occurring counterparts.
  • the system can also support "Compound Sounds”: these are sound models consisting of a hierarchy of other sound models with any number of levels in the hierarchy. Typically they may represent an entire scene consisting of many sonic elements. At the top level the user can make changes to the whole scene (e.g., changing the overall volume), but control over individual elements is also possible, and these lower-level elements can optionally be isolated (listened to "solo") when making adjustments to them.
  • the system includes generic support for "parameter linking" in which parameters may be linked to combinations of other parameters according to mathematical relationships; this allows, for example, high-level parameters to be used to make broad sweeping changes in multiple lower-level parameters, or to apply scaling to other parameters, or to make complex sets of changes in several other parameters.
  • the system can introduce fluctuations (typically of a random or semi-random nature) into the sounds produced in order to avoid exact repetition and achieve a natural effect.
  • Techniques for introducing fluctuations include:
  • if the synthesizer is one based on replaying samples, randomly selecting samples from a collection of similar but non-identical samples
  • the system generates the stream of commands to the synthesizer a short interval ahead of the current time, this interval being set such that it is long enough to overcome potentially-audible disruption of the sound output which would occur if time-critical commands were generated at the moment they are required, but short enough that the system responds to changes in its control parameters after an imperceptible or near-imperceptible delay.
  • the system provides two modes of triggering.
  • in one mode, sound effects, typically of a continuous, evolving, or repetitive nature, are triggered once and then continue to play.
  • in the other mode, sound effects, typically of a short, non-continuous nature, are triggered each time they are required, thus allowing precise synchronization with visual events in a computer game, film, video production, or animation.
  • the system includes generic sound effects models in which the behavior of a class of sound effects is encoded, and which provides a method by which a user of the system can create specific sound models by selecting options of the generic models, setting the values of variables of the generic models to specific values, and providing the synthesizer with its own samples.
  • the sound models consist of the interface functions, the parameters for external control, private data for maintaining state while the process is suspended to share CPU time, indexes into the bank of wavetables or synthesis data the model uses, and the event-generating code.
  • the sound models are arranged as an object-oriented class hierarchy, with many sound classes being derived directly from the base class. This structure is due to the fact that there are many attributes and methods common to all sounds (e.g. location, volume), while most other attributes are common to one model, or shared with other models that otherwise have little in common (e.g. surface characteristics of footsteps).
  • the sound models have a compute-ahead window of time which is the mechanism by which they share the CPU. This window can be different for different sound models, and is usually in the range of 100-300 milliseconds.
  • the sound model process is called back at this rate, and computes all the events up to and slightly beyond the next expected callback time. The events are time-stamped with their desired output times, and sent to the output manager (for further details see FIGs 4A and 4B, and description provided below).
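  • The structure described in the last few bullets might be sketched as a base class like the one below (illustrative Python; the output-manager interface and all names are assumptions): externally controllable parameters, private state, wavetable indexes, and event-generating code, driven by a compute-ahead callback.

```python
class SoundModel:
    """Illustrative base class for a sound model: interface functions,
    parameters for external control, private state, wavetable indexes,
    and event-generating code."""
    def __init__(self, output_manager, wavetable_indexes, window_s=0.2):
        self.out = output_manager
        self.wavetables = wavetable_indexes
        self.window_s = window_s              # compute-ahead window, ~0.1-0.3 s
        self.params = {"volume": 1.0, "location": 0.0}
        self._state = {}                      # private data kept across callbacks

    def set_parameter(self, name, value):
        self.params[name] = value

    def on_callback(self, now):
        # Called back roughly once per window: compute all events up to (and
        # slightly beyond) the next expected callback time, time-stamp them,
        # and hand them to the output manager.
        horizon = now + 1.1 * self.window_s
        for t, event in self.generate_events(now, horizon):
            self.out.schedule(t, event)

    def generate_events(self, start, end):
        # Event-generating code; overridden by each concrete sound model.
        raise NotImplementedError
```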
  • when an application triggers each sound event individually (for example, each footstep), a rate parameter is meaningless.
  • the present system provides for either or both methods of control. If event-by-event control is needed, the model's standard play function is not invoked, but the object provides a messaging interface which is used instead. All the other support (e.g. statistical variability of successive sounds, parameter control) is still available. For sound models that use rate to control other attributes, a meaningful rate must be measured from the event triggers. A more complex issue of control can be illustrated with an applause model which, for example, is to be controlled in real-time using a parameter for the number of clappers.
  • the parameter would typically start at 0, be driven up to a level corresponding to how many people are in the virtual audience, remain at that level for some time, then gradually decay back to zero.
  • an application may not need such intimate control. It may be preferable to simply specify the number of people and an "enthusiasm" level (a "meta-time” parameter) that could in turn affect the temporal envelope of the "number of people” parameter.
  • the application would only have to concern itself with the "enthusiasm” parameter when (or before) the applause sound is initiated. The two methods of control are mutually exclusive.
  • the applause example is different from the footsteps example because with footsteps, both types of control discussed (individual footsteps vs. rate) are real-time.
  • the contrasting methods of control in the applause example are between a meta-time specification of a temporal trajectory, and real-time control of the trajectory. We believe that the most useful way to support these control choices is to record parameter trajectories created by the developer using the GUI, and then use the trajectories during playback after a trigger event from the application.
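  • A recorded parameter trajectory of this sort could be sketched as a list of time/value points replayed after the trigger; the class below is a hypothetical illustration, with piecewise-linear playback chosen for simplicity.

```python
class ParameterTrajectory:
    """Records a parameter envelope (e.g. "number of clappers" rising,
    sustaining, then decaying) from the GUI and replays it at playback time
    after a trigger event (illustrative sketch)."""
    def __init__(self):
        self.points = []                      # (time offset, value) pairs

    def record(self, t_offset, value):
        self.points.append((t_offset, value))

    def value_at(self, t_offset):
        # Piecewise-linear playback of the recorded envelope.
        pts = self.points
        if not pts or t_offset <= pts[0][0]:
            return pts[0][1] if pts else 0.0
        for (t0, v0), (t1, v1) in zip(pts, pts[1:]):
            if t0 <= t_offset <= t1:
                frac = (t_offset - t0) / (t1 - t0) if t1 > t0 else 0.0
                return v0 + frac * (v1 - v0)
        return pts[-1][1]

# Hypothetical usage after the application sends a trigger:
# applause.set_parameter("num_clappers", trajectory.value_at(now - trigger_time))
```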
  • the present method and system produce sound effects which simulate sounds associated with a certain phenomenon.
  • Some examples of a phenomenon are footsteps, earthquake, running air-conditioner, bouncing ball, moving car, etc.
  • the phenomenon can be virtually anything so long as there are some sounds with which it is associated. Indeed, the phenomenon need not even necessarily be a real-life phenomenon in the sense that it does not have to actually exist in the real world. For instance, the phenomenon could be a firing of a futuristic phaser gun. Although such a gun may not currently exist (hence the phenomenon cannot exist), this fact is irrelevant so long as there is some perception about what the sounds associated with the phenomenon might be like or what might be acceptable to the listeners. It is also useful to have some perception about how the sounds would vary depending on various hypothetical factors.
  • the sound modeling process begins by identifying the behavioral characteristics associated with the particular sound phenomenon which are relevant to the generation of sound. Behavioral characteristics can be defined as the set of properties which a naive listener would perceive as distinguishing the sound effect from other sound effects, including those which define how it changes or evolves in response to different conditions impinging upon it. In many cases, they bear a one-to-one correspondence to the terms a layman would use to describe the sound effect.
  • the behavioral characteristics are properties which a naive listener might expect such an object or phenomenon to possess if it did exist.
  • the behavioral characteristics would include things such as speed, degree of limp and stagger, weight (of the person producing the footsteps), surface type (e.g., cement, grass, mud), location (i.e., position relative to the listener), surrounding acoustic, etc. It can be easily appreciated that these characteristics define the sound for a particular set of conditions. For instance, the sounds produced from footsteps from a mad dash would be different from those produced in a casual stroll; footsteps on hard marble would sound differently than footsteps on wet mud.
  • the sound being generated from footsteps results mainly from two separate events, the impact of the heel hitting a surface, then shortly after, the impact of the toe hitting a surface.
  • the totality of the sound generated results from the heel-toe action of one foot followed by the heel-toe action of the other foot, and so on.
  • the time interval between the sound produced from the heel-toe action of one foot and heel-toe action of the other foot decreases.
  • at high speeds, the sound produced from the heel and the sound produced from the toe actually overlap, and it becomes difficult to distinguish the sounds as being separate and distinct.
  • the heel-to-toe time is affected by another parameter.
  • in a marching style, the leg falls rapidly and perpendicular to the ground, and thus the heel-to-toe time is very short.
  • a long stride produces a long heel-to-toe time because the heel touches the ground while the leg is far from perpendicular and the toe has a relatively long distance to travel before it touches the ground.
  • the heel-to-toe time is the net result of both walking speed and "walk style" (march versus long stride).
  • the general principle is that the internal parameters of the model may be influenced by many of the external or "user" parameters in mathematical relationships of arbitrary complexity.
  • this knowledge about the mechanics of sound generation is important for two main reasons when attempting sound modeling. First, it allows one to vary the correct set of parameters so as to produce the most realistic sound effect. For instance, in the footstep example given above, the resulting sound effect would not sound very realistic had someone varied the time interval between the sound produced from one heel-toe action to another without also proportionately varying the time interval between the heel and the toe.
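  • The heel-toe mechanics above can be illustrated with a short sketch (hypothetical constants and wavetable handles): note that increasing the speed shortens the interval between steps and the heel-to-toe interval together, since varying only one of them tends to sound unrealistic.

```python
def footstep_events(num_steps, speed, stride_style, heel_wt, toe_wt):
    """Each step is a heel event followed shortly by a toe event.  `speed` is
    in steps per second; `stride_style` runs from 0 (march) to 1 (long stride)
    and lengthens the heel-to-toe time.  Illustrative sketch only."""
    step_interval = 1.0 / speed                           # time between steps
    heel_to_toe = (0.03 + 0.12 * stride_style) / speed    # shrinks with speed
    events = []
    t = 0.0
    for _ in range(num_steps):
        events.append((t, ("play", heel_wt)))               # heel impact
        events.append((t + heel_to_toe, ("play", toe_wt)))  # toe impact
        t += step_interval
    return events
```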
  • the second main reason for analyzing the mechanics is that it allows one some notion of the size and type of sound sample that is needed.
  • the sample may be obtained either by recording a sample segment of actual sound found in a real-world phenomenon (or simply taking a segment from some existing prerecording) or by producing a segment through well-known synthesis techniques, whichever is more convenient or desirable given the particular sound effect being modeled. For instance, in a case where the sound of a phaser gun is being modeled, it may be more convenient to simply synthesize a sample, given that no such phaser gun actually exists in the real world.
  • the choice of the length of the sound samples depends on a number of factors. As a general rule, the smaller the sample, the greater the flexibility. On the flip side, the smaller the sample, the greater the labor and the harder it is to achieve realism.
  • a good rule of thumb is to have a sample which is as long as possible without loss of useful flexibility, that is, where most of the perceptual range of sonic possibilities of the equivalent sound in real life can be achieved by varying the user parameters of the model. For instance, in the case of the footsteps, if one were to want to produce footsteps of different speeds, it would be necessary to obtain a set of samples including heel sounds and toe sounds, for the reasons provided above. However, this does not always mean that one needs to record the two sounds separately since the current editing techniques allow for splicing and other forms of editing to separate a single recording into multiple samples. But the splicing technique may be difficult or impossible for cases where the sounds overlap.
  • the choice of the sound samples also depends on the behavioral characteristics of the phenomenon to some extent, and also on the limitation of the parameters (parameters are discussed in detail below).
  • some variations of a sound effect do not require additional samples while others do.
  • to vary the speed of the footsteps, for example, only the timing needs to be varied, and hence this can be done with any existing sample.
  • to vary the surface on which the footsteps are made, however, it is easier to simply obtain a sample of footsteps on each of the surfaces rather than attempting to manipulate an existing sample to simulate the effect.
  • it would not be easy to produce a sound of a footstep on a soft muddy surface using only a sample of footsteps on a concrete surface. How many samples are needed for a given phenomenon depends on the scope and the range of sound effects one desires, and varies greatly from one sound effect to another.
  • a continuous spectrum can often be simulated by collecting points along the spectrum.
  • a "strength of laughter” parameter may be constructed by collecting a set of laugh samples at different "degrees of hilarity", then selecting individual samples according to the setting of the "strength of laughter” parameter.
  • this technique is combined with the random selection described above.
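  • A minimal sketch of this sampling-the-spectrum technique, combined with random selection, might look as follows (the bank layout and the 0-to-1 parameter range are assumptions):

```python
import random

def select_laugh_sample(strength, sample_banks):
    """`sample_banks` is assumed to be ordered from mild to hilarious, each
    entry holding several similar but non-identical samples recorded at that
    degree of hilarity.  `strength` is the 0..1 user parameter."""
    # Map the continuous parameter onto the discrete points of the spectrum.
    index = round(strength * (len(sample_banks) - 1))
    index = max(0, min(len(sample_banks) - 1, index))
    # Combine with random selection so repeated playback never sounds identical.
    return random.choice(sample_banks[index])
```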
  • the parameters represent the various factors which need to be controlled in order to produce the modeled sounds.
  • although the parameters can be structured in a number of ways to effect a sound effect, for the preferred embodiment of the present invention it is useful to view the parameter structure as illustrated in FIG. 5.
  • the top layer consists of the user parameters which are the interface between the user and the sound effects system.
  • the middle layer consists of the parameters employed by the SFX engine, or simply referred to as "engine parameters.”
  • the bottom layer consists of the synthesizer parameters which are well-known parameters found in any of the current sound or music synthesizers.
  • the user parameters are defined in terms of the desired sound effect.
  • the user parameters can be location, walking speed, walking style, limp, weight, hardness, surface type, etc.
  • although these parameters can be defined in virtually any manner, it is often most useful if they directly reflect the behavioral characteristics of a phenomenon and the purpose for which the sound effect is being produced. In many cases, they are the obvious, easily-understood parameters that a layman might use to describe the sound.
  • while a user parameter such as surface type might be a useful parameter for the footsteps phenomenon, it probably would not be useful for a phenomenon such as an earthquake, given that surface type probably has no meaning in the context of an earthquake.
  • the user parameters can be represented in a number of ways so as to give control access to the user.
  • the user can slide the slider bar to control the magnitude of the effect.
  • the walking speed is increased as the slider is moved to the right.
  • with the limp slider, the amount of limp in the walk is increased as the slider is moved.
  • several user parameters can be invoked at once. For instance, by invoking both the speed slider and the limp slider, one can achieve any combination of limp and speed. Some combinations are obviously not desirable, though they may be possible. For instance, one probably would not combine the surface type "metal" with "marble". In contrast, "leaves" might well be combined with "dirt" to achieve an effect of footsteps on leaves over dirt.
  • the middle layer parameters, or engine parameters, and the bottom layer parameters, or synthesizer parameters, work in combination to produce the sound effects as defined by the user parameters.
  • the bottom layer parameters can include sound manipulation techniques such as volume control, pan, pitch, filter cutoff, filter Q, amplitude envelope, and many others which are well known to those skilled in the art.
  • the middle layer can be viewed as the layer which "models" the sound using the basic sound manipulation parameters provided by the bottom layer.
  • the middle layer parameters can broadly be classified as timing, selecting, and patterning.
  • timing parameters basically control the length of the time intervals between triggering and stopping pieces of sound within a particular sound effect, and time intervals between other commands sent to the synthesizer.
  • the selecting parameters control which sound samples are selected at a given moment, including the order in which samples are selected.
  • the patterning parameters control the relationships between these factors; a sketch of the three classes follows.
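The three classes might be organised as in the sketch below for the footsteps model; the structure and names are illustrative assumptions, not the engine's actual data layout.

    #include <functional>
    #include <string>
    #include <vector>

    // Sketch of the three engine-parameter classes for footsteps:
    // timing, selecting, and patterning.
    struct FootstepsEngineParams {
        // Timing: intervals between triggers, in milliseconds.
        double stepIntervalMs = 600.0;  // from one heel-toe action to the next
        double heelToToeMs    = 120.0;  // within a single heel-toe action

        // Selecting: candidate samples for the current surface and style.
        std::vector<std::string> heelSamples;
        std::vector<std::string> toeSamples;

        // Patterning: the fixed heel-then-toe behaviour, expressed here as a
        // callback that triggers the two synthesizer events for one footstep.
        std::function<void(const std::string& heel, const std::string& toe)> emitStep;
    };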
  • the user parameter (top layer) is speed.
  • the time parameter is made to decrease the time interval from one heel-toe action to the next, as well as the time interval between the heel and the toe.
  • This timing is also affected by the "style" parameter as described above.
  • the pattern or behavior of the footsteps does not change as speed and style are altered. A heel sound is always followed by a toe sound, etc.
  • the only class of engine parameters affected is the one concerned with selection, since the timing and patterning aspects do not change.
  • different sets of samples will be selected, but the timing and patterning do not change.
  • synthesizer parameters will have to be invoked either in isolation or in combination with engine parameters.
  • the synthesizer parameters, pitch, volume, etc. need to be controlled in response to the user parameter, weight (of the person making the footsteps), since typically a heavier person would produce footsteps which are deeper in pitch, louder, etc. (though this may not always be true in real life).
  • while the behavior characteristics will have some bearing on the choice of the synthesizer parameters to be used, there is no hard and fast rule as to how these parameters should be selected (one possible mapping is sketched below).
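One possible mapping of the user parameter weight onto synthesizer parameters is sketched below; the scaling constants and ranges are arbitrary illustrative choices, not values from the patent.

    // Hypothetical mapping: heavier walker => lower pitch and higher volume.
    struct SynthParams {
        double pitchSemitones = 0.0;  // pitch offset applied to the sample
        double volumeDb       = 0.0;  // gain applied to the sample
    };

    SynthParams mapWeightToSynth(double weightKg) {
        double w = (weightKg - 50.0) / 70.0;  // roughly 0..1 for 50..120 kg
        if (w < 0.0) w = 0.0;
        if (w > 1.0) w = 1.0;

        SynthParams p;
        p.pitchSemitones = -4.0 * w;  // up to four semitones deeper
        p.volumeDb       =  6.0 * w;  // up to 6 dB louder
        return p;
    }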
  • FIGs. 6 through 14 are tables charting the various parameters that were used for some actual sound models. Taking the table in FIG. 6 as an illustrative example, the first column lists the user parameters plus "random fluctuations" (see above for description for "random fluctuations").
  • the subsequent columns have a heading at the top showing the engine and synthesizer parameters, the engine parameters comprising the first three columns subsequent to the user parameter column.
  • the "X" in a box indicates that the parameter in that column was used for the sound modeling for the user parameter found in that particular row.
  • the user parameters Brake, Clutch, and Gas Pedal control two "internal" variables, car speed and engine speed.
  • the two internal variables are governed by a 2-D differential equation with the Pedal settings as inputs.
  • the car speed and engine speed in turn control the synthesizer event generation.
  • the engine speed controls the firing of pistons; each firing is a separately triggered event (many hundreds per second).
  • the car speed controls the "rumble" of the car rolling along the road (a sketch of these two internal variables follows).
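The patent states only that a 2-D differential equation with the pedal settings as inputs governs car speed and engine speed; the Euler-integration sketch below is one assumed form of such a model, with entirely arbitrary coefficients. Engine speed would then set the rate of the individually triggered piston-firing events, and car speed the level of the road-rumble layer.

    // Assumed (not the patent's) 2-D dynamics for the car sound model.
    struct CarState {
        double carSpeed    = 0.0;    // arbitrary units
        double engineSpeed = 800.0;  // e.g. RPM at idle
    };

    struct Pedals {
        double gas    = 0.0;  // 0..1
        double brake  = 0.0;  // 0..1
        double clutch = 0.0;  // 0..1 (1 = fully disengaged)
    };

    void stepCarModel(CarState& s, const Pedals& p, double dt) {
        const double coupling = 1.0 - p.clutch;  // how strongly engine and wheels interact
        // Engine speed rises with gas, relaxes toward idle, and is loaded by the wheels.
        const double dEngine = 3000.0 * p.gas
                             - 0.5 * (s.engineSpeed - 800.0)
                             - 0.2 * coupling * (s.engineSpeed - 30.0 * s.carSpeed);
        // Car speed follows the engine through the clutch, slowed by brake and drag.
        const double dCar    = 0.01 * coupling * (s.engineSpeed - 30.0 * s.carSpeed)
                             - 5.0 * p.brake
                             - 0.05 * s.carSpeed;
        s.engineSpeed += dEngine * dt;
        s.carSpeed    += dCar * dt;
        if (s.carSpeed < 0.0) s.carSpeed = 0.0;
    }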
  • the wind sound model consists of a very small number of events. Most of the user parameters (and the random fluctuations) affect the same set of synthesis parameters, i.e., volume, pitch, filter, etc., but they affect them in different ways. For instance, "Strength" controls the mean value of the parameters (stronger wind has higher volume, pitch, filter Q, etc).
  • the "Width of Variation” controls the deviation from the mean (of the same parameters) and "Gustiness” controls the rate of change of the parameters.
  • Wind Strength also controls the number of layers (e.g., number of "whistles") in the sound; a sketch of these relationships follows.
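The relationships described for the wind model can be sketched as a per-control-period update in which Strength sets the mean, Width of Variation the deviation from the mean, and Gustiness the rate of drift; the random-walk form and all constants are assumptions for illustration.

    #include <cstdlib>

    struct WindUserParams {
        double strength  = 0.5;  // 0..1
        double width     = 0.3;  // 0..1, deviation around the mean
        double gustiness = 0.2;  // 0..1, rate of change
    };

    struct WindSynthParams {
        double volume = 0.0;
        double pitch  = 0.0;
        int    layers = 1;       // number of "whistles"
    };

    // Called once per control period; 'drift' is persistent random-walk state in [-1, 1].
    WindSynthParams updateWind(const WindUserParams& u, double& drift) {
        const double r = (std::rand() / (double)RAND_MAX) * 2.0 - 1.0;
        drift += u.gustiness * 0.1 * r;          // gustier wind changes faster
        if (drift > 1.0)  drift = 1.0;
        if (drift < -1.0) drift = -1.0;

        WindSynthParams s;
        s.volume = u.strength + u.width * drift;        // mean + deviation
        s.pitch  = u.strength + u.width * drift * 0.5;
        s.layers = 1 + (int)(u.strength * 3.0);         // stronger wind adds whistles
        return s;
    }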
  • the development environment for embedding sound models into applications needs to be richer than the traditional one, where a recorded sound is selected from a CD, loaded into the environment, played and stopped. This is partly because sound selection involves more than selecting a track; it also involves adjusting parameters. There is thus an important role to be played by an interactive environment which is separate from the target application.
  • the developer's tool consists of two parts: the GUI (Figure 3) and the SFX engine.
  • using the GUI, the developer browses the database of sound models, chooses the sound models that will be used in an application, and adjusts the sound parameters while listening to the sound in real-time. There is no code-compile-listen cycle; sounds are chosen and adjusted by ear.
  • the parameters and their names are high-level and intuitive, and relate closely to the behavioral characteristics of the real-world sound which is being modeled (e.g., "roughness" rather than "AM depth and regularity").
  • the developer creates file names where the customized sound models (i.e., as set by the developer using the user parameters) will be stored. To then play and control the sound models from within the application, the API calls described below are used.
  • the sound models are played and controlled from within the application via an Application Programmer Interface (API).
  • Functions are provided to the application for initializing the audio engine, loading sounds, creating a map between application variables and sound parameters, playing and stopping sounds, and sending user parameters and messages to sounds. Sounds can be loaded and unloaded from memory as needed.
  • a call to PLAY (FIG. 4A) will initiate the playing of the sound over the computer's sound system, and a call to STOP (FIG. 4D) will cause the sound to cease playing. After a sound has been stopped, it can be played again, unless a call to unload has been made, in which case the sound model is removed from memory.
  • PLAY and STOP are the only routines that need to be called.
  • sound models can be used exactly as digital audio files have always been used by applications. Developers are confronted with nothing that forces them to deal with the fact that sounds are being generated algorithmically in real-time rather than being played from a file, other than that they may notice that sounds are (in general) more natural and pleasing because they are not identical each time they are played.
  • the application will control a sound using an application variable that has a range of possible values determined by its role in the application.
  • the sound model user parameter has its own range of possible values determined by how the user parameter is used within the sound object.
  • the range of the application variable can be automatically mapped onto the range of the user parameter with a call to a function called SetParamMap. This call only needs to be made once for a given user parameter, after which any call to SetParam will automatically perform the proper mapping. Sound locations are similarly controlled by a single call to a "mapping" function to establish the units, and "set" functions to update the location of the sound emitter and receiver. A usage sketch follows.
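The sketch below shows how an application might drive the API described here. PLAY, STOP, SetParamMap and SetParam are named in the text; the exact signatures, and the initialisation and loading helpers, are assumptions and are stubbed so the example is self-contained.

    #include <cstdio>
    #include <string>

    using SoundHandle = int;

    // --- placeholder stubs standing in for the real engine API ---------------
    void InitEngine()                          { std::puts("engine initialised"); }
    SoundHandle LoadSound(const std::string&)  { return 1; }
    void UnloadSound(SoundHandle)              {}
    void Play(SoundHandle)                     { std::puts("play"); }
    void Stop(SoundHandle)                     { std::puts("stop"); }
    void SetParamMap(SoundHandle, const std::string&, double, double) {}
    void SetParam(SoundHandle, const std::string&, double)            {}
    // --------------------------------------------------------------------------

    int main() {
        InitEngine();
        SoundHandle steps = LoadSound("footsteps.mdl");

        // Map the application variable's range (e.g. game speed 0..300) onto the
        // model's user-parameter range once; later SetParam calls are then
        // translated automatically, as described above.
        SetParamMap(steps, "walking speed", 0.0, 300.0);

        Play(steps);
        SetParam(steps, "walking speed", 120.0);  // application units, auto-mapped
        Stop(steps);

        UnloadSound(steps);
        return 0;
    }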
  • trigger messages can also be sent from the application (FIG. 4E).
  • although the messages a sound accepts are unique, all call the same base sound method (which simply does nothing in response to messages a particular sound does not handle). These messages usually have the effect of immediately triggering some element of a sound, such as a train whistle when a steam train sound is playing.
  • This messaging method is also the means by which event-by-event control is implemented (e.g., triggering individual footsteps), and the way parameter presets are switched.

Details of Implementation

  • FIG. 4A shows what happens when play is initiated, either in response to a user interacting through the GUI, or a message arriving from a game or other application program.
  • the system picks up the current settings of the user parameters. It then generates a set of commands such as MIDI messages which are sent to the synthesizer. It generates these a short way into the future - a time [X + Safety Margin] which is long enough to ensure that delays in the system will never result in gaps in the audio output, but short enough that changes in user parameters cause results which are perceived as almost immediate by the user.
  • the system then initiates a system timer which will call back (i.e., re-enter the command-generation process, described under FIG. 4B) after time X.
  • Time X is typically of the order of 10-100 ms.
  • Safety Margin is typically of the order of 5-10 ms.
  • FIG. 4B shows what happens after time X.
  • the buffered stream of synthesizer commands generated as described above is almost empty (commands corresponding to an amount of time equal to the Safety Margin remain before it underruns).
  • the system first processes any changes in parameters since the last call - these may include the internal results of changes in any of the user parameters and (in some models) changes due to the progress of time. Following this, it makes small random changes in certain internal parameters in order to introduce subtle fluctuations in the sound - this is required for a natural or pleasing effect. It then generates another stream of commands for the following period X, and arranges for another callback after that time. This generate-callback cycle, sketched below, will continue until the system is stopped.
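The generate-callback cycle can be sketched as follows; the timer and command-generation calls are placeholders, and the durations follow the typical values for time X and the Safety Margin given earlier.

    #include <atomic>
    #include <chrono>
    #include <thread>

    constexpr auto kPeriodX      = std::chrono::milliseconds(50);  // "time X", 10-100 ms typical
    constexpr auto kSafetyMargin = std::chrono::milliseconds(8);   // 5-10 ms typical

    // Placeholder: read current user parameters, apply small random fluctuations,
    // and emit synthesizer commands covering 'horizon' into the future.
    void generateCommands(std::chrono::milliseconds /*horizon*/) {}

    // One sound process: fill [X + Safety Margin] ahead at Play, then top up by
    // X at each callback; the sleep stands in for the system-timer callback.
    void runSoundProcess(const std::atomic<bool>& stopRequested) {
        generateCommands(kPeriodX + kSafetyMargin);   // initial fill (FIG. 4A)
        while (!stopRequested.load()) {
            std::this_thread::sleep_for(kPeriodX);    // callback after time X
            generateCommands(kPeriodX);               // top up for the next period (FIG. 4B)
        }
    }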
  • FIG. 4C shows what happens when one of the user parameters is changed.
  • the system includes a sophisticated mapping scheme where a change in a user parameter can cause changes in any number of internal parameters of the model. (This is required in order to ensure that the user parameter behaves in a way which corresponds well with the behavioral characteristics of the phenomenon being modeled.)
  • This mapping scheme also provides for both linear and non-linear transformations of the user parameter value, and these transformations may be distinct for each of the possibly-many mappings of user parameter to internal parameters of the model; a sketch of such a mapping table follows.
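A sketch of such a mapping table is given below: one user parameter fans out to several internal parameters, each with its own linear or non-linear transform. The table structure, names and transforms are illustrative assumptions.

    #include <algorithm>
    #include <functional>
    #include <map>
    #include <string>
    #include <vector>

    struct Mapping {
        std::string internalName;
        std::function<double(double)> transform;  // linear or non-linear
    };

    using MappingTable = std::map<std::string, std::vector<Mapping>>;

    MappingTable buildExampleMappings() {
        MappingTable t;
        t["speed"] = {
            // Non-linear: faster walking => shorter interval between steps.
            { "step_interval_ms", [](double s) { return 600.0 / std::max(s, 0.1); } },
            // Linear: faster walking => slightly higher playback rate.
            { "playback_rate",    [](double s) { return 0.9 + 0.1 * s; } }
        };
        return t;
    }

    // Apply a user-parameter change by updating every mapped internal parameter.
    void setUserParam(const MappingTable& t, const std::string& name, double value,
                      const std::function<void(const std::string&, double)>& setInternal) {
        auto it = t.find(name);
        if (it == t.end()) return;
        for (const auto& m : it->second)
            setInternal(m.internalName, m.transform(value));
    }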
  • the parameter changes may also result in the immediate sending of one or more synthesizer commands. In other cases these parameter changes are acted upon at the next callback (see above).
  • FIG. 4D shows what happens when a stop message is received, in response either to a user interacting through the GUI, or a message arriving from a game or other application program.
  • the system first cancels the pending callback. It then generates synthesizer commands into the future as required to produce the desired ending to the sound (for some sounds this may be an abrupt stop, in others it must be more gentle for a natural or pleasing effect; in some cases, it may involve modifications of synthesizer parameters). Finally it releases the synthesizer channels which were granted when the sound was started, and performs a number of other cleanup functions - for example resetting the synthesizer channels and variables of the system ready for re-use by a subsequent Play message.
  • Flowchart 4E shows what happens when a trigger message is received, either in response to a user interacting through the GUI, or a message arriving from a game or other application program.
  • Trigger messages are designed to produce sound effects (or parts of sound effects) of finite length (typically a few seconds or less) which stop of their own accord after initiation. Thus they act basically like a Play with an automatic Stop, and this flowchart should be self-explanatory in the light of the descriptions above.
  • the process management is one function of the TSCP introduced above and shown in FIG. 2. It mediates both commands destined for the synthesizer and the process callback requests.
  • the interrupts that system timers generate for their decrement-and-test cycle make them an expensive resource. If each of possibly many concurrent sound processes were requesting callbacks from the system directly, servicing the interrupts could significantly degrade performance. Instead, a time-sorted queue of processes awaiting callbacks is maintained by the process manager, and a system timer is engaged only for the next sound process to run.
  • a similar mechanism is used for commands sent to the process manager time-stamped for output to the synthesizer. Although each sound process generates commands in chronological order, when many processes are running concurrently, the process manager inserts commands into a sorted queue and sets up a timer only for the next one scheduled to go out (sketched below).
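The scheduling just described can be sketched as a single time-sorted queue with one pending system timer; the class below is illustrative only, with the platform timer call left as a placeholder.

    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    struct Scheduled {
        double dueTimeMs;
        std::function<void()> action;  // run a process callback or send a synth command
        bool operator>(const Scheduled& other) const { return dueTimeMs > other.dueTimeMs; }
    };

    class ProcessManager {
    public:
        void schedule(double dueTimeMs, std::function<void()> action) {
            queue_.push({dueTimeMs, std::move(action)});
            armTimerFor(queue_.top().dueTimeMs);  // only the earliest item gets the timer
        }

        // Invoked by the single system timer when the earliest item is due.
        void onTimer(double nowMs) {
            while (!queue_.empty() && queue_.top().dueTimeMs <= nowMs) {
                queue_.top().action();
                queue_.pop();
            }
            if (!queue_.empty()) armTimerFor(queue_.top().dueTimeMs);
        }

    private:
        void armTimerFor(double /*dueTimeMs*/) {
            // Placeholder for the platform timer API; at most one timer is pending.
        }
        std::priority_queue<Scheduled, std::vector<Scheduled>,
                            std::greater<Scheduled>> queue_;
    };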
  • Some commands, such as interactive controller messages, are serviced immediately rather than queued.
  • the back-end synthesizer uses these commands to influence any new events that the process manager sends, even if the events were queued before the "crash through" controller update.
  • user parameters have a latency that can be as great as the callback period for a sound process, but parameters that control the synthesis have a shorter control-rate latency.
  • the process manager is also responsible for allocating synthesizer resources to sound processes and for interfacing with specific back-end synthesizer APIs.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

This sound effects system produces sound effects from parameterized models of sound-producing events such as footsteps, an earthquake, etc. The behavioral characteristics of the event are taken into account when the models are created so as to ensure realism. Because the sound effects are produced from models, the system offers the user greater flexibility and interactivity. By manipulating the set of parameters, the user can act on the system in real time. The system is versatile in that it can produce a wide range of sound effects and is not limited to producing particular sound effects or a particular class of sound effects.
PCT/SG1998/000074 1997-09-23 1998-09-22 Systeme de bruitage interactif et bruitage a partir d'un modele WO1999016049A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2000513267A JP2001517814A (ja) 1997-09-23 1998-09-22 音響効果システム
AU92924/98A AU9292498A (en) 1997-09-23 1998-09-22 Interactive sound effects system and method of producing model-based sound effects
EP98945749A EP1019901A1 (fr) 1997-09-23 1998-09-22 Systeme de bruitage interactif et bruitage a partir d'un modele

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SG9703500-0 1997-09-23
SG1997003500A SG73470A1 (en) 1997-09-23 1997-09-23 Interactive sound effects system and method of producing model-based sound effects

Publications (1)

Publication Number Publication Date
WO1999016049A1 true WO1999016049A1 (fr) 1999-04-01

Family

ID=20429738

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG1998/000074 WO1999016049A1 (fr) 1997-09-23 1998-09-22 Systeme de bruitage interactif et bruitage a partir d'un modele

Country Status (5)

Country Link
EP (1) EP1019901A1 (fr)
JP (1) JP2001517814A (fr)
AU (1) AU9292498A (fr)
SG (1) SG73470A1 (fr)
WO (1) WO1999016049A1 (fr)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5146833A (en) * 1987-04-30 1992-09-15 Lui Philip Y F Computerized music data system and input/out devices using related rhythm coding
US5267318A (en) * 1990-09-26 1993-11-30 Severson Frederick E Model railroad cattle car sound effects
EP0654779A2 (fr) * 1993-11-24 1995-05-24 Paolo Podesta Système multimédia pour la commande et la génération de musique et d'animation en temps réel
EP0764934A1 (fr) * 1995-09-20 1997-03-26 Yamaha Corporation Instrument de musique numérique avec traitement de forme d'onde pour générer un effet sonore

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1064974A2 (fr) * 1999-06-30 2001-01-03 Konami Co., Ltd. Système de jeu et support d'enregistrement lisible par ordinateur
EP1064974A3 (fr) * 1999-06-30 2001-12-12 Konami Co., Ltd. Système de jeu et support d'enregistrement lisible par ordinateur
WO2001033543A1 (fr) * 1999-11-02 2001-05-10 Laurent Clairon Procedes d'elaboration et d'utilisation d'une sonotheque representant les caracteristiques acoustiques de moteur de vehicule automobile, dispositifs pour mise en oeuvre
US7203327B2 (en) * 2000-08-03 2007-04-10 Sony Corporation Apparatus for and method of processing audio signal
US7330769B2 (en) * 2001-05-15 2008-02-12 Nintendo Software Technology Corporation Parameterized interactive control of multiple wave table sound generation for video games and other applications
WO2009039636A1 (fr) * 2007-09-28 2009-04-02 Ati Technologies Ulc Synthèse sonore interactive
GB2475096A (en) * 2009-11-06 2011-05-11 Sony Comp Entertainment Europe Generating a sound synthesis model for use in a virtual environment
EP2468371A1 (fr) * 2010-12-21 2012-06-27 Sony Computer Entertainment Europe Ltd. Procédé et appareil de génération de données audio
US10453434B1 (en) 2017-05-16 2019-10-22 John William Byrd System for synthesizing sounds from prototypes
US11721317B2 (en) * 2017-11-29 2023-08-08 Queen Mary University Of London Sound effect synthesis
EP3843422A3 (fr) * 2019-12-27 2021-10-13 Harman International Industries, Incorporated Systèmes et procédés permettant de régler des paramètres de commande d'activité

Also Published As

Publication number Publication date
EP1019901A1 (fr) 2000-07-19
JP2001517814A (ja) 2001-10-09
AU9292498A (en) 1999-04-12
SG73470A1 (en) 2000-06-20

Similar Documents

Publication Publication Date Title
US10134179B2 (en) Visual music synthesizer
US5952599A (en) Interactive music generation system making use of global feature control by non-musicians
Zicarelli M and jam factory
Jordà FMOL: Toward user-friendly, sophisticated new musical instruments
JP2002515987A (ja) リアルタイム音楽作成システム
Casella et al. Magenta: An architecture for real time automatic composition of background music
EP1019901A1 (fr) Systeme de bruitage interactif et bruitage a partir d'un modele
Schertenleib et al. Conducting a virtual orchestra
KR20230173680A (ko) 가상 현실 환경에서의 공연을 위한 시스템 및 방법
Zadel et al. Different strokes: a prototype software system for laptop performance and improvisation
Vets et al. Gamified music improvisation with BilliArT: a multimodal installation with balls
Mitchusson Indeterminate Sample Sequencing in Virtual Reality
WO2000045387A1 (fr) Procede pour etiqueter un son ou une representation de celui-ci
Pachet et al. MusicSpace: a Constraint-Based Control System for Music Spatialization.
Puckette Four surprises of electronic music
Hamilton Perceptually coherent mapping schemata for virtual space and musical method
Plut The Audience of the Singular
Wyse et al. Embedding interactive sounds in multimedia applications
Eigenfeldt The creation of evolutionary rhythms within a multi-agent networked drum ensemble
Winkler Strategies for interaction: Computer music, performance, and multimedia
Shepardson et al. The Living Looper: Rethinking the Musical Loop as a Machine Action-Perception Loop
Yu Computer generated music composition
Plut Application and evaluation of affective adaptive generative music for video games
Zadel A software system for laptop performance and improvisation
Eldridge Fond Punctions: Generative processes in live improvised performance

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU JP US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 92924/98

Country of ref document: AU

WWE Wipo information: entry into national phase

Ref document number: 1998945749

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 09509120

Country of ref document: US

WWP Wipo information: published in national office

Ref document number: 1998945749

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1998945749

Country of ref document: EP