WO2000045387A1 - Method of labelling a sound or a representation thereof - Google Patents

Method of labelling a sound or a representation thereof

Info

Publication number
WO2000045387A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
parameters
model
label
parameter
Prior art date
Application number
PCT/SG1999/000010
Other languages
English (en)
Inventor
Lonce Lemar Wyse
Gurminder Singh
Original Assignee
Kent Ridge Digital Labs
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kent Ridge Digital Labs filed Critical Kent Ridge Digital Labs
Priority to PCT/SG1999/000010 priority Critical patent/WO2000045387A1/fr
Priority to AU32843/99A priority patent/AU3284399A/en
Priority to JP2000596565A priority patent/JP2002536680A/ja
Publication of WO2000045387A1 publication Critical patent/WO2000045387A1/fr


Classifications

    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B31/00 - Arrangements for the associated working of recording or reproducing apparatus with related apparatus
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 - Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B23/00 - Record carriers not specific to the method of recording or reproducing; Accessories, e.g. containers, specially adapted for co-operation with the recording or reproducing apparatus; Intermediate mediums; Apparatus or processes specially adapted for their manufacture
    • G11B23/30 - Record carriers not specific to the method of recording or reproducing; Accessories, e.g. containers, specially adapted for co-operation with the recording or reproducing apparatus; Intermediate mediums; Apparatus or processes specially adapted for their manufacture with provision for auxiliary signals
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B23/00 - Record carriers not specific to the method of recording or reproducing; Accessories, e.g. containers, specially adapted for co-operation with the recording or reproducing apparatus; Intermediate mediums; Apparatus or processes specially adapted for their manufacture
    • G11B23/38 - Visual features other than those contained in record tracks or represented by sprocket holes the visual signals being auxiliary signals
    • G11B23/40 - Identifying or analogous means applied to or incorporated in the record carrier and not intended for visual display simultaneously with the playing-back of the record carrier, e.g. label, leader, photograph

Definitions

  • This invention relates to a method of labelling a sound.
  • Multimedia documents are in many cases a combination of separately produced multimedia events, such as visual (e.g. video) events and corresponding audio events. It is often desired to locate a particular event in a multimedia document, such as video footage and/or the corresponding sound effect, for example of a man walking on sand.
  • One method of providing this information is for the multimedia document to be reviewed and labels of events to be manually entered to provide a database of such events.
  • According to one aspect of the invention, there is provided an apparatus for labelling a sound or a representation thereof, comprising a sound generator capable of generating a family of sounds or their representations by selection of values of parameters of a sound model, at least some parameter values being associated with descriptive labels, whereby selection of a value automatically selects the corresponding label.
  • According to another aspect, there is provided a method of labelling a sound or a representation thereof comprising the steps of: selecting a sound or representation by selection of values of parameters of a sound model, at least some parameter values being associated with descriptive labels whereby selection of a value automatically selects a corresponding label; generating the sound or representation as a file; and associating the file with the label.
  • Preferably, the values of each parameter are divided into a plurality of ranges, a label being associated with each range; the value labels are preferably combined with a model label in a grammatical structure whereby the value label(s) qualify the model label description, for example as adjectives or adverbs.
  • The sound or representation thereof may be in the form of a digital audio file, an analog audio file, control codes for a synthesizer, or the selected parameter values for the model, for example.
  • FIG. 1 is a block diagram illustrating an interactive sound effect system with which the present invention may be used.
  • FIG. 2 is a functional block diagram illustrating the logical structure of the system of Fig. 1.
  • FIG. 3 illustrates a graphic user interface showing user parameters for the sound effect "footsteps" where the user parameters are represented in the form of graphical sliders.
  • FIG. 4 illustrates the parameter structure employed by the system of Fig. 1.
  • FIGS. 5-13 are tables charting the parameters used for actual sound models.
  • the present invention to be described is concerned with the labelling of sounds produced by a sound model.
  • A sound model-based system for generating sounds from sound models will first be described, although it will be appreciated by those skilled in the art that the present invention is applicable to any parameter-adjustable model system.
  • the overall system can be conceptualized as several layers, the top layer being applications that will use the sounds.
  • a graphical user interface is just one such application (though a rather special one).
  • the next layer is the collection of algorithmic sound models. These are objects that encapsulate data and describe the sound behaviour.
  • the models provide the interface which applications use to control the sounds by playing, stopping, and sending messages or by updating parameters. Sound models generate commands and pass them at the proper time to the synthesizer.
  • the synthesizer is the back-end where the sample by sample audio waveforms are produced, mixed, and sent to the audio output device.
  • FIG. 1 is a block diagram illustrating the main elements of the system.
  • A central processing unit 1 is connected to random access memory (RAM) 2; one or more input devices such as keyboard 3, mouse 4, joystick 5, MIDI controller 6; a visual display device 7, such as a touch screen; a sound synthesizer 8; an audio output system including digital-to-analog converter (DAC) 9, amplifier 10 and loudspeaker or headphone 11 (or alternatively, a soundcard integrating some of these individual devices can be used); and a nonvolatile storage device such as a hard disk 12.
  • The SFX program consists of a sound effects software engine (SFX engine) 13 and optionally a graphical user interface program (GUI) 14. In one mode of operation the GUI 14 controls the SFX engine 13 directly.
  • The CPU 1 executes the stored instructions of the programs in memory, sharing its processing power between the SFX engine 13, the GUI 14 or controlling program 15, the operating system 16 and possibly other programs according to a multi-tasking scheme such as the well known timeslicing technique. Under command of the SFX engine, the CPU delivers a stream of commands to the sound synthesizer 8, which produces digital audio in response to these commands.
  • the output of the synthesizer is a digital audio signal which is converted to an analogue form by the digital to analogue converter (DAC) 9, then amplified and delivered to the user by means of the amplifier 10 and loudspeaker or headphones 11.
  • the digital audio signal may also be delivered back to the CPU allowing it to be further processed or stored as a sound file for later retrieval.
  • the hard disk or other nonvolatile storage 12 provides means to store indefinitely the following items:
  • the SFX program itself including the data and instructions representing multiple sound effects models.
  • sound files comprising digital audio output from the synthesizer 8 under control of the SFX program.
  • the SFX engine 13 is controlled directly by means of the GUI 14, or from an external controlling program such as a computer game 15 or, rarely, by both at the same time.
  • The GUI program 14 uses the display device 7 to provide the user with visual information on the status of the SFX engine 13, including which sound effects models are currently invoked, the structure of these sound models, the settings of their parameters, and other information.
  • When control is by means of a pointing device, the display device 7 also provides feedback to the user on the logical position of the pointing device in the usual manner. By observing the display 7 and/or listening to the audio output while manipulating the input devices 3 through 6, the user is able to alter sound effects until satisfied with the results.
  • This mode of operation is designed to allow the user to create specific sound effects according to his/her needs from the generic sound effects models of the SFX system, by selecting sound effects models, initiating or triggering them to produce audio output from the system, adjusting the parameters of the models, selecting elements of models, and other actions.
  • In a second mode of operation, the SFX engine 13 is under the control of an external controlling program 15, such as a computer game, the program of a network resident information site (website), a virtual reality program, a video editing program, a multimedia authoring tool, or any other program which requires sound effects. In this mode the user interacts with the controlling program 15 by means of the input devices 3 through 6 and the display device 7.
  • the SFX engine 13 acts as a slave to the controlling program 15, producing sound effects under its control. This is achieved by allowing the controlling program to send data to the SFX engine 13, this data being interpreted by the SFX engine as controlling messages.
  • the SFX engine will typically not be visible to the user on the display 7, and will be controllable by the user only indirectly via aspects of the controlling program which influence the SFX engine.
  • the manner and degree of control which the user has over the SFX engine is entirely a function of the controlling program and is decided by the designer of the controlling program.
  • the logical structure of the present system is shown in Fig 2.
  • The main elements are the SFX engine 13 which, as described above, may be under control of the GUI 14 or, in the alternative mode of operation, under control of an external controlling program 15. Also shown is the synthesizer 8, which leads to the audio output system. These elements are the same as the corresponding elements of Fig 1, but are here shown in a way which highlights their logical interrelationships.
  • The user controls the system by means of the GUI 14, which acts to accept user input (such as keystrokes of the computer keyboard or movements of a pointing device) and to inform the user both of the status of the system and of the effect of his/her actions. User actions which affect the production of sound effects generate control messages which are sent from the GUI to the SFX Engine 13 in order to initiate, terminate, and control sound effects.
  • These messages are in a format determined by the SFX Engine and known to the GUI.
  • The SFX engine 13 models the behaviour of the currently active sound effects and generates a stream of events or commands which are sent to the synthesizer 8, which in turn generates the audio output.
  • Certain information affecting the manner of display to be used by the GUI 14 is contained within the SFX engine 13; for example, the manner in which the control parameters of a sound effects model should be displayed varies from one model to another, and the information about the currently active sound effects models is held by the SFX engine.
  • There is therefore a need for information to be returned from the SFX engine 13 to the GUI, and this is achieved by allowing the SFX engine to send display information to the GUI or allowing the GUI to elicit display information from the SFX engine.
  • the user interacts with the external controlling program 15 in a manner which is completely independent of the invention.
  • the controlling program 15 sends control messages to the SFX engine 13 in order to initiate, terminate, and control sound effects. These messages are in a format determined by the SFX Engine and known to the controlling program, and typically are similar to, or a subset of those used by the GUI in the first mode of operation described above.
  • the main purpose of the SFX engine 13 is to model the behaviour of the currently active sound effects and generate a stream of events or commands which are sent to the synthesizer 8, which in turn generates the audio output.
  • The main internal elements of the SFX engine 13 are a set of interactive sound effects models (SFX models) 20; an Application Programmer Interface (API) 17; a Message Processor 18; a Parameter Linker/Mapper 19; and a Timing and Synthesizer Command Processor (TSCP) 21.
  • each model consists of data and programmed instructions representing the sound characteristics and behaviour of a sound effect, or a class of sound effects.
  • Each SFX model is provided with one or more control parameters which may be used to alter the sound produced by the SFX model, and these control parameters may also be modified in real time to produce audible changes in the output while the system is producing sound effects. In certain cases compound sound effects models may be made up of other sound effects models arranged in a hierarchy consisting of any number of levels, thus enabling arbitrarily complex models to be built from a number of simpler models.
  • the Application Programmer Interface (API) 17 receives data which is interpreted by the SFX engine as controlling messages, these messages arriving from either the GUI 14 or the external controlling program 15.
  • The API decodes the messages in order to establish which type of message has been sent, and forwards the messages to the Message Processor 18.
  • The Message Processor 18 performs actions as directed by the controlling messages, including starting and stopping particular sound effects, loading and unloading sound effects models from RAM, applying the effect of modifications of control parameters to the SFX models, modifying settings of the SFX engine which influence its overall behaviour, and otherwise controlling the SFX Engine.
  • A Parameter Linker/Mapper 19 provides a means of endowing SFX models with one or more alternative sets of control parameters or metaparameters, where these metaparameters are linked to the original control parameter set of the SFX model or to other metaparameters in a hierarchy of parameters.
  • the Linker/Mapper 19 also provides means of applying mathematical transformations to the values of control parameters and metaparameters.
  • the Parameter Linker/Mapper 19 is useful because the original control parameters of a particular SFX model are not necessarily the most appropriate or useful in every case, for example when the SFX engine is being controlled by an external controlling program 15 which has its own design constraints, or when the SFX model forms part of a compound SFX model as described above.
  • The Timing and Synthesizer Command Processor (TSCP) 21 provides a number of functions related to timing and to the processing of events and other commands to be sent to the synthesizer 8.
  • the invention is not restricted to any particular method of synthesis, and details of this element depend significantly on the type and design of the synthesizer. However two general functions may be identified:
  • the SFX engine operates by producing a stream of commands such as MIDI commands which are delivered to the synthesizer in order to produce sounds, and typically this process occurs in real time.
  • Most synthesizers operate by producing or modifying the output sound at the moment an event or command is received.
  • a simple implementation of the SFX engine might therefore produce synthesizer commands only at the moment they are required by the synthesizer, but this is liable to timing disruption because the CPU may be unable to process the complex command stream of multiple SFX models quickly enough to avoid audible disruption of the output sound.
  • a more sophisticated implementation can achieve greater consistency of timing by generating the commands a short interval ahead of the current time, queuing them in a mechanism such as a data buffer, and delivering them to the synthesizer at the appropriate time.
  • the TSCP provides this function in such a way that the interval by which commands are generated ahead of the current time may be adjusted to an optimum value which may also be set differently for different SFX models.
  • the optimum is a compromise between the need to avoid timing disruption and the need to make the system responsive to changes in its control parameters.
  • each synthesizer channel is set up to create different sound elements of the sound effects.
  • these channels are a limited resource and must be managed carefully, for example allocated dynamically upon demand.
  • the TSCP acts as a synthesis channel manager.
  • one purpose of the hard disk or other nonvolatile storage (12 in Fig 1) is to provide a means to store indefinitely the settings of parameters and other variable elements of the SFX program.
  • Such parameters and other elements may be saved while the system is in the mode where it is being controlled directly by a user using the GUI, then recalled when the system is in the alternative mode under control of an external controlling program.
  • This allows a user to experiment directly with the parameters of the sound effects using the GUI, save the set of values of the parameters found to be most appropriate to the application, then recall this same set of values while the SFX engine is under control of the external controlling program in order to have the system produce an identical or near identical sound effect.
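  • As an illustration only (not part of the original disclosure), saving and recalling such a set of parameter values could be sketched in Python as follows; the file format and names are hypothetical:

        import json

        def save_preset(path, model_name, params):
            """Store a model name and its parameter values on nonvolatile storage."""
            with open(path, "w") as f:
                json.dump({"model": model_name, "params": params}, f)

        def load_preset(path):
            """Recall a previously saved parameterization of a sound model."""
            with open(path) as f:
                data = json.load(f)
            return data["model"], data["params"]

        # e.g. save_preset("footsteps_beach.json", "footsteps",
        #                  {"speed": 0.4, "surface": "sand", "weight": 0.2})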
  • Saving and recalling a sound effect in this way differs from saving and recalling a digital audio signal of the sound effect in that it is entirely based on a model of the sound effect and may therefore be altered after it has been recalled by means of changing its parameters.
  • The sound models may be modelled closely on the physics of the sound-producing phenomenon, so that each produces realistic sound and responds in realistic and/or predictable ways to parameter changes.
  • the sound effects models may be assigned a set of control parameters deemed most important or appropriate to the particular sound effect in question, these being closely related to the behaviour characteristics of the sound generating phenomenon being modelled.
  • This set of parameters may include parameters unique to the particular model, parameters that are generic to sets of similar models, and parameters that are generic to all models of the system. For example a model of human footsteps might have a parameter for walking style which would be unique to this model, another parameter for walking speed which would be common to all human and animal footstep models, and other parameters such as volume or reverberation depth common to all models.
  • The system can include models which are programmed with realistic simulations of naturally occurring sound producing entities, other sound effects which are exaggerated in character for dramatic effect, and other sound effects of a purely imaginative nature which have no counterpart in the real world. In the case of realistic simulations and exaggerations of real sounds, the sound effects models may be modelled to any chosen degree of precision on the behaviour of their naturally occurring counterparts, so that the sound effects models will automatically provide accurate reproductions of the sounds, sound sequences or other audible characteristics of their naturally occurring counterparts.
  • the system can also support "Compound Sounds”: these are sound models consisting of a hierarchy of other sound models with any number of levels in the hierarchy. Typically they may represent an entire scene consisting of many sonic elements. At the top level the user can make changes to the whole scene (e.g., changing the overall volume), but control over individual elements is also possible, and these lower level elements can optionally be isolated (listened to "solo") when making adjustments to them.
  • the generator includes generic support for "parameter linking" in which parameters may be linked to combinations of other parameters according to mathematical relationships; this allows, for example, high level parameters to be used to make broad sweeping changes in multiple lower level parameters, or to apply scaling to other parameters, or to make complex sets of changes in several other parameters.
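  • A minimal Python sketch of such parameter linking; the parameter names and mapping functions below are invented for illustration and are not taken from the patent:

        links = {
            # metaparameter -> list of (target parameter, mapping function)
            "intensity": [
                ("volume",     lambda x: 0.5 + 0.5 * x),
                ("event_rate", lambda x: 1.0 + 9.0 * x),
                ("brightness", lambda x: x ** 2),
            ],
        }

        def apply_metaparameter(params, name, value, links=links):
            """Update all linked lower-level parameters from one metaparameter value."""
            for target, mapping in links.get(name, []):
                params[target] = mapping(value)
            return params

        # apply_metaparameter({}, "intensity", 0.5)
        # -> {'volume': 0.75, 'event_rate': 5.5, 'brightness': 0.25}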
  • The system can introduce fluctuations (typically of a random or semi-random nature) into the sounds produced in order to avoid exact repetition and achieve a natural effect.
  • Techniques for introducing fluctuations include:
  • the system generates the stream of commands to the synthesizer a short interval ahead of the current time, this interval being set such that it is long enough to overcome potentially audible disruption of the sound output which would occur if time critical commands were generated at the moment they are required, but short enough that the system responds to changes in its control parameters after an imperceptible or near imperceptible delay.
  • The system provides two modes of triggering. In one mode, sound effects, typically of a continuous, evolving, or repetitive nature, will once started run continuously until explicitly stopped. In the other mode, sound effects, typically of a short, non-continuous nature, are triggered each time they are required, thus allowing precise synchronization with visual events in a computer game, film, video production, or animation.
  • the system includes generic sound effects models in which the behaviour of a class of sound effects is encoded, and which provides a method by which a user of the system can create specific sound models by selecting options of the generic models, setting the values of variables of the generic models to specific values, and providing the synthesizer with its own samples.
  • The sound models consist of the interface functions, the parameters for external control, private data for maintaining state while the process is suspended to share CPU time, indexes into the banks of wave tables or synthesis data the model uses, and the event generating code.
  • The sound models are arranged as an object oriented class hierarchy, with many sound classes being derived directly from the base class. This structure is due to the fact that there are many attributes and methods common to all sounds (e.g. location, volume), while most other attributes are common to one model, or shared with other models that otherwise have little in common (e.g. surface characteristics of footsteps).
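  • By way of illustration only, such a class hierarchy could look like the following Python sketch; the class and attribute names are hypothetical, not the patent's actual code:

        class SoundModel:
            """Base class holding attributes common to all sound models."""
            def __init__(self, location=0.0, volume=1.0):
                self.location = location          # common to all models
                self.volume = volume
                self.params = {}

            def play(self):                       # interface used by applications
                raise NotImplementedError

        class FootstepsModel(SoundModel):
            """Derived class adding attributes specific to footsteps."""
            def __init__(self, surface="concrete", speed=0.5, **kwargs):
                super().__init__(**kwargs)
                self.params.update({"surface": surface, "speed": speed})

            def play(self):
                # a real model would generate time-stamped synthesizer events here
                print("footsteps:", self.params)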
  • The sound models have a compute-ahead window of time, which is the mechanism by which the models share the CPU. This window can be different for different sound models, and is usually in the range of 100-300 milliseconds.
  • the sound model process is called back at this rate, and computes all the events up to and slightly beyond the next expected callback time. The events are time-stamped with their desired output times, and sent to the output manager.
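  • A simplified Python sketch of this compute-ahead scheduling; the event format and the inter-event interval are invented assumptions:

        import heapq

        WINDOW = 0.2        # compute-ahead window in seconds (100-300 ms is typical)
        output_queue = []   # min-heap of (timestamp, command) pairs

        def send_to_synth(command):
            pass            # placeholder for the real synthesizer interface

        def model_callback(now, next_event_time, make_event):
            """Generate and queue all events from next_event_time up to now + WINDOW."""
            t = next_event_time
            while t < now + WINDOW:
                heapq.heappush(output_queue, (t, make_event(t)))
                t += 0.1    # hypothetical inter-event interval for this model
            return t        # time of the first event not yet generated

        def deliver_due(now):
            """Deliver every queued command whose time-stamp has been reached."""
            while output_queue and output_queue[0][0] <= now:
                _, command = heapq.heappop(output_queue)
                send_to_synth(command)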
  • the first is parameter presets. These come from the frequent need to use a model with several different distinct parameterizations.
  • When the application triggers each sound event individually, a rate parameter is meaningless.
  • The present system provides for either or both methods of control. If event by event control is needed, the model's standard play function is not invoked, but the object provides a messaging interface which is used instead. All the other support (e.g. statistical variability of successive sounds, parameter control) is still available. For sound models that use rate to control other attributes, a meaningful rate must be measured from the event triggers.
  • a more complex issue of control can be illustrated with an applause model which, for example, is to be controlled in realtime using a parameter for the number of clappers.
  • the parameter would typically start at 0, be driven up to a level corresponding to how many people are in the virtual audience, remain at that level for some time, then gradually decay back to zero.
  • an application may not need such intimate control. It may be preferable to simply specify the number of people and an "enthusiasm" level (a "metatime” parameter) that could in turn affect the temporal envelope of the "number of people” parameter.
  • the application would only have to concern itself with the "enthusiasm” parameter when (or before) the applause sound is initiated.
  • the two methods of control are mutually exclusive.
  • the applause example is different from the footsteps example because with footsteps, both types of control discussed (individual footsteps vs. rate) are realtime.
  • the contrasting methods of control in the applause example are between a metatime specification of a temporal trajectory, and real time control of the trajectory. It is believed that the most useful way to support these control choices is to record parameter trajectories created by the developer using the GUI, and then use the trajectories during playback after a trigger event from the application.
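  • A minimal Python sketch of recording a parameter trajectory and replaying it after a trigger event; the breakpoint representation is an assumption, not the patent's format:

        import bisect

        def record(trajectory, t, value):
            """Append a (time, value) breakpoint captured while the developer moves a slider."""
            trajectory.append((t, value))

        def value_at(trajectory, trigger_time, now):
            """Return the recorded value for the time elapsed since the trigger event."""
            elapsed = now - trigger_time
            times = [t for t, _ in trajectory]
            i = bisect.bisect_right(times, elapsed) - 1
            return trajectory[max(i, 0)][1]

        # traj = []
        # record(traj, 0.0, 0); record(traj, 2.0, 50); record(traj, 10.0, 0)
        # value_at(traj, trigger_time=100.0, now=103.0)   # -> 50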
  • the present method and system produce sound effects which simulate sounds associated with a certain phenomenon.
  • Some examples of a phenomenon are footsteps, earthquake, running air conditioner, bouncing ball, moving car, etc.
  • The phenomenon can be virtually anything so long as there are some sounds with which it is associated. Indeed, the phenomenon need not even necessarily be a real life phenomenon in the sense that it does not have to actually exist in the real world. For instance, the phenomenon could be the firing of a futuristic phaser gun. Although such a gun may not currently exist (hence the phenomenon cannot exist), this fact is irrelevant so long as there is some perception about what the sounds associated with the phenomenon might be like or what might be acceptable to the listeners. It is also useful to have some perception about how the sounds would vary depending on various hypothetical factors.
  • The sound modelling process begins by identifying the behavioural characteristics associated with the particular sound phenomenon which are relevant to the generation of sound.
  • Behavioural characteristics can be defined as the set of properties which a naive listener would perceive as distinguishing the sound effect from other sound effects, including those which define how it changes or evolves in response to different conditions impinging upon it. In many cases, the characteristics bear a one-to-one correspondence to the terms a layman would use to describe the sound effect.
  • the behavioural characteristics are properties which a naive listener might expect such an object or phenomenon to possess if it did exist.
  • In the case of footsteps, the behavioural characteristics would include things such as speed, degree of limp and stagger, weight (of the person producing the footsteps), surface type (e.g., cement, grass, mud), location (i.e., position relative to the listener), surrounding acoustic, etc. It can be easily appreciated that these characteristics define the sound for a particular set of conditions. For instance, the sounds produced from footsteps in a mad dash would be different from those produced in a casual stroll; footsteps on hard marble would sound different from footsteps on wet mud.
  • the sound being generated from footsteps results mainly from two separate events, the impact of the heel hitting a surface, then shortly after, the impact of the toe hitting a surface.
  • The totality of the sound generated results from the heel-toe action of one foot followed by the heel-toe action of the other foot, and so on.
  • As the walking speed increases, the time interval between the sound produced from the heel-toe action of one foot and the heel-toe action of the other foot decreases.
  • The heel-to-toe time is also affected by another parameter, the walking style. When marching, the leg falls rapidly and perpendicular to the ground, and thus the heel-to-toe time is very short. In contrast, a long stride produces a long heel-to-toe time because the heel touches the ground while the leg is far from perpendicular and the toe has a relatively long distance to travel before it touches the ground.
  • The heel-to-toe time is the net result of both walking speed and "walk style" (march versus long stride).
  • the general principle is that the internal parameters of the model may be influenced by many of the external or "user" parameters in mathematical relationships of arbitrary complexity.
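  • For illustration, a hypothetical Python sketch of how two user parameters might jointly determine the footsteps model's internal timing; the constants are invented and not taken from the patent:

        def footstep_timing(speed, style):
            """speed in 0..1 (slow..fast); style in 0..1 (march..long stride).

            Returns (step_interval, heel_to_toe_time) in seconds.
            """
            step_interval = 1.2 - 0.8 * speed              # faster walking -> shorter interval
            heel_to_toe = step_interval * (0.05 + 0.25 * style)
            return step_interval, heel_to_toe

        # footstep_timing(0.9, 0.1) -> short steps, very short heel-to-toe (near-march)
        # footstep_timing(0.2, 0.9) -> slow steps, long heel-to-toe (long stride)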
  • This knowledge about the mechanics of sound generation is important for two main reasons when attempting sound modelling. First, it allows one to vary the correct set of parameters so as to produce the most realistic sound effect. For instance, in the footstep example given above, the resulting sound effect would not sound very realistic had someone varied the time interval between the sound produced from one heel-toe action to another without also proportionately varying the time interval between the heel and the toe.
  • the second main reason for analysing the mechanics is that it allows one some notion of the size and type of sound sample that is needed.
  • There is a rather large range of behavioural characteristics of any particular phenomenon, and the choice of selection and the extent of the analysis of these behavioural characteristics depend largely upon the potential uses of the sound effects, the nature of the sound producing phenomenon, and the degree of realism desired.
  • some identification and understanding of the behavioural characteristics of any phenomenon is required to model a sound effect properly.
  • the sample may be obtained either by recording a sample segment of actual sound found in a real world phenomenon (or simply taking a segment from some existing prerecording) or by producing a segment through well known synthesis techniques, whichever is more convenient or desirable given the particular sound effect being modelled.
  • the choice of the length of the sound samples depends on a number of factors. As a general rule, the smaller the sample, the greater the flexibility. On the flip side, the smaller the sample, the greater the labour and the harder it is to achieve realism.
  • a good rule of thumb is to have a sample which is as long as possible without loss of useful flexibility, that is, where most of the perceptual range of sonic possibilities of the equivalent sound in real life can be achieved by varying the user parameters of the model. For instance, in the case of the footsteps, if one were to want to produce footsteps of different speeds, it would be necessary to obtain a set of samples including heel sounds and toe sounds, for the reasons provided above. However, this does not always mean that one needs to record the two sounds separately since the current editing techniques allow for splicing and other forms of editing to separate a single recording into multiple samples. But the splicing technique may be difficult or impossible for cases where the sounds overlap.
  • The choice of the sound samples also depends on the behavioural characteristics of the phenomenon to some extent, and also on the limitation of the parameters (parameters are discussed in detail below). Using the footsteps example once again, it should be noted that some sound effects do not require additional samples while some do. For instance, to vary the style of a walk, only the timing needs to be varied, and hence, this can be done with any existing sample. However, to vary the surface on which the footsteps are made, it is easier to simply obtain a sample of footsteps on each of the surfaces rather than attempting to manipulate an existing sample to simulate the effect. For example, it would not be easy to produce the sound of a footstep on a soft muddy surface using only a sample of footsteps on a concrete surface. How many samples are needed for a given phenomenon, of course, depends on the scope and the range of sound effects one desires, and varies greatly from one sound effect to another.
  • a continuous spectrum can often be simulated by collecting points along the spectrum.
  • a "strength of Laughter' parameter may be constructed by collecting a set of laugh samples at different "degrees of hilarity", then selecting individual samples according to the setting of the "Strength of Laughter” parameter.
  • this technique is combined with the random selection described above.
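  • A hypothetical Python sketch of this range-based sample selection combined with random choice; the sample names and bank layout are invented:

        import random

        SAMPLE_BANKS = {            # hypothetical sample files per degree of hilarity
            0: ["chuckle_a.wav", "chuckle_b.wav"],
            1: ["laugh_a.wav", "laugh_b.wav", "laugh_c.wav"],
            2: ["howl_a.wav", "howl_b.wav"],
        }

        def pick_laugh_sample(strength):
            """strength in 0..1; map it to a recorded degree, then choose randomly within it."""
            degree = min(int(strength * len(SAMPLE_BANKS)), len(SAMPLE_BANKS) - 1)
            return random.choice(SAMPLE_BANKS[degree])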
  • the parameters represent the various factors which need to be controlled in order to produce the modelled sounds.
  • The parameters can be structured in a number of ways to effect a sound effect. In this system, it is useful to view the parameter structure as illustrated in FIG. 4.
  • the top layer consists of the user parameters which are the interface between the user and the sound effects system.
  • the middle layer consists of the parameters employed by the SFX engine, or simply referred to as "engine parameters.”
  • the bottom layer consists of the synthesizer parameters which are well known parameters found in any of the current sound or music synthesizers.
  • each of the user parameters affects a combination of engine parameters and synthesizer parameters, though, in simpler cases, a user parameter may control only synthesizer parameters or engine parameters. Any combination of engine and synthesizer parameters is theoretically possible; however, the way in which they are combined will depend on how the user parameter is defined in light of behavioural characteristics of a particular phenomenon, as shall be explained in detail below.
  • the user parameters are defined in terms of the desired sound effect.
  • the user parameters can be location, walking speed, walking style, limp, weight, hardness, surface type, etc.
  • While these parameters can be defined in virtually any manner, it is often most useful if they directly reflect the behavioural characteristics of a phenomenon and the purpose for which the sound effect is being produced. In many cases, they are the obvious, easily understood parameters that a layman might use to describe the sound.
  • Although a user parameter such as surface type might be useful for the phenomenon footsteps, it probably would not be useful for a phenomenon such as earthquake, given that surface type probably has no meaning in the context of an earthquake.
  • the user parameters can be represented in a number of ways so as to give control access to the user. However, in this system, it is represented in the form of "sliders" on a graphic user interface (GUI), FIG. 3, where the user can slide the slider bar to control the magnitude of the effect. For instance, for the speed slider for the phenomenon footsteps, the walking speed is increased as the slider is moved to the right. For the limp slider, the amount of limp in the walk is increased as the slider is moved. To combine the effects, several user parameters can be invoked at once. For instance, by invoking both the speed slider and the limp slider, one can achieve any combination of limp and speed. Some combinations are obviously not desirable, though may be possible.
  • the middle layer parameters, or engine parameters, and the bottom layer parameters, or synthesizer parameters work in combination to produce the sound effects as defined by the user parameters.
  • the bottom layer parameters can include sound manipulation techniques such as volume control, pan, pitch, filter cut off, filter Q, amplitude envelope, and many others which are well known to those skilled in the art.
  • the middle layer can be viewed as the layer which "models" the sound using the basic sound manipulation parameters provided by the bottom layer.
  • While the middle layer parameters are complex, they can broadly be classified as timing, selecting, and patterning parameters. Although these parameters are defined here as being separate and distinct, it should be understood by those skilled in the art that these parameter representations are conceptual tools to illustrate the sound modelling process or techniques employed by the SFX engine and need not necessarily exist as separate and distinct components in the present sound effects system.
  • timing parameters basically control the length of the time intervals between triggering and stopping pieces of sound within a particular sound effect, and time intervals between other commands sent to the synthesizer.
  • the selecting parameters control which sound samples are selected at a given moment, including the order in which samples are selected.
  • the patterning parameters control the relationships between these factors.
  • Taking the footsteps example once again: described above were the behavioural characteristics of footsteps in relation to speed. It was explained that as the speed increases, the time interval between one heel-toe action and another decreases, as does the time interval between the heel and the toe.
  • The user parameter (top layer) is speed. As the user adjusts the speed slider to increase speed, the timing parameter is made to decrease the time interval between one heel-toe action and another, as well as the time interval between the heel and the toe. This timing is also affected by the "style" parameter as described above. However, the pattern or behaviour of the footsteps does not change as speed and style are altered. A heel sound is always followed by a toe sound, etc.
  • For a phenomenon such as a horse's hoofbeats, the behavioural characteristics are more complicated, and hence, additional parameters need to be involved.
  • As the speed increases, the timing parameter needs to be adjusted such that the mean of the time intervals between the events becomes shorter, reflecting the fact that the time intervals between impacts of the hooves on a given surface become shorter on average.
  • the patterning and ordering of the events change as the horse switches between walking, trotting, cantering and galloping. The exact pattern, of course, needs to be determined empirically using the behavioural characteristics of an actual horse.
  • When the surface type is changed, the only class of engine parameters affected is that concerned with selection, since the timing and patterning aspects do not change.
  • different sets of samples will be selected, but the timing and patterning do not change.
  • In some cases, synthesizer parameters will have to be invoked either in isolation or in combination with engine parameters.
  • For example, the synthesizer parameters pitch, volume, etc. need to be controlled in response to the user parameter weight (of the person making the footsteps), since typically a heavier person would produce footsteps which are deeper in pitch, louder, etc. (though this may not always be true in real life).
  • While the behavioural characteristics will have some bearing on the choice of the synthesizer parameters to be used, there is no hard and fast rule as to how these parameters should be selected.
  • FIGS. 5 through 13 are tables charting the various parameters that are used for some actual sound models. Taking the table in FIG. 5 as an illustrative example, the first column lists the user parameters plus "random fluctuations" (see above for a description of "random fluctuations").
  • the subsequent columns have a heading at the top showing the engine and synthesizer parameters, the engine parameters comprising the first three columns subsequent to the user parameter column.
  • the "X" in a box indicates that the parameter in that column was used for the sound modelling for the user parameter found in that particular row.
  • The user parameters, Brake, Clutch, and Gas Pedal, control two "internal" variables, car speed and engine speed.
  • the two internal variables are governed by a 2D differential equation with the Pedal settings as inputs.
  • the car speed and engine speed in turn control the synthesizer event generation.
  • the engine speed controls the firing of pistons, each firing is a separately triggered event (many hundreds per second).
  • The car speed controls the "rumble" of the car rolling along the road.
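  • A hypothetical Python sketch of such a coupled pair of internal variables driven by the pedal settings; the coefficients and the simple Euler integration step are illustrative assumptions only:

        def car_step(car_speed, engine_speed, gas, brake, clutch, dt=0.01):
            """All pedal inputs in 0..1; returns updated (car_speed, engine_speed)."""
            coupling = 1.0 - clutch                      # pressing the clutch disengages the drive
            d_engine = 5.0 * gas - 0.5 * engine_speed - coupling * (engine_speed - car_speed)
            d_car    = coupling * (engine_speed - car_speed) - 2.0 * brake - 0.1 * car_speed
            return car_speed + dt * d_car, engine_speed + dt * d_engine

        # engine_speed would then set the rate of individually triggered piston events,
        # while car_speed controls the "rumble" layer.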
  • the wind sound model consists of a very small number of events.
  • “Strength” controls the mean value of the parameters (stronger wind has higher volume, pitch, filter Q, etc).
  • The “Width of variation” controls the deviation from the mean (of the same parameters) and “Gustiness” controls the rate of change of the parameters.
  • “Wind Strength” also controls the number of layers (e.g., number of “whistles”) in the sound.
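  • A minimal Python sketch of this kind of parameter behaviour for the wind model; the sinusoidal modulation and the layer formula are assumptions for illustration, not the patent's method:

        import math

        def wind_value(t, strength, width, gustiness, phase=0.0):
            """Instantaneous value of one wind-controlled parameter (volume, pitch, ...)."""
            mean = strength                                  # stronger wind -> higher mean value
            return mean + width * math.sin(2 * math.pi * gustiness * t + phase)

        def number_of_whistle_layers(strength):
            return 1 + int(3 * strength)                     # stronger wind -> more "whistle" layers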
  • The system is capable of generating a plurality of different sounds from a generic model by selection of parameter values of that model. In order to provide concurrent automatic generation of content-related information describing a generated sound, text label elements are associated with different ranges of values of the parameters so that selection of a parameter value will automatically select an appropriate descriptive label element.
  • the label elements are then combined with a model label element to form the complete sound label.
  • the sound label may be associated with the sound it describes in accordance with any suitable means but preferably in accordance with the MPEG 7 or MPEG 9 standards and the label may be attached to a specific time location in any media, for example a movie, where the sound that the label describes is used.
  • the label may be associated with any representation of the sound.
  • the label may be associated with the actual sound either in digital file form or analog file form.
  • the label may be associated with a file of the control codes provided by the model control system or controlling application to the synthesizer.
  • the actual model used to generate the control codes for the synthesizer would be available where the sound is to be reproduced.
  • a multimedia document may have access to a database of sound models (or references thereto), so that the sound can be specified simply by the particular selected parameter values of that model, with the sound label being associated with those selected parameter values and used to search the database for the required model.
  • The structure of a model can be viewed as a root (the model itself, representing an object or event) together with a set of attribute/value pairs (its parameters and their values).
  • the English grammatical structure of a label similarly consists of a subject (object or event) and attributes of the subject.
  • the "root" object or event is specified by the model label element with a specification of attributes of the root specified by the value label element(s).
  • The adjectives and adverbs that constitute the attribute specifications in a label can be expressed using a structure that is closer to the model structure of attribute/value pairs.
  • the degree of explanation/eloquence in a label is a matter of design choice.
  • To translate a numerical value from the model into a label element, the allowable range of the parameter is divided into segments or ranges, and each range is given a name. The value label element is then produced by combining the range name with the parameter name, using the range as a modifier for the parameter.
  • For some label elements, it is necessary only to specify the range name as the label element. Using again the footsteps model, if a surface is specified, for example "concrete", there may be only two ranges, specifying no concrete surface or the sound of walking on a concrete surface. When the latter range is chosen, it is sufficient for the label element simply to be referred to by the range name, i.e. "concrete".
  • Another component of the label translation allows certain parameter value settings to prevent the parameter from participating in the sound label.
  • Consider, for example, a footsteps model that has a "limp" parameter: when the parameter is set to a value in the range corresponding to no limp, it contributes no element to the label.
  • The model description elements are also customizable. There are many parameters that are common across a wide range of sounds (e.g. "speed"), and the range segments might be given useful default text labels. However, new models often require unique parameters, or else a standard parameter name like "speed" might require model-specific labels (speed for walking, trains and cars might use "slow", "medium", "fast", while speed for a wind might more usefully be translated into "gentle" and "strong"). Customization of specific labels for specific parameters is preferably provided, therefore, by allowing the user to define label elements and the corresponding ranges.
  • the root (model label element) is "footsteps".
  • The parameter values are divided up into ranges and a text label element is associated with each range depending upon the information desired.
  • A value represented by <N.A> below is associated with such a range.
  • The parameter values lie between 0 and 1, except for "weight" and "limp", which extend from -1 to 1. Each label is used for a range commencing with the adjacent value up to the next value/label pair, or the top of the range, if appropriate.
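  • A hypothetical Python sketch of translating a numeric parameter value into a label element via named ranges; the thresholds and names below are invented for illustration:

        # each range starts at its threshold and runs to the next threshold (or the top)
        SPEED_RANGES = [(0.0, "slow"), (0.35, "medium"), (0.7, "fast")]
        LIMP_RANGES  = [(-1.0, "<N.A>"), (0.1, "slightly limping"), (0.5, "badly limping")]

        def value_label(value, ranges):
            """Return the label for the segment containing value, or None for <N.A>."""
            label = ranges[0][1]
            for threshold, name in ranges:
                if value >= threshold:
                    label = name
            return None if label == "<N.A>" else label

        # value_label(0.8, SPEED_RANGES) -> "fast"
        # value_label(0.0, LIMP_RANGES)  -> None  (no limp, so no label element)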
  • The text elements are put together using a model-specific "template" description, which is a list of control codes, character strings and parameter names that define how the text components from the model and its parameters are strung together into a sentence-like description of the sound.
  • The template contains a place-holder for the text corresponding to the root and each model parameter, as well as quoted strings that "glue" the text chunks into a pseudo-sentence.
  • the heart of the footsteps template is:
  • the parameter ranges and respective labels, as shown above, for the "footsteps" model are stored in a file.
  • the same file also stores the template.
  • the system calls a function that compares the current model parameter settings with the file and constructs the label using the template accordingly.
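  • A minimal Python sketch of such template-driven label construction; the template representation and place-holder syntax are assumptions for illustration:

        def build_label(template, root, value_labels):
            """template: list of literal "glue" strings and {placeholder} names."""
            parts = []
            for item in template:
                if item.startswith("{") and item.endswith("}"):
                    text = root if item == "{root}" else value_labels.get(item[1:-1])
                    if text:                      # parameters mapped to None are skipped
                        parts.append(text)
                else:
                    parts.append(item)
            return " ".join(parts)

        # build_label(["{speed}", "{limp}", "{root}", "on", "{surface}"],
        #             "footsteps",
        #             {"speed": "fast", "limp": None, "surface": "concrete"})
        # -> "fast footsteps on concrete"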
  • the embodiment described is not to be considered limitative.
  • Although the label has been shown constructed in an intuitive, grammatical way, this is not essential.
  • The label may simply comprise label elements combined in a semi-grammatical way or even as a selection of grammatically separate descriptive elements which together form the defined label.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The invention concerns an apparatus for labelling a sound, the apparatus comprising a sound generator capable of generating a family of sounds by selection of values of parameters of a sound model, at least some parameter values being associated with respective descriptive labels, selection of a value automatically selecting the corresponding label. The value labels are preferably combined with a label indicating the model in a grammatical structure, to provide an intuitive description of the sound.
PCT/SG1999/000010 1999-01-29 1999-01-29 Procede pour etiqueter un son ou une representation de celui-ci WO2000045387A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/SG1999/000010 WO2000045387A1 (fr) 1999-01-29 1999-01-29 Procede pour etiqueter un son ou une representation de celui-ci
AU32843/99A AU3284399A (en) 1999-01-29 1999-01-29 A method of labelling a sound or a representation thereof
JP2000596565A JP2002536680A (ja) 1999-01-29 1999-01-29 音響またはその表示のラベリング方法

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/SG1999/000010 WO2000045387A1 (fr) 1999-01-29 1999-01-29 Procede pour etiqueter un son ou une representation de celui-ci

Publications (1)

Publication Number Publication Date
WO2000045387A1 true WO2000045387A1 (fr) 2000-08-03

Family

ID=20430185

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG1999/000010 WO2000045387A1 (fr) 1999-01-29 1999-01-29 Procede pour etiqueter un son ou une representation de celui-ci

Country Status (3)

Country Link
JP (1) JP2002536680A (fr)
AU (1) AU3284399A (fr)
WO (1) WO2000045387A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2486663A (en) * 2010-12-21 2012-06-27 Sony Comp Entertainment Europe Audio data generation using parametric description of features of sounds

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7478640B2 (ja) 2020-10-06 2024-05-07 株式会社電通 移動音生成システム

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5218187A (en) * 1990-01-18 1993-06-08 Norand Corporation Hand-held data capture system with interchangeable modules
US5262940A (en) * 1990-08-23 1993-11-16 Lester Sussman Portable audio/audio-visual media tracking device
US5399844A (en) * 1993-01-12 1995-03-21 Facility Management Systems, Inc. Inspection prompting and reading recording system


Also Published As

Publication number Publication date
JP2002536680A (ja) 2002-10-29
AU3284399A (en) 2000-08-18

Similar Documents

Publication Publication Date Title
Hunt et al. The importance of parameter mapping in electronic instrument design
US8357847B2 (en) Method and device for the automatic or semi-automatic composition of multimedia sequence
Rocchesso et al. Sounding objects
US5952599A (en) Interactive music generation system making use of global feature control by non-musicians
US20200286520A1 (en) Systems and methods for modifying videos based on music
Fontana et al. Physics-based sound synthesis and control: crushing, walking and running by crumpling sounds
JP2011210285A (ja) 創作物作成支援方法及びその装置並びに記録媒体
AU2021221475A1 (en) System and method for performance in a virtual reality environment
US6658309B1 (en) System for producing sound through blocks and modifiers
AU9292498A (en) Interactive sound effects system and method of producing model-based sound effects
DiPaola et al. Emotional remapping of music to facial animation
WO2000045387A1 (fr) Procede pour etiqueter un son ou une representation de celui-ci
Dubnov et al. Delegating creativity: Use of musical algorithms in machine listening and composition
Sporka et al. Design and implementation of a non-linear symphonic soundtrack of a video game
Freeman et al. Auracle: a voice-controlled, networked sound instrument
DiPaola et al. Affective communication remapping in musicface system
KR100383019B1 (ko) 뮤직비디오 제작장치
Hamilton Perceptually coherent mapping schemata for virtual space and musical method
US11797267B2 (en) Method for playing audio source using user interaction and a music application using the same
Plut Application and evaluation of affective adaptive generative music for video games
Yu Computer generated music composition
US20240038205A1 (en) Systems, apparatuses, and/or methods for real-time adaptive music generation
Zadel A software system for laptop performance and improvisation
Kallionpää et al. Climb!—A Composition Case Study: Actualizing and Replicating Virtual Spaces in Classical Music Composition and Performance
JPH10503851A (ja) 芸術作品の再配列

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
ENP Entry into the national phase

Ref country code: JP

Ref document number: 2000 596565

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 09889556

Country of ref document: US

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase