WO1997045804A1 - Audio data management for interactive applications - Google Patents

Audio data management for interactive applications

Info

Publication number
WO1997045804A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
stack
stage
virtual environment
data
Application number
PCT/IB1997/000359
Other languages
French (fr)
Inventor
Richard David Gallery
Timothy Stuart Owlett
Original Assignee
Philips Electronics N.V.
Philips Norden Ab
Application filed by Philips Electronics N.V., Philips Norden Ab filed Critical Philips Electronics N.V.
Priority to EP97908458A priority Critical patent/EP0847562A1/en
Priority to JP9541885A priority patent/JPH11511268A/en
Publication of WO1997045804A1 publication Critical patent/WO1997045804A1/en

Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F 13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F 13/20 Input arrangements for video game devices
    • A63F 13/50 Controlling the output signals based on the game progress
    • A63F 13/54 Controlling the output signals based on the game progress involving acoustic signals, e.g. for simulating revolutions per minute [RPM] dependent engine sounds in a driving game or reverberation against a virtual wall
    • A63F 2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F 2300/50 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by details of game servers
    • A63F 2300/53 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by details of game servers details of basic data processing
    • A63F 2300/538 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by details of game servers details of basic data processing for performing operations on behalf of the game client, e.g. rendering
    • A63F 2300/60 Methods for processing data by generating or executing the game program
    • A63F 2300/6063 Methods for processing data by generating or executing the game program for sound processing
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 5/00 Electrically-operated educational appliances
    • G09B 5/06 Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • G09B 5/062 Combinations of audio and printed presentations, e.g. magnetically striped cards, talking books, magnetic tapes with printed texts thereon


Abstract

An arrangement of apparatus is provided for the handling of audio in relation to an evaluation procedure (30) for a virtual world or environment (which may or may not be visually displayed). For each instant in time, for which the state of the virtual world is evaluated, modelled characters and objects within the virtual world may generate various audio events, such that the audio-state of the world changes on each world state evaluation. This information is passed to an audio management procedure (32) via a stack memory (34), with the audio management procedure (32) taking the information from the stack (34) to generate loudness and scaling factors for spatial localisation, and other audio effects. This information is then passed to an audio generation board (36) which, in conjunction with an audio store (38), synthesises digital audio signals to play out. The use of the audio stack (34) enables efficient passing of the audio event data between the world update process (30) and the audio management process (32), using a relatively small number of common functions for all types of audio.

Description

DESCRIPTION
AUDIO DATA MANAGEMENT FOR INTERACTIVE APPLICATIONS
The present invention relates to interactive entertainment apparatuses and in particular to the handling of audio data in such a system. Additionally, but not exclusively, the invention relates to the handling of audio data in conjunction with graphic or video data.
An example of an interactive audio system is described in US Patent 4,846,693 (Baer), with an animated toy figure such as a teddy bear or doll being coupled to a video animation unit presenting an image of a second figure on a television screen or other display. A scripted conversation occurs between the figures via a speaker within the toy and a speaker at the display, with appropriate controlled animation of each figure in synchronism with the audio. A user input enables a limited amount of user interaction with the screened figure (for example answering multiple choice questions); an arrangement of one or more position sensors is also suggested for triggering phrases such as "turn me over" or "pick me up" from the toy figure. A further example is given in US Patent 4,305,131 (Best), which describes a story or game arrangement with a branching narrative structure. At the branch points, an on-screen character (as a filmed sequence or animated sprite) directly asks a question of the viewer, to which question there are a few permitted answers. The possible answers are presented to the user via a display on a hand-held unit, which unit also includes voice recognition circuitry to detect the user speaking the answers. The audio data is stored independently of the video data to allow for flexibility in re-use, although a cueing circuit is provided to maintain synchronism between the two at run time.
A drawback of such systems is their reliance on scripting, with the audio being tied to other features such as the scripted conversations in the Baer reference and the branching narrative structure of Best. This can lead to a lack of immersion for the user, with only a limited number of possible responses and these being predictable: this becomes particularly noticeable in the case of an audio-only application such as an electronic storybook, where there is no video to distract from repetitive audio. A further problem with such systems is their lack of flexibility and/or capacity for editing. In both Baer and Best, the audio data is stored in a ROM device in its scripted form and, even if held in small, individually addressed audio files (for example tracks of an audio compact disc), there is no simple way to introduce new audio segments or sound effects, or to substitute one existing segment for another.
It is therefore an object of the present invention to provide a fast and flexible means for the handling of audio data, and in particular to provide an apparatus configuration for handling such data as a part of a real-time interactive system.
In accordance with the present invention there is provided an interactive audio entertainment apparatus comprising: a first memory holding data defining a virtual environment and objects within said virtual environment; a processor coupled with the memory and arranged to generate and periodically update a model of the virtual environment and objects therein, and to generate indicators to respective predetermined audio segments to be generated in response to respective predetermined conditions or conjunctions of conditions occurring in the virtual environment; a stack memory coupled to receive and sequentially store said indicators from the processor; and an audio management stage arranged to, on each periodic update of the virtual environment, pull the stack contents, and initiate the generation of the respectively indicated audio segment or segments.
It should be noted that the term "objects" used herein refers to any feature of the virtual environment, rather than a software construction packaging data or procedures operating on such data. The objects may be "solid" features of the virtual environment, such as animated characters, furniture or buildings, or they may be of a more abstract nature such as temperature or time. Consequently, the above-mentioned conditions or conjunctions of conditions triggering audio segment generation may, for example, be a car hitting a tree (as modelled within the virtual environment) or a dog beginning to bark if it is night and hotter than a predetermined temperature.
The stack of indicators transferring data from the processor to the audio management stage provides great flexibility for operation of the system: to substitute an audio segment, it is only necessary to change the indicator against which that segment is recorded, and adding additional audio segments only involves increasing the number of indicators rather than re-recording entire scripted passages to accommodate a small change to, for example, a simple phrase within.
The apparatus suitably includes an audio reproduction stage, coupled to the audio management stage, and operable to generate the said predetermined audio segments when initiated by said audio management stage, and an audio data memory may be provided coupled with the audio reproduction stage. In such an arrangement, the memory may hold individually addressable audio data segments defining respective ones of the said audio segments, and may be accessed following initiation of the said audio reproduction stage by the audio management stage. Each of the indicators generated by the processor may include an identifier for the location of the respective audio data segments within such an audio data memory, with the different audio data segments stored as respective numbered files in the audio data memory, and the processor being arranged to specify at least the file number as part of a common format for indicators within the stack memory.
The audio reproduction stage may suitably have a capability for applying one or more signal processing techniques to audio signals, and the data passed to the audio reproduction stage from the audio management stage may include specification of one or more of such techniques to be applied. In such an arrangement, the particular audio segment identifier may have a null value or carry some further code indicating that the processing technique is to be applied globally or just to audio originating from particular localised areas. This would avoid the need for treatments such as echo to be individually specified for all audio segments originated, for example, within a dungeon scenario. Additionally, the audio reproduction stage may be operable to output audio signals on two or more channels with different signal processing applied for each channel.
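By way of illustration only, the following C sketch shows one way such an effect-only indicator might be encoded: a null audio segment identifier tells the reproduction stage that the entry carries only signal-processing details, here a reverb to be applied within a bounded region. The names (FILE_NONE, FX_REVERB, effect_entry) and the region representation are assumptions, not taken from the patent.

    /* Sketch: an effect-only indicator with a null file identifier.
     * All names and the region encoding are illustrative assumptions. */
    #include <stdio.h>

    #define FILE_NONE 0            /* null file id: no segment, effect only  */
    #define FX_REVERB 1            /* e.g. echo treatment for a dungeon area */

    struct effect_entry {
        int   file_num;            /* FILE_NONE => effect-only entry         */
        int   effect;              /* which processing technique to apply    */
        float region[2][3];        /* min/max corners of the affected volume */
    };

    int main(void) {
        struct effect_entry dungeon_echo = {
            .file_num = FILE_NONE,
            .effect   = FX_REVERB,
            .region   = {{0.0f, 0.0f, 0.0f}, {40.0f, 5.0f, 40.0f}},
        };
        printf("apply effect %d within region (file %d = none)\n",
               dungeon_echo.effect, dungeon_echo.file_num);
        return 0;
    }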
User operable input means may be provided coupled to the processor whereby a user is enabled to alter one or more of said respective predetermined conditions pertaining to at least one virtual environment object: in other words, the user may trigger various audio events by deliberately or accidentally setting up the condition or conjunction of conditions within the virtual environment with which the particular audio segment is associated.
Whilst applicable to purely audio applications such as talking books, the apparatus may suitably include a further store holding geometric and surface data describing a physical appearance for the modelled virtual environment and each of the objects therein, together with rendering means operable to periodically generate images of the virtual world from at least one viewpoint therein.
Further features and advantages of the present invention will become apparent from a reading of the following description of preferred embodiments of the present invention, given by way of example only, and with reference to the following drawings, in which: Figure 1 is a simplified representation of an interactive entertainment system which may suitably embody the present invention;
Figure 2 schematically illustrates audio data management according to the present invention;
Figure 3 is a block schematic diagram of an interactive entertainment apparatus embodying the present invention;
Figure 4 is a flowchart illustrating the relative order of functions performed by the apparatus of Figure 3;
Figure 5 represents exemplary contents for three successive entries in the audio stack of Figures 2 and 3; and
Figure 6 is a flowchart illustrating operation of the audio management stage of Figure 2.
Figure 1 shows a simulation of a virtual world, using real time graphics and audio, as an example of virtual reality. It is a multi-user interactive world, with two users A and B shown. A video display 10 presents a representation of a virtual world to the users, with the representation being produced by control and rendering apparatus 12. Each user has a respective input device 14 coupled with the control apparatus 12, and receives audio feedback from the simulation via speakers 16 (a quadraphonic arrangement being shown). The virtual world in this example takes the form of a room within which two animated characters 18, 20 appear, each animated character being controlled by a respective user A, B via their respective input devices 14. In addition to the animated characters 18, 20, the virtual world features a number of other modelled objects (including a table 22, vase 24, gun 26, and door 28) which may be moved or otherwise interacted with by the characters, together with "invisible" objects such as time or temperature which may also affect or initiate interaction.
Many variations on the arrangement of Figure 1 will be apparent to the skilled person. For example, the video display may comprise an autostereo display to provide a three-dimensional (3D) image of the virtual world to the users. In a modification, a multiple view 'autostereo' display may be provided such that, at their respective positions relative to the LCD screen, the users are presented with respective images of the virtual world, suitably from the viewpoint of their character. Alternatively, rather than a single or multiple view screen, each user may be provided with a stereoscopic head-mounted display (HMD) unit, with which one or more of the speakers 16 may be integral.
The form of user input device (UID) may also be subject to variations, from a simple manually operated unit 14 as shown, to full-body suits detecting the user's compound motions and, via the control and rendering stage 12, reproducing these as corresponding movements of the user's respective character 18, 20. The present invention is particularly concerned with the efficient handling of audio for such simulations, where various audio events are generated (leading to output of respective audio segments via speakers 16) in response to particular events within the virtual world, suitably as part of a periodic world evaluation procedure which takes account of user input and other factors to determine not only how the visual representation of the virtual world should be updated, but also what sounds should be generated to accompany the images.
Figure 2 schematically illustrates the handling of audio in relation to the world evaluation procedure 30. For each instant in time, for which the state of the virtual world is evaluated, the modelled characters and other objects may generate or trigger various audio events, such that the audio-state of the world changes on each world state evaluation. This information is then passed to the audio management procedure 32 via a stack memory 34 as will be described. The audio management procedure 32 takes the information from the stack 34 to generate loudness and scaling factors for spatial localisation, and other audio effects. This information is then passed to an audio generation board 36 which, in conjunction with an audio store 38, synthesises digital audio signals to play out.
The use of the audio stack 34 enables efficient passing of the audio event data between the world update process 30 and the audio management process 32, using a relatively small number of common functions for all types of audio, with software functions, during the world evaluation process, passing audio data into the audio stack 34, and the stack being read repeatedly during the audio management procedure 32. This data from the stack is then used to calculate relative positions, loudnesses and other factors for audio processing.
An apparatus implementation of the handling scheme of Figure 2 is schematically illustrated in Figure 3: for reasons of clarity, the features for only a single user input and video output are shown, although it will be well understood how these should be replicated for multiple users and/or stereoscopic video output. Where appropriate, reference numerals from Figures 1 and 2 are used to identify corresponding or directly equivalent features.
The apparatus is based around a coupled pair of processor stages respectively handling the world (including video) 42 and audio 32 management, with the world processor 42 passing audio event data to the audio processor 32 via stack memory 34. The two processors 42, 32 are coupled to a suitable clock signal source 44 as will be readily understood.
The world processor 42 is coupled to a random access memory (RAM) 46 containing data defining the virtual world and characters and objects therein, with the contents of RAM 46 being updated during world evaluation (30; Fig.2) to reflect the current position, orientation, etc. for the characters and objects which may have changed in response to user input to the world processor via UID 14. The contents of RAM 46 are periodically read by rendering stage 40 under control of the world processor 42 and used to generate the image or images of the virtual world from global or user viewpoints for subsequent display.
The audio processor 32 is coupled to receive function calls from the world processor and to pull data entries from the stack 34. Having calculated the audio processing required, for example to scale the volume for a particular audio event in relation to the relative positions within the virtual world of each user and the source of that audio event, the audio processor outputs the data to audio generation stage 36. The data may comprise a digitised audio signal, or it may simply comprise an index term identifying where the particular audio signal file or other data is stored in an audio data read-only memory (ROM) coupled with the audio generation stage. In either case, the data from the audio processor is accompanied by the details of the signal processing to be applied within audio generation unit 36 and in fact, as mentioned beforehand, the data from the audio processor may comprise only signal processing details for global or localised effects.
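As a hedged illustration of the volume scaling just described, the C fragment below attenuates a 0-255 loudness value with the distance between an audio event's source and a listener in the virtual world. The inverse-distance law and all names here are assumptions made for the sketch, not details fixed by the patent.

    /* Sketch: distance-based loudness scaling; the 1/d law is assumed. */
    #include <math.h>
    #include <stdio.h>

    struct vec3 { float x, y, z; };

    static float vdistance(struct vec3 a, struct vec3 b) {
        float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        return sqrtf(dx * dx + dy * dy + dz * dz);
    }

    /* Scale a 0..255 base loudness by listener distance; clamp so a source
     * co-located with the listener plays at full volume. */
    static int scaled_loudness(int base, struct vec3 source, struct vec3 listener) {
        float d = vdistance(source, listener);
        if (d < 1.0f) d = 1.0f;              /* avoid divide-by-zero / boost */
        return (int)(base / d);
    }

    int main(void) {
        struct vec3 gun = {3.f, 0.f, 2.f}, user_a = {0.f, 0.f, 0.f};
        printf("loudness at listener: %d\n", scaled_loudness(255, gun, user_a));
        return 0;
    }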
A linear arrangement for the world evaluation routine as applied by the apparatus of Figure 3 is represented by the flowchart of Figure 4. The first step 61 is to take account of input data received since the previous world evaluation. Secondly, each object within the virtual world is processed in turn (step 62) to determine if it has moved or how it should be moved, and what if any audio should be generated as accompaniment. In the third step 63, the audio management (as at 32; Fig.2) determines the processing required to be applied to generate spatial or other effects, for passing to the audio generation stage. The fourth stage 64 is to update the geometry database, that is to say the data model of the virtual world and its contents as held in RAM 46 (Fig.3), and the fifth stage 65 is to render an image of the virtual world from the or each selected viewpoint. As will be readily understood, with an asynchronous virtual environment, the steps of Figure 4 may not be performed in the order shown or may be performed concurrently.
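The five steps of Figure 4 can be read as a per-frame loop. The C sketch below fixes only their relative order; the function names are placeholder stubs standing in for the real input, object, audio, geometry and rendering stages, and (as noted above) an asynchronous implementation might reorder or overlap them.

    /* Sketch of the Figure 4 world evaluation routine as a per-frame loop. */
    #include <stdio.h>

    enum { NUM_OBJECTS = 2 };

    static void read_user_input(void)    { puts("step 61: process input since last evaluation"); }
    static void evaluate_object(int i)   { printf("step 62: evaluate object %d\n", i); }
    static void manage_audio(void)       { puts("step 63: audio management (32) determines processing"); }
    static void update_geometry_db(void) { puts("step 64: update world model in RAM 46"); }
    static void render_viewpoints(void)  { puts("step 65: render image(s) of the virtual world"); }

    void world_evaluation(void) {
        read_user_input();                       /* step 61 */
        for (int i = 0; i < NUM_OBJECTS; i++)
            evaluate_object(i);                  /* step 62: movement + audio events */
        manage_audio();                          /* step 63 */
        update_geometry_db();                    /* step 64 */
        render_viewpoints();                     /* step 65 */
    }

    int main(void) { world_evaluation(); return 0; }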
Referring again to the scenario of Figure 1, there are two human-controlled objects (characters 18, 20) in the world, both of which are walking and the first of which is firing the gun (object 26). During the world evaluation process, the object evaluation routines 62 start by taking the first object - character 18. As the walking flag is set for this object, which indicates that an audio "walk-event" has occurred and a footfall should sound, the world processor pushes the audio stack with an entry. Also, as the explosion flag is true, the explosion graphic has started and the explosion audio should be played; consequently, the audio stack is pushed with another entry. The second object is then processed and, as before, the walking flag is true and the animation frame is in the right place. The audio stack is then pushed for a third time to identify the footfall sounds for character 20.
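A minimal sketch of the object evaluation just described, under assumed names: when an object's walking or firing flag is set, an entry for the corresponding audio file is pushed. The flag names, file numbers and push_stack_entry helper are illustrative, not the patent's own identifiers.

    /* Sketch: flags set on an object cause audio stack pushes. */
    #include <stdio.h>

    struct obj { int walking; int firing; float pos[3]; };

    #define FILE_FOOTFALL  1   /* illustrative file numbers in ROM 38 */
    #define FILE_EXPLOSION 2

    static void push_stack_entry(int file_num, int loudness, const float pos[3]) {
        printf("push: file %d, loudness %d at (%.1f, %.1f, %.1f)\n",
               file_num, loudness, pos[0], pos[1], pos[2]);
    }

    static void evaluate_object_audio(const struct obj *o) {
        if (o->walking)                      /* walk-event: footfall sound  */
            push_stack_entry(FILE_FOOTFALL, 128, o->pos);
        if (o->firing)                       /* explosion graphic has begun */
            push_stack_entry(FILE_EXPLOSION, 255, o->pos);
    }

    int main(void) {
        struct obj character18 = {1, 1, {1.f, 0.f, 2.f}};  /* walking + firing */
        struct obj character20 = {1, 0, {4.f, 0.f, 3.f}};  /* walking only     */
        evaluate_object_audio(&character18);               /* two entries      */
        evaluate_object_audio(&character20);               /* third entry      */
        return 0;
    }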
At this stage, the audio stack contains three entries as shown in Figure 5, where the data held for each entry is as follows:
file_num: an identifying integer for the particular desired audio file stored within the audio file ROM 38 (Fig.3);
loudness: an integer value, suitably ranging between 0 and 255, specifying the relative loudness for an audio file playback at the location within the virtual world which led to its generation;
posn: the position of the sound source specified in three dimensions in the virtual world by an appropriate co-ordinate system;
obj_num: a flag which indicates whether or not the audio is to be muted for its source location on playback to a level equivalent to that at which the audio is heard by other characters following attenuation due to their separation from the source;
dist_scale: a flag indicating whether or not the loudness of an audio segment should be attenuated with distance from the source;
stop: a flag indicating whether the file should stop playing, particularly for use with looped audio files (see below);
localise: a flag indicating whether or not an audio file should be spatially localised, such that factors such as direction of origin relative to an observer are taken into account when calculating relative attenuations for the different audio channels;
self_audio: a flag which, if set, means the audio file may only be heard by the character/object generating it (i.e. for a character thinking);
loop: a flag which indicates the current audio file is to be repeated as soon as it has completed (to avoid having to continually re-specify the one file) until the 'stop' flag is set.
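One possible C layout for such a stack entry is sketched below. The patent fixes the fields and their meanings but not a concrete representation, so the types chosen here are assumptions.

    /* Sketch of a stack entry; field types are assumed, not specified. */
    #include <stdio.h>

    typedef struct {
        int   file_num;     /* index of the audio file in ROM 38            */
        int   loudness;     /* 0..255 relative loudness at the source       */
        float posn[3];      /* source position in world co-ordinates        */
        int   obj_num;      /* see text: muting relative to source location */
        int   dist_scale;   /* attenuate with distance from the source?     */
        int   stop;         /* stop a (typically looped) file               */
        int   localise;     /* apply directional spatial localisation?      */
        int   self_audio;   /* audible only to the generating object        */
        int   loop;         /* repeat until 'stop' is set                   */
    } audio_stack_entry;

    int main(void) {
        audio_stack_entry e = { .file_num = 1, .loudness = 128,
                                .posn = {1.f, 0.f, 2.f}, .dist_scale = 1,
                                .localise = 1, .loop = 1 };
        printf("entry: file %d, loudness %d, %zu bytes\n",
               e.file_num, e.loudness, sizeof e);
        return 0;
    }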
When the audio management stage 32 is called, then, as shown by the flowchart of Figure 6, it repeatedly pulls this audio stack until it is empty (step 70 "Pull Stack" and step 73 "Stack Empty?"). Note that pulling the stack does not necessarily require the removal of the data therefrom: identifying that data to the audio management stage achieves the desired function without requiring copying of the data. For each stack entry, the audio management stage 32 parses the stack data (step 71) and uses the information contained to generate four loudness values for the respective speakers 16 in the quadraphonic set-up. As shown by step 72 ("More Users?"), the parsing may be a two or more pass process by which the audio management stage 32 uses the information to generate respective loudness values for each of the two users A, B, if these are provided with individual speakers. These loudness values take into account distance and direction and, depending on the information contained in the audio stack, different aural effects can be used within the system. On completion of the processing by audio processor 32, the derived data is output to the audio generation stage 36.
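The pull-parse-output loop of Figure 6 might look as follows in C: pull entries until the stack is empty and, for each, derive one loudness value per speaker in the quadraphonic set-up. The speaker positions and the simple inverse-distance panning rule are illustrative assumptions.

    /* Sketch of the Figure 6 loop: one loudness per quad speaker per entry. */
    #include <math.h>
    #include <stdio.h>

    #define NUM_SPEAKERS 4
    #define STACK_MAX    64

    struct entry { int loudness; float pos[3]; int dist_scale; };

    static struct entry stack[STACK_MAX];
    static int num_entries = 0;

    /* Assumed quadraphonic speaker layout around the listening position. */
    static const float speaker_pos[NUM_SPEAKERS][3] = {
        {-1, 0, -1}, {1, 0, -1}, {-1, 0, 1}, {1, 0, 1}
    };

    static void audio_management(void) {
        while (num_entries > 0) {                    /* steps 70/73: pull until empty */
            struct entry e = stack[--num_entries];
            for (int s = 0; s < NUM_SPEAKERS; s++) { /* step 71: parse and scale      */
                float dx = e.pos[0] - speaker_pos[s][0];
                float dz = e.pos[2] - speaker_pos[s][2];
                float d = sqrtf(dx * dx + dz * dz);
                int loud = (e.dist_scale && d > 1.0f) ? (int)(e.loudness / d)
                                                      : e.loudness;
                printf("speaker %d: loudness %d\n", s, loud); /* to stage 36 */
            }
        }
    }

    int main(void) {
        stack[num_entries++] = (struct entry){ 200, {3.f, 0.f, 2.f}, 1 };
        audio_management();
        return 0;
    }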
The control software contains a few basic functions for handling the stack and its contents, the functions being called externally. A global variable num_entries, which references the stack array, always points to the next free entry. An upper limit on this value suitably provides a control condition limiting the number of audio events which may be handled for each refresh operation.
The first of the functions is push_stack, which is used to copy data into the next free entry in the stack as indicated by num_entries. The second function, pull_stack, is principally for diagnostic and analytical purposes and copies stack entries to other storage areas whilst decrementing the value of num_entries. The third function, show_stack, is again for diagnostic and analytical purposes and results in the output of a record of current stack contents without affecting those contents. A further routine, initialise_stack, is suitably also provided for calling at start-up to ensure that all relevant variables and sections of memory are initialised to zero. This routine might suitably be used in conjunction with an initialising set of stack entries which, rather than identifying audio segments or effects, just identify features such as initial character/object locations within the virtual environment in order to provide a more complete and self-referential data structure.
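A minimal sketch of the four stack-handling functions follows, assuming a fixed-size array whose capacity realises the per-refresh event limit mentioned above; the patent describes only their behaviour, so the bodies below are one plausible realisation, with a trimmed entry type for brevity.

    /* Sketch: push_stack, pull_stack, show_stack, initialise_stack. */
    #include <stdio.h>
    #include <string.h>

    #define STACK_MAX 64   /* upper limit: caps audio events per refresh */

    typedef struct { int file_num; int loudness; } stack_entry; /* trimmed */

    static stack_entry stack[STACK_MAX];
    static int num_entries = 0;        /* always points to the next free entry */

    int push_stack(const stack_entry *e) {
        if (num_entries >= STACK_MAX) return -1;   /* per-refresh limit hit */
        stack[num_entries++] = *e;
        return 0;
    }

    /* Diagnostic: copy the top entry out, decrementing num_entries. */
    int pull_stack(stack_entry *out) {
        if (num_entries == 0) return -1;
        *out = stack[--num_entries];
        return 0;
    }

    /* Diagnostic: print current contents without modifying them. */
    void show_stack(void) {
        for (int i = 0; i < num_entries; i++)
            printf("%d: file %d, loudness %d\n", i, stack[i].file_num,
                   stack[i].loudness);
    }

    /* Start-up: zero all relevant variables and sections of memory. */
    void initialise_stack(void) {
        memset(stack, 0, sizeof stack);
        num_entries = 0;
    }

    int main(void) {
        initialise_stack();
        push_stack(&(stack_entry){ .file_num = 7, .loudness = 200 });
        show_stack();
        return 0;
    }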
In summary, we have described an arrangement of apparatus provided for the handling of audio in relation to an evaluation procedure for a virtual world or environment (which may or may not be visually displayed). For each instant in time, for which the state of the virtual world is evaluated, modelled characters and objects within the virtual world may generate various audio events, such that the audio-state of the world changes on each world state evaluation. This information is passed to an audio management procedure via a stack memory, with the audio management procedure taking the information from the stack to generate loudness and scaling factors for spatial localisation, and other audio effects. This information is then passed to an audio generation board which, in conjunction with an audio store, synthesises digital audio signals to play out. The use of the audio stack enables efficient passing of the audio event data between the world update process and the audio management process, using a relatively small number of common functions for all types of audio.
From reading the present disclosure, other modifications will be apparent to persons skilled in the art. Such modifications may involve other features which are already known in the field of audio signal handling and processing apparatuses and component parts thereof and which may be used instead of or in addition to features already described herein. Although claims have been formulated in this application to particular combinations of features, it should be understood that the scope of the disclosure of the present application also includes any novel feature or any novel combination of features disclosed herein either explicitly or implicitly, whether or not it relates to the same invention as presently claimed in any claim and whether or not it mitigates any or all of the same technical problems as does the present invention. The applicants hereby give notice that new claims may be formulated to such features and/or combinations of such features during the prosecution of the present application or of any further application derived therefrom.

Claims

1. An interactive audio entertainment apparatus comprising: a first memory holding data defining a virtual environment and objects within said virtual environment; a processor coupled with the memory and arranged to generate and periodically update a model of the virtual environment and objects therein, and to generate indicators to respective predetermined audio segments to be generated in response to respective predetermined conditions or conjunctions of conditions occurring in the virtual environment; a stack memory coupled to receive and sequentially store said indicators from the processor; and an audio management stage arranged to, on each periodic update of the virtual environment, pull the stack contents, and initiate the generation of the respectively indicated audio segment or segments.
2. Apparatus as claimed in Claim 1, further comprising an audio reproduction stage, coupled to the audio management stage, and operable to generate the said predetermined audio segments when initiated by said audio management stage.
3. Apparatus as claimed in Claim 2, further comprising an audio data memory coupled with the audio reproduction stage, said memory holding individually addressable audio data segments defining respective ones of said audio segments, and being accessed following initiation of said audio reproduction stage by said audio management stage.
4. Apparatus as claimed in Claim 3, wherein each indicator generated by the processor includes an identifier for the location of the respective audio data segments within the audio data memory.
5. Apparatus as claimed in Claim 4, wherein the different audio data segments are stored as respective numbered files in the audio data memory, and the processor is arranged to specify at least the file number as part of a common format for indicators within the stack memory.
6. Apparatus as claimed in Claim 2, wherein the audio reproduction stage is operable to selectively apply one or more signal processing techniques to audio signals, and the data passed to the audio reproduction stage from the audio management stage includes specification of one or more of said techniques to be applied.
7. Apparatus as claimed in Claim 6, wherein the audio reproduction stage is operable to output audio signals on two or more channels with different signal processing applied for each channel.
8. Apparatus as claimed in Claim 1, further comprising user operable input means coupled to said processor whereby a user is enabled to alter one or more of said respective predetermined conditions pertaining to at least one virtual environment object.
9. Apparatus as claimed in Claim 1, further comprising a further store holding geometric and surface data describing a physical appearance for the modelled virtual environment and each of the objects therein, the apparatus further comprising rendering means operable to periodically generate images of the virtual environment from at least one viewpoint therein.
PCT/IB1997/000359 1996-05-29 1997-04-07 Audio data management for interactive applications WO1997045804A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP97908458A EP0847562A1 (en) 1996-05-29 1997-04-07 Audio data management for interactive applications
JP9541885A JPH11511268A (en) 1996-05-29 1997-04-07 Voice data management for conversation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB9611144.8A GB9611144D0 (en) 1996-05-29 1996-05-29 Audio data management for interactive applications
GB9611144.8 1996-05-29

Publications (1)

Publication Number Publication Date
WO1997045804A1 true WO1997045804A1 (en) 1997-12-04

Family

ID=10794427

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB1997/000359 WO1997045804A1 (en) 1996-05-29 1997-04-07 Audio data management for interactive applications

Country Status (5)

Country Link
EP (1) EP0847562A1 (en)
JP (1) JPH11511268A (en)
KR (1) KR19990035937A (en)
GB (1) GB9611144D0 (en)
WO (1) WO1997045804A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ATE460020T1 (en) * 2000-09-13 2010-03-15 Stratosaudio Inc SYSTEM AND METHOD FOR ORDERING AND PROVIDING MEDIA CONTENT, USING ADDITIONAL DATA TRANSMITTED IN A BROADCAST SIGNAL

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4305131A (en) * 1979-02-05 1981-12-08 Best Robert M Dialog between TV movies and human viewers
US4846693A (en) * 1987-01-08 1989-07-11 Smith Engineering Video based instructional and entertainment system using animated figure

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8977375B2 (en) 2000-10-12 2015-03-10 Bose Corporation Interactive sound reproducing
US9223538B2 (en) 2000-10-12 2015-12-29 Bose Corporation Interactive sound reproducing
US10140084B2 (en) 2000-10-12 2018-11-27 Bose Corporation Interactive sound reproducing
US10481855B2 (en) 2000-10-12 2019-11-19 Bose Corporation Interactive sound reproducing
WO2008117285A2 (en) * 2007-03-26 2008-10-02 Ronen Zeev Levy Virtual communicator for interactive voice guidance
WO2008117285A3 (en) * 2007-03-26 2009-02-26 Ronen Zeev Levy Virtual communicator for interactive voice guidance

Also Published As

Publication number Publication date
EP0847562A1 (en) 1998-06-17
GB9611144D0 (en) 1996-07-31
KR19990035937A (en) 1999-05-25
JPH11511268A (en) 1999-09-28


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

WWE Wipo information: entry into national phase

Ref document number: 1997908458

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 1019980700598

Country of ref document: KR

ENP Entry into the national phase

Ref country code: JP

Ref document number: 1997 541885

Kind code of ref document: A

Format of ref document f/p: F

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWP Wipo information: published in national office

Ref document number: 1997908458

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1019980700598

Country of ref document: KR

WWW Wipo information: withdrawn in national office

Ref document number: 1997908458

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1019980700598

Country of ref document: KR