GB2620591A - System for audio and video simulation - Google Patents

System for audio and video simulation

Info

Publication number
GB2620591A
GB2620591A GB2210228.9A GB202210228A GB2620591A GB 2620591 A GB2620591 A GB 2620591A GB 202210228 A GB202210228 A GB 202210228A GB 2620591 A GB2620591 A GB 2620591A
Authority
GB
United Kingdom
Prior art keywords
sample
sound
metadata
audio
simulation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2210228.9A
Other versions
GB2620591A8 (en)
GB202210228D0 (en)
Inventor
Augar Will
Hammond Ben
Hair Andrew
Bartlett Tim
Ricalde Gonzalo
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Frontier Dev Ltd
Frontier Developments PLC
Original Assignee
Frontier Dev Ltd
Frontier Developments PLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Frontier Dev Ltd, Frontier Developments PLC filed Critical Frontier Dev Ltd
Priority to GB2210228.9A priority Critical patent/GB2620591A/en
Publication of GB202210228D0 publication Critical patent/GB202210228D0/en
Publication of GB2620591A publication Critical patent/GB2620591A/en
Publication of GB2620591A8 publication Critical patent/GB2620591A8/en
Pending legal-status Critical Current


Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/50 Controlling the output signals based on the game progress
    • A63F13/54 Controlling the output signals based on the game progress involving acoustic signals, e.g. for simulating revolutions per minute [RPM] dependent engine sounds in a driving game or reverberation against a virtual wall
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/60 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/65 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor automatically by game devices or servers from real world data, e.g. measurement in live racing competition
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/80 Special adaptations for executing a specific game genre or game mode
    • A63F13/803 Driving vehicles or craft, e.g. cars, airplanes, ships, robots or tanks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60 Methods for processing data by generating or executing the game program
    • A63F2300/6063 Methods for processing data by generating or executing the game program for sound processing
    • A63F2300/6081 Methods for processing data by generating or executing the game program for sound processing generating an output signal, e.g. under timing constraints, for spatialization
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Stereophonic System (AREA)

Abstract

A system for synthesis of a virtual environment. The system comprises a visual simulation subsystem and an audio simulation subsystem that create visual and audio simulations of a corresponding location in a virtual environment. The audio simulation subsystem includes: i. a storage pool populated with a multiplicity of sound samples, each sound sample being a recording made in a real-world environment, wherein each sample is associated with sample metadata that describes conditions existing in the real-world environment when the sample was recorded; ii. a matching engine operative, at run-time of a simulation, to select a sound sample from the storage pool which has sample metadata substantially corresponding to conditions within the virtual environment at run-time; and iii. an audio output device that is operative at run-time to reproduce the selected sound sample. Also disclosed is a method of synthesising a virtual environment comprising visual and audio simulation subsystems.

Description

TITLE: SYSTEM FOR AUDIO AND VIDEO SIMULATION
DESCRIPTION
The present invention relates to a system and method for audio and video simulation. It has particular, but not exclusive, application to simulation of sound in a video game or augmented reality system.
Conventional implementation of car audio in video games uses a synthesis technique to approximate the sound of a car engine using real-time modulation of audio loops to imitate how a car being driven under specific operating conditions might sound. Modulation of pitch and filtering of the audio loops can be used to emulate the sound of a car under various operating conditions, such as speed, throttle, cornering movement and selected gear. Although such systems can generate sounds that are quite realistic when the intention is to simulate sound perceived within the vehicle, they provide a less convincing portrayal of real-world audio that might be perceived from track-side viewpoints as cars drive past cameras positioned around a track. The main reasons for this shortcoming are the absence of the environmental effect on the sound of a vehicle as it moves within the real world and the lack of audible nuance that real car engines present. The present applicant has appreciated the need for an improved audio simulation technique that can be used to produce a simulation of the sound of a vehicle that is perceptually more realistic than has been possible with existing simulation systems. This invention has as an aim the creation of a highly realistic audio simulation of a vehicle, such as a racing car, passing by an observer.
In accordance with a first aspect of the present invention, there is provided a system for synthesis of a virtual environment, the system comprising: a. a visual simulation subsystem (e.g. 3D visual simulation subsystem) operative to create a visual simulation of a simulated location within the virtual environment; b. an audio simulation subsystem operative to create an audio simulation of the virtual environment that corresponds to the visual simulation produced by the visual simulation subsystem, the audio simulation subsystem including: i. a storage pool populated with a multiplicity of sound samples, each sound sample being a recording made in a real-world environment, wherein each sample is associated with sample metadata that describes conditions existing in the real-world environment when the sample was recorded (e.g. metadata relating to a plurality of attributes of the real-world environment); ii. a matching engine operative, at run-time of a simulation, to select a sound sample from the storage pool which has sample metadata substantially corresponding to conditions within the virtual environment at run-time; and iii. an audio output device that is operative at run-time to reproduce the selected sound sample.
In this way, a system is provided in which realistic audio may be reproduced in conjunction with a 3D visual simulation of the virtual environment using sound samples collected using a novel audio simulation pipeline process.
Using sound samples that are derived from recordings that were captured in a real-world environment corresponding to that being simulated allows the system to generate audio output that will sound convincing to a user, especially in cases where the quality of sound is influenced significantly by the environment in which it is produced and perceived. By attributing metadata to the samples and selecting a sound sample for reproduction based on the metadata, sound samples with suitably close metadata characteristics may be chosen whereby the total number of samples required to cover a full range of situations in the simulated virtual world may be minimised.
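Purely by way of illustration (and not as part of the claimed system), selection of a sample whose metadata "substantially corresponds" to run-time conditions could be expressed as a nearest-match search over metadata fields. In the Python sketch below, the field names and the simple mismatch-count distance are assumptions made for this example only.

```python
# Illustrative sketch only: the field names ("mid_gear", "intensity") and the
# mismatch-count distance are assumptions, not the specification's definition
# of "substantially corresponding" metadata.

def metadata_distance(sample_metadata, runtime_conditions):
    """Number of run-time condition fields the sample's metadata does not match."""
    return sum(1 for field, value in runtime_conditions.items()
               if sample_metadata.get(field) != value)

def select_sample(storage_pool, runtime_conditions):
    """Return the pooled sample whose metadata most closely matches run-time conditions."""
    return min(storage_pool,
               key=lambda sample: metadata_distance(sample["metadata"], runtime_conditions))

pool = [
    {"file": "sample_001.wav", "metadata": {"mid_gear": 4, "intensity": "race"}},
    {"file": "sample_002.wav", "metadata": {"mid_gear": 6, "intensity": "coasting_fast"}},
]
print(select_sample(pool, {"mid_gear": 4, "intensity": "race"})["file"])  # sample_001.wav
```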
Embodiments of the invention may provide video and audio output for a video game. For example, it may provide video and audio simulation of a vehicle (e.g. car) being driven on a track (e.g. race track). Embodiments may have other applications for the simulation of video and audio that represents situations in which sounds are significantly shaped by the environment that is being simulated. Examples include environments in which objects are moving rapidly and/or moving within a crowded space, such as in a roller-coaster, a theme park or a zoo.
In one embodiment, the matching engine analyses data representative of the condition of operation of the (e.g. visual) simulation in making its selection of a sound sample from the pool.
In one embodiment, the system is operative to synthesise audio and video of an object moving in the virtual environment.
In one embodiment, the system is operative to synthesise audio and video of the object from a point of view external to the object.
In one embodiment, the sound samples are recordings of a real-world object made in a real-world environment.
In one embodiment, the associated metadata includes metadata describing one or more attribute of the real-world object existing in the real-world environment when the sample was recorded.
In one embodiment, the matching engine analyses data representative of at least one attribute of the object in making its selection of a sound sample from the storage pool.
The sample metadata may include elements specifying one or more attribute of the object (e.g. one or more fixed attribute of the object and/or one or more dynamically variable attribute of the object).
In one embodiment, the selection of the sound sample from the storage pool is based on at least one attribute of the object (e.g. attribute of the object at run-time as it passes the simulated location).
In one embodiment, the object is a vehicle (e.g. car or other racing vehicle).
In one embodiment, the sample metadata includes elements specifying one or more of a) a driving intensity value (e.g. based on throttle position and/or gear selection); b) vehicle manufacturer; c) vehicle model; d) selected gear; e) acceleration behaviour; and f) time position in the sound sample of the closest approach of the vehicle to the point of view. In one embodiment, the virtual environment includes a plurality of camera positions. In one embodiment, each camera position corresponds to a different simulated location within the virtual environment.
In one embodiment, each camera position is a point of view external to the object.
In one embodiment, camera metadata is associated with each camera position.
In one embodiment, the camera metadata includes elements specifying one or more of a) a trigger position; b) a camera pass-by position; and c) behaviour of the object (e.g. driving intensity value and/or acceleration behaviour of a vehicle) passing the camera position.
In one embodiment, the matching engine is operative (e.g. in advance of run-time) to compare sample metadata with the camera metadata for each camera position and select a plurality of n sound samples which have sample metadata that differs from the camera metadata by less than a threshold value to form a sample list for that camera position.
In one embodiment, the matching engine is operative at run-time to compare sample metadata for the sample list associated with the camera position that is the point of view of the simulated location with the attributes of the object at the time the object is passing the location and select an appropriate sound sample from the sample list of n sound samples for that camera position.
In one embodiment, the matching engine is operative at run-time to compare sample metadata for the sample list associated with the camera position that is the point of view of the simulated location with the attributes of the object at the time the object is passing the location and select an appropriate sound sample (e.g. single sound sample) from the sample list of n sound samples for that camera position (e.g. to pass to the audio output device).
In another embodiment, the matching engine is operative at run-time to: i) compare sample metadata for the sample list associated with the camera position that is the point of view of the simulated location with the attributes of the object at the time the object is passing the location and select a subset of m appropriate sound samples from the sample list of n sound samples for that camera position; and ii) subsequently select an appropriate sound sample (e.g. single sound sample) from the sample list of m sound samples for that camera position (e.g. to pass to the audio output device). In one embodiment, the sample list of m sound samples is stored for use each time the camera position is used by the visual simulation subsystem.
In one embodiment, selection of a particular sound sample from the list (e.g. sample list of n sound samples or the reduced sample list of m sound samples) is made at random or in sequence (e.g. whereby the same sound sample is not selected for each simulation of a particular simulated location).
In one embodiment, metadata in the storage pool is derived programmatically from the sound samples.
In one embodiment, the system provides video and audio output for a video game. In one embodiment, the system provides video and audio simulation of a vehicle (e.g. car) being driven on a track (e.g. race track).
In accordance with a second aspect of the present invention, there is provided a method of synthesising a virtual environment, the method comprising: providing a simulation system comprising: a. a visual simulation subsystem (e.g. 3D visual simulation subsystem) operative to create a visual simulation of a simulated location within the virtual environment; b. an audio simulation subsystem operative to create an audio simulation of the virtual environment that corresponds to the visual simulation produced by the visual simulation subsystem, the audio simulation subsystem including: i) a storage pool populated with a multiplicity of sound samples, each sound sample being a recording made in a real-world environment, wherein each sample is associated with sample metadata that describes conditions existing in the real-world environment when the sample was recorded (e.g. metadata relating to a plurality of attributes of the real-world environment); ii) a matching engine; and iii) an audio output device; wherein at run-time of a simulation the method comprises the steps of: i. at the matching engine, selecting a sound sample from the storage pool which has sample metadata substantially corresponding to conditions within the virtual environment; and ii. at the audio output device, reproducing the selected sound sample.
In one embodiment, the matching engine analyses data representative of the condition of operation of the (e.g. visual) simulation in making its selection of a sound sample from the pool.
In one embodiment, the simulation system is operative to synthesise audio and video of an object moving in the virtual environment.
In one embodiment, the simulation system is operative to synthesise audio and video of the object from a point of view external to the object.
In one embodiment, the sound samples are recordings of a real-world object made in a real-world environment.
In one embodiment, the associated metadata includes metadata describing one or more attribute of the real-world object existing in the real-world environment when the sample was recorded.
In one embodiment, the matching engine analyses data representative of at least one attribute of the object in making its selection of a sound sample from the storage pool. The sample metadata may include elements specifying one or more attribute of the object (e.g. one or more fixed attribute of the object and/or one or more dynamically variable attribute of the object).
In one embodiment, the selection of the sound sample from the storage pool is based on at least one attribute of the object (e.g. attribute of the object at run-time as it passes the simulated location).
In one embodiment, the object is a vehicle (e.g. car or other racing vehicle).
In one embodiment, the sample metadata includes elements specifying one or more of a) a driving intensity value (e.g. based on throttle position and/or gear selection); b) vehicle manufacturer; c) vehicle model; d) selected gear; e) acceleration behaviour; and f) time position in the sound sample of the closest approach of the vehicle to the point of view. In one embodiment, the virtual environment includes a plurality of camera positions. In one embodiment, each camera position corresponds to a different simulated location within the virtual environment.
In one embodiment, each camera position is a point of view external to the object.
In one embodiment, camera metadata is associated with each camera position. In one embodiment, the camera metadata includes elements specifying one or more of a) a trigger position; b) a camera pass-by position; and c) behaviour of the object (e.g. driving intensity value and/or acceleration behaviour of a vehicle) passing the camera position.
In one embodiment, the matching engine is operative (e.g. in advance of run-time) to compare sample metadata with the camera metadata for each camera and select a plurality of n sound samples which have sample metadata that differs from the camera metadata by less than a threshold value to form a sample list for that camera position.
In one embodiment, the matching engine is operative at run-time to compare sample metadata for the sample list associated with the camera position that is the point of view of the simulated location with the attributes of the object at the time the object is passing the location and select an appropriate sound sample from the sample list of n sound samples for that camera position.
In one embodiment, the matching engine is operative at run-time to compare sample metadata for the sample list associated with the camera position that is the point of view of the simulated location with the attributes of the object at the time the object is passing the location and select an appropriate sound sample (e.g. single sound sample) from the sample list of n sound samples for that camera position (e.g. to pass to the audio output device).
In another embodiment, the matching engine is operative at run-time to: i) compare sample metadata for the sample list associated with the camera position that is the point of view of the simulated location with the attributes of the object at the time the object is passing the location and select a subset of m appropriate sound samples from the sample list of n sound samples for that camera position; and ii) subsequently select an appropriate sound sample (e.g. single sound sample) from the sample list of m sound samples for that camera position (e.g. to pass to the audio output device). In one embodiment, the sample list of m sound samples is stored for use each time the camera position is used by the visual simulation subsystem.
In one embodiment, selection of a particular sound sample from the list (e.g. sample list of n sound samples or the reduced sample list of m sound samples) is made at random or in sequence (e.g. whereby the same sound sample is not selected for each simulation of a particular simulated location).
In one embodiment, metadata in the storage pool is derived programmatically from the sound samples.
In one embodiment, the simulation system provides video and audio output for a video game.
In one embodiment, the simulation system provides video and audio simulation of a vehicle (e.g. car) being driven on a track (e.g. race track).
In accordance with a third aspect of the present invention, there is provided a computer program which, when executed by a processor, causes the processor to carry out a method in accordance with any embodiment of the second aspect of the invention.
In accordance with a fourth aspect of the present invention, there is provided a computer program product carrying instructions which, when executed by a processor, causes the processor to carry out a method in accordance with any embodiment of the second aspect of the invention.
The computer program product may be provided on any suitable carrier medium.
In one embodiment, the computer program product comprises a tangible carrier medium (e.g. a computer-readable storage medium).
In another embodiment, the computer program product comprises a transient carrier medium (e.g. data stream).
An embodiment of the invention will now be described in detail, with reference to the accompanying drawings, in which: Figure 1 is a diagram that shows a racing car passing along a section of track with two gear changes; Figure 2 is a diagram that shows a racing car passing along a section of track with two phases of acceleration; Figure 3 shows the capture of the sound of a racing car passing along the section of track of Figures 1 and 2; Figure 4 shows the triggering of capture of the sound of a racing car passing along the section of track of Figures 1 and 2; Figure 5 is an overview of processing steps used in the embodiment of the invention; and Figure 6 is a block diagram of a system embodying the invention.
The illustrated embodiment is intended to provide sound in a video game that simulates motor racing. The video game is implemented in software that executes on hardware capable of producing a video and audio output in response to inputs received from a user through input controls. A video subsystem of interacting software and hardware generates the video output and an audio subsystem of interacting software and hardware generates the audio output.
During development of the system a pool of real-world sound samples and associated metadata is created 2, and the samples are then matched to virtual camera positions based on the metadata and thresholds 4. Then at run time 6 of the simulation, samples are selected and synchronised with the video output of the simulation.
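As an orientation aid only, the three phases referenced above (pool creation 2, sample-to-camera matching 4, run-time selection 6) might be organised along the following lines. The function names, data shapes and the simple mismatch count stand in for the detailed matching and scheduling steps described later and are assumptions of this sketch.

```python
# Schematic sketch of the three phases (2: pool creation, 4: sample-to-camera
# matching, 6: run-time selection). All names and data shapes are illustrative.
import random

def build_sample_pool(recordings_with_metadata):
    # Phase 2: each pool entry pairs a recording with developer-authored metadata.
    return list(recordings_with_metadata)

def match_samples_to_cameras(pool, cameras, n):
    # Phase 4: for each camera, keep the n samples whose metadata differs least.
    def mismatches(sample, camera):
        return sum(sample["metadata"].get(k) != v for k, v in camera["metadata"].items())
    return {camera["name"]: sorted(pool, key=lambda s: mismatches(s, camera))[:n]
            for camera in cameras}

def select_at_run_time(per_camera_lists, camera_name, rng=random):
    # Phase 6: pick one of the pre-matched samples as the car approaches the camera.
    return rng.choice(per_camera_lists[camera_name])

pool = build_sample_pool([
    {"file": "cam3_take1.wav", "metadata": {"mid_gear": 4, "intensity": "race"}},
    {"file": "cam3_take2.wav", "metadata": {"mid_gear": 6, "intensity": "race"}},
])
cameras = [{"name": "camera_3", "metadata": {"mid_gear": 4, "intensity": "race"}}]
lists = match_samples_to_cameras(pool, cameras, n=2)
print(select_at_run_time(lists, "camera_3", random.Random(0))["file"])
```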
During execution, the video game simultaneously generates a video and audio representation of a simulated virtual environment. The video and audio simulations are synchronised, such that the audio representation corresponds to the sound that might be produced by the objects represented in the video representation.
The virtual environment includes a race track 10, adjacent to which several camera positions 12 are defined. As with a real-world camera, each camera position 12 is a point of view from which the virtual world is observed visually and audibly, so the game must be able to simulate a racing car 14 passing a camera position 12 in both video and audio.
The sound that the car will make is dependent on the type of car (i.e. fixed parameters of the object) and on several operating (i.e. dynamic) parameters, such as gear selection, applied power, whether the car is accelerating or braking and whether the car is approaching or receding from the camera position 12. For example, as the car 14 negotiates an S-bend, as shown in Figure 1, it may have different gears selected on entry to the bend, mid-way through the bend and on exit from the bend, these being referred to as the start gear, mid gear and end gear. In this example, gear changes occur at positions on the track indicated at 20 and 22 which are, respectively, before and after the camera position 12. As shown in Figure 2, the car 14 may be braking (negative acceleration) until it reaches a mid-point in the bend closest to the camera position 12, at a pass-by point 24 at which power is applied and acceleration becomes positive.
As shown in Figure 3, the position of the car 14 relative to the camera position 12 also influences its sound, the sound changing rapidly as the car 14 passes the camera position 12. For example, noise from the car's exhaust will be louder as the car 14 recedes from the camera position, even if its operating parameters remain unchanged. This change in sound occurs around a camera pass-by point 24 at which the car 14 is closest to the camera 12.
In implementing an embodiment of the invention, an initial step is to create a large sample pool 30 of sound samples and associated metadata.
The first step in the creation of the sample pool 30 is to capture sound samples which are created by making recordings of real-world racing cars being driven on a track. Most preferably, the recordings are made of the actual cars operating on the track that will be simulated in the video game. These recordings sound highly plausible since their source is real race cars driving around a track during races and practice sessions. The microphones used for these recordings are positioned at the side of the track in positions that correspond to the camera positions 12 used in final simulation. The recordings capture all of the environmental effects on the sound and the nuances of real-world race car engines.
For each camera position 12, the aim is to create a set of sound samples that capture the sound of a car approaching the camera position 12, the car passing the camera position, and departure of the car 14 from the camera position 12. Each sound sample should ideally contain the sound of just one car, and the sound of other cars or other noise sources should be minimised. The sound samples can be processed to minimise noise and audio from other sources using spectral manipulation and noise reduction techniques familiar to those skilled in the technical field.
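The choice of noise reduction technique is left to the skilled reader; purely as one illustrative example (not necessarily the processing actually used), a simple spectral gate could be sketched as follows, with all parameter values chosen arbitrarily for the sketch.

```python
# Illustrative spectral-gating sketch: attenuate time-frequency bins that fall
# below a noise-derived floor. Parameters and the noise estimate are assumptions.
import numpy as np
from scipy.signal import stft, istft

def spectral_gate(x, fs, noise_clip, reduction_db=20.0, nperseg=1024):
    """Attenuate STFT bins whose magnitude is close to the estimated noise floor."""
    _, _, noise_spec = stft(noise_clip, fs, nperseg=nperseg)
    noise_floor = np.mean(np.abs(noise_spec), axis=1, keepdims=True)  # per-bin floor
    _, _, spec = stft(x, fs, nperseg=nperseg)
    gain = np.where(np.abs(spec) > 2.0 * noise_floor, 1.0, 10 ** (-reduction_db / 20.0))
    _, cleaned = istft(spec * gain, fs, nperseg=nperseg)
    return cleaned

# Example with synthetic data: a tone buried in broadband noise.
fs = 48_000
t = np.arange(fs) / fs
noisy = np.sin(2 * np.pi * 220 * t) + 0.3 * np.random.randn(fs)
cleaned = spectral_gate(noisy, fs, noise_clip=0.3 * np.random.randn(fs // 4))
```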
The recorded and processed sound samples are stored in a sample database 32. In step 34, developers then edit and process the real-world recordings 32 to create the sample pool 30. During such processing, each real-world sound sample is assessed by a developer and described with metadata based on the audible properties of the sound sample (referred to as sample metadata). In this embodiment, the sample metadata includes the following elements: 1. Driving intensity category - race, coasting slow or coasting fast.
In the real world, racing cars are driven at different intensities based on the throttle applied and gear chosen by the driver. Different driving intensities are employed based on racing conditions such as normal racing conditions, during qualifying sessions, during racing incidents (avoiding other cars etc.), under safety car conditions and during practice sessions. Each driving intensity has its own characteristic sound.
2. Car manufacturer - which team the recorded car belonged to. Each racing team's car has a unique sound.
3. Start gear (e.g., gear 1-8).
Describes the gearing used by the driver at the start of the recording. Phases of the recording correspond to use of different gears, as shown in Figure 1, and each gear has its own characteristic sound.
4. Mid gear.
This element describes the gearing used by the driver as the car passes the microphone in the recording.
5. End gear.
Describes the gearing used by the driver at the end of the recording.
6. Acceleration behaviour at start of sound sample (either acceleration or deceleration).
Describes the acceleration behaviour of the driver in the first half of the recording, as illustrated in Figure 2.
7. Acceleration behaviour at end of sound sample (either acceleration or deceleration). Describes the acceleration behaviour of the driver in the second half of the recording.
8. Camera pass-by time. The point in the sound sample at which the car was closest to the camera, as illustrated in Figure 3. As developers assess each sound sample and write metadata, they use the following tools: a. recorded audio and the associated video footage; b. their knowledge of motor sport and racing car audio; c. publicly available speed gear charts for each circuit; and d. on-board broadcast footage which includes speed and gear telemetry.
First, the manufacturer of the car is ascertained from the video footage and added to the metadata. Second, the track and track section are noted from the video and are used to estimate the mid gear field in the metadata by looking up the corner on broadcast footage and publicly available data sets for speed and gearing (these are available from several official and fan-based sources). Third, start and end gear metadata are derived from the mid gear by assessing the audio contained in the sound sample. Fourth, start and end acceleration behaviours are determined by assessing the recording. Finally, the camera pass-by time is determined by assessing the audio contained in the sound sample.
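For illustration only, the sample metadata described above could be held in a record along the following lines; the field names, types and value conventions are assumptions of this sketch rather than the format actually used.

```python
# Illustrative record of the sample metadata fields listed above; names and
# value conventions are assumptions of this sketch.
from dataclasses import dataclass

@dataclass
class SampleMetadata:
    driving_intensity: str      # "race", "coasting_slow" or "coasting_fast"
    manufacturer: str           # team/manufacturer of the recorded car
    start_gear: int             # gear at the start of the recording (1-8)
    mid_gear: int               # gear as the car passes the microphone
    end_gear: int               # gear at the end of the recording
    start_acceleration: str     # "accelerating" or "decelerating" in the first half
    end_acceleration: str       # "accelerating" or "decelerating" in the second half
    pass_by_time_s: float       # offset into the sample of the closest approach

example_sample = SampleMetadata("race", "TeamA", 3, 4, 5, "decelerating", "accelerating", 6.2)
```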
Next, each camera 12 on every track 10 is assessed by a developer at step 40 and described with metadata 42 (referred to as camera metadata) based on the track properties 44 of the section that the camera overlooks. In this embodiment, the camera metadata includes the following elements: a. Track trigger position.
The point 26 along the virtual track at which, when the car passes it, a sound sample will be played during a simulation.
b. Camera pass-by track position.
The point along the track where the car will be closest to the camera.
c. Start gear (e.g., gear 1-8).
The typical gear chosen by the driver at the start of a section of track overlooked by the camera.
d. Mid gear.
The typical gear chosen by the driver at the section of track closest to the camera.
e. End gear.
The typical gear chosen by the driver at the end of a section of track overlooked by the camera.
f. Start acceleration behaviour (acceleration or deceleration).
The acceleration behaviour in the first half of the track section overlooked by the camera.
g. End acceleration behaviour (acceleration or deceleration).
The acceleration behaviour in the second half of the track section overlooked by the camera.
As developers assess each camera and write camera metadata, they use the following tools: a. In-simulation track and camera data (such as geometry, position, camera director properties, etc.). b. Their knowledge of motor sport and racing car audio.
c. Publicly available speed gear charts for each circuit.
d. On-board broadcast footage which includes speed and gear telemetry.
The process of assessment for each camera 12 will now be described.
First, an automated process fills out the initial metadata for each camera based on in-simulation track and camera data. Second, in a similar fashion to the sample metadata process, the start gear, mid gear, end gear and acceleration behaviour are determined using a combination of publicly available gear data sets and on-board broadcast footage with car telemetry. Third, the trigger and crossover metadata are tuned by developers assessing the resulting audio and visuals in-simulation.
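Again purely for illustration, the camera metadata described above could be represented as follows; the field names, units and example values are assumptions of this sketch.

```python
# Illustrative record of the camera metadata fields listed above; names, units
# and the example values are assumptions of this sketch.
from dataclasses import dataclass

@dataclass
class CameraMetadata:
    trigger_position_m: float    # track position 26 at which playback is triggered
    pass_by_position_m: float    # track position at which the car is closest to the camera
    start_gear: int              # typical gear at the start of the overlooked section
    mid_gear: int                # typical gear at the point closest to the camera
    end_gear: int                # typical gear at the end of the overlooked section
    start_acceleration: str      # "accelerating" or "decelerating" in the first half
    end_acceleration: str        # "accelerating" or "decelerating" in the second half

example_camera = CameraMetadata(1180.0, 1230.0, 6, 2, 3, "decelerating", "accelerating")
```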
It is possible to use sound samples during a simulation that were captured at real-world camera positions other than those actually present in the simulation. This avoids the need to obtain sound samples from every real-world camera position that might be simulated, and also allows simulation of camera positions that do not exist in the real world. This capability is enabled through a sample-to-camera matching process at step 50 in a matching engine.
Once the creation of metadata is complete, a process of sample-to-camera matching is performed. The aim of sample-to-camera matching is, for each camera position, to create a list of sound samples whose sample metadata most closely matches the camera metadata, these being the sound samples with sound properties most like those that would be perceived at the real-world camera position.
Both the sample metadata and the camera metadata are used in combination during sample-to-camera matching to determine which sound samples from the sample pool closely match each camera. For each camera on a particular simulated track, every sound sample is tested for compatibility. The camera metadata and sample metadata are compared, and a compatibility rating is derived based on how many of the metadata fields match. Once all sound samples have been tested, the most compatible n samples are chosen (where n is a parameter set by developers to meet various technical and aesthetic requirements) and those n samples are added to the list for that camera.
The sample-to-camera matching process 50 permits thresholds to be set that allow matches to be made when metadata does not exactly match (threshold values are set by developers to meet various technical and aesthetic requirements). For example, a threshold of 1 on the start gear element would allow a sound sample with a start gear of 3 or 5 to be matched with a camera with a start gear element value of 4. The matching process must ensure that at least n camera samples 52 are matched for each manufacturer for each camera, so if there are not enough sound samples that are a close enough match based on the initial thresholds, an increasingly large threshold is used. This means that there is always a sound sample for every in-simulation scenario. Again, fallback thresholds are set by developers to meet various technical and aesthetic requirements.
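A simplified sketch of this matching step is given below: it scores each sample by how many camera metadata fields it matches within a per-field tolerance, and widens the tolerances if fewer than n samples match fully. The field names, the widening rule and the treatment of non-numeric fields are assumptions of this sketch.

```python
# Illustrative sketch of sample-to-camera matching with per-field thresholds and
# fallback widening; field names and the widening rule are assumptions.

def field_matches(sample_value, camera_value, tolerance):
    if isinstance(camera_value, (int, float)) and isinstance(sample_value, (int, float)):
        return abs(sample_value - camera_value) <= tolerance
    return sample_value == camera_value          # non-numeric fields must match exactly

def compatibility(sample_meta, camera_meta, thresholds):
    """Number of camera metadata fields the sample matches within tolerance."""
    return sum(field_matches(sample_meta.get(f), v, thresholds.get(f, 0))
               for f, v in camera_meta.items())

def build_sample_list(pool, camera_meta, n, thresholds, widen_steps=3):
    for step in range(widen_steps + 1):
        widened = {f: t + step for f, t in thresholds.items()}   # relax numeric tolerances
        ranked = sorted(pool, key=lambda s: -compatibility(s["metadata"], camera_meta, widened))
        full_matches = [s for s in ranked
                        if compatibility(s["metadata"], camera_meta, widened) == len(camera_meta)]
        if len(full_matches) >= n:
            return full_matches[:n]
    return ranked[:n]        # best effort: the most compatible samples available

camera = {"start_gear": 4, "driving_intensity": "race"}
pool = [
    {"file": "a.wav", "metadata": {"start_gear": 5, "driving_intensity": "race"}},
    {"file": "b.wav", "metadata": {"start_gear": 2, "driving_intensity": "race"}},
]
print([s["file"] for s in build_sample_list(pool, camera, n=1, thresholds={"start_gear": 1})])
# -> ['a.wav']: start gear 5 is within the threshold of 1 of the camera's start gear 4
```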
A number of reports are generated to help developers understand the output of the pipeline and so that they can set suitable parameters. This is particularly important when the sample pool is large and there are many tracks and cameras.
The camera samples 52 for each camera are built at step 54 into sound banks 56 using an audio engine. This step makes use of the authoring application programming interface 58 to create data structures 60 from which sound banks 56 can be created programmatically.
The following steps take place at run-time of the game. As a game is run, the video game generates video and audio output that is responsive to inputs received through input controls. The outputs simulate one or more cars 14 driving around a virtual race track 10. Game data 62 is generated that represents the instant state of the game. As cars drive round the track at run-time, game data is used to determine when a car 14 is approaching a camera 12.
Additional context (car manufacturer, speed, gear, position and race conditions) is then used to select a subset of m samples from the list of n audio samples that were associated with the camera position during development. The subset of m samples has metadata that matches the instantaneous state of the game. Any one of these m samples can be selected at random
for reproduction. This random selection provides two benefits: it allows the system to match the manufacturer of the car in the simulation to the car manufacturer recorded in the sample (each car manufacturer sounds different); and it provides sample variety. This means the player will not hear exactly the same sample every time a specific car 14 passes a specific camera 12. Avoiding this kind of repetition avoids audio fatigue and is key to authentic-sounding game audio.
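As an illustrative sketch of this run-time step, the pre-matched list for the camera could be filtered to the samples consistent with the instantaneous game state and one chosen at random; the game-state fields and the filtering rule are assumptions of this sketch.

```python
# Illustrative sketch: reduce the camera's pre-matched n samples to the m samples
# matching the instantaneous game state, then pick one at random for variety.
# Field names and the game-state shape are assumptions of this sketch.
import random

def runtime_subset(camera_sample_list, game_state):
    return [s for s in camera_sample_list
            if s["metadata"]["manufacturer"] == game_state["manufacturer"]
            and s["metadata"]["mid_gear"] == game_state["gear"]]

def pick_sample(camera_sample_list, game_state, rng=random):
    subset = runtime_subset(camera_sample_list, game_state) or camera_sample_list
    return rng.choice(subset)

camera_list = [
    {"file": "teamA_gear4_a.wav", "metadata": {"manufacturer": "TeamA", "mid_gear": 4}},
    {"file": "teamA_gear4_b.wav", "metadata": {"manufacturer": "TeamA", "mid_gear": 4}},
    {"file": "teamB_gear4.wav",   "metadata": {"manufacturer": "TeamB", "mid_gear": 4}},
]
state = {"manufacturer": "TeamA", "gear": 4}
print(pick_sample(camera_list, state, random.Random(1))["file"])
```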
The sound sample is played back on an audio emitter positioned on the car in the virtual, simulated environment using an audio engine 66. To ensure tight synchronisation between the in-simulation visuals and the audio playback, the exact time the car 14 will pass the camera position 12 is calculated at step 68 and used to adjust the audio playback timing. This uses an algorithm that takes car speed and gear and acceleration behaviour from camera metadata to estimate when the car will pass by the nearest point 24 of the track 10 to the camera 12. Initiation of the audio playback takes place at a trigger position 26 that is in advance of the pass-by position 24.
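A possible sketch of the timing calculation is shown below: it estimates the moment the car will reach the pass-by point and starts playback early enough that the sample's recorded pass-by time coincides with it. The constant-speed estimate and the parameter names are assumptions of this sketch; the described algorithm also takes gear and acceleration behaviour into account.

```python
# Illustrative sketch: start playback so that the sample's recorded pass-by time
# lines up with the car's estimated arrival at the pass-by point. The constant-speed
# estimate and parameter names are assumptions of this sketch.

def playback_start_time(now_s, car_position_m, car_speed_mps,
                        pass_by_position_m, sample_pass_by_time_s):
    time_to_pass_by = (pass_by_position_m - car_position_m) / max(car_speed_mps, 0.1)
    estimated_pass_by_s = now_s + time_to_pass_by
    start_s = estimated_pass_by_s - sample_pass_by_time_s   # begin ahead of the pass-by moment
    return max(start_s, now_s)   # never schedule playback in the past

# Example: car 310 m before the pass-by point at 50 m/s, sample pass-by offset 6.2 s.
print(playback_start_time(now_s=0.0, car_position_m=0.0, car_speed_mps=50.0,
                          pass_by_position_m=310.0, sample_pass_by_time_s=6.2))
# -> 0.0 (playback begins immediately so the 6.2 s offset lands on the pass-by)
```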
The invention has been described above with reference to a racing car simulation, but it could find alternative applications. Embodiments could be adapted to generate videogame audio for other scenarios. Likely use-cases would have some of the following properties: a. a large sample pool of sound samples (either from broadcast, existing sample packs or unique recordings); b. audio representation of a real-world scenario that is critically shaped by its environment; c. audio representation of a highly nuanced real-world scenario; d. some specific alternative scenarios include: roller-coaster audio, theme park crowd audio and zoo animal audio.
In any case, specific metadata must be defined for samples and in-simulation context. Run-time synchronisation will be dependent on the specific in-simulation context.

Claims (21)

  1. 1. A system for synthesis of a virtual environment, the system comprising: a. a visual simulation subsystem operative to create a visual simulation of a simulated location within the virtual environment; b. an audio simulation subsystem operative to create an audio simulation of the virtual environment that corresponds to the visual simulation produced by the visual simulation subsystem, the audio simulation subsystem including: i. a storage pool populated with a multiplicity of sound samples, each sound sample being a recording made in a real-world environment, wherein each sample is associated with sample metadata that describes conditions existing in the real-world environment when the sample was recorded; ii. a matching engine operative, at run-time of a simulation, to select a sound sample from the storage pool which has sample metadata substantially corresponding to conditions within the virtual environment at run-time; and iii. an audio output device that is operative at run-time to reproduce the selected sound sample.
  2. 2. A system according to claim 1, wherein the matching engine analyses data representative of the condition of operation of the simulation in making its selection of a sound sample from the storage pool.
  3. 3. A system according to claim 2, wherein the system is operative to synthesise audio and video of an object moving in the virtual environment.
  4. 4. A system according to claim 3, wherein the system is operative to synthesise audio and video of the object from a point of view external to the object.
  5. 5. A system according to claim 3 or claim 4, wherein the sound samples are recordings of a real-world object made in a real-world environment and the associated metadata includes metadata describing one or more attribute of the real-world object existing in the real-world environment when the sample was recorded.
  6. 6. A system according to any of claims 2-5, wherein the matching engine analyses data representative of at least one attribute of the object in making its selection of a sound sample from the storage pool.
  7. 7. A system according to claim 6, wherein the sample metadata includes elements specifying one or more attribute of the object and selection of the sound sample from the storage pool is based on at least one attribute of the object as it passes the simulated location.
  8. 8. A system according to claim 7 in which the object is a vehicle and the sample metadata includes elements specifying one or more of a) a driving intensity value; b) vehicle manufacturer; c) vehicle model; d) selected gear; e) acceleration behaviour; and f) time position in the sound sample of the closest approach of the vehicle to the point of view.
  9. 9. A system according to any of the preceding claims, wherein the virtual environment includes a plurality of camera positions each corresponding to a different simulated location within the virtual environment.
  10. 10. A system according to claim 9 when dependent upon claim 3, wherein each of the camera positions is a point of view external to the object.
  11. 11. A system according to claim 9 or claim 10, wherein camera metadata is associated with each camera position.
  12. 12. A system according to claim 11, wherein the camera metadata includes elements specifying one or more of: a) a trigger position; b) a camera pass-by position; and c) behaviour of the object passing the camera position.
  13. 13. A system according to claim 11 or claim 12, wherein the matching engine is operative to compare sample metadata with the camera metadata for each camera position and select a plurality of n sound samples which have sample metadata that differs from the camera metadata by less than a threshold value to form a sample list for that camera position.
  14. 14. A system according to claim 13 in which the matching engine is operative at run-time to compare sample metadata for the sample list associated with the camera position that is the point of view of the simulated location with the attributes of the object at the time the object is passing the location and select an appropriate sound sample from the sample list of n sound samples for that camera position.
  15. 15. A system according to claim 14, wherein selection of a particular sound sample from the list is made at random or in sequence whereby the same sound sample is not selected for each simulation of a particular simulated location.
  16. 16. A system according to any preceding claim, wherein metadata in the storage pool is derived programmatically from the sound samples.
  17. 17. A system according to any preceding claim, wherein the system provides video and audio output for a video game.
  18. 18. A system according to any preceding claim that provides video and audio simulation of a vehicle being driven on a track.
  19. 19. A method of synthesising a virtual environment, the method comprising: providing a simulation system comprising: a. a visual simulation subsystem operative to create a visual simulation of a simulated location within the virtual environment; b. an audio simulation subsystem operative to create an audio simulation of the virtual environment that corresponds to the visual simulation produced by the visual simulation subsystem, the audio simulation subsystem including: i) a storage pool populated with a multiplicity of sound samples, each sound sample being a recording made in a real-world environment, wherein each sample is associated with sample metadata that describes conditions existing in the real-world environment when the sample was recorded; ii) a matching engine; and iii) an audio output device; wherein at run-time of a simulation the method comprises the steps of: i. at the matching engine, selecting a sound sample from the storage pool which has sample metadata substantially corresponding to conditions within the virtual environment; and ii. at the audio output device, reproducing the selected sound sample.
  20. 20. A computer program which, when executed by a processor, causes the processor to carry out a method in accordance with claim 19.
  21. 21. A computer program product carrying instructions which, when executed by a processor, causes the processor to carry out a method in accordance with claim 19.
GB2210228.9A 2022-07-12 2022-07-12 System for audio and video simulation Pending GB2620591A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB2210228.9A GB2620591A (en) 2022-07-12 2022-07-12 System for audio and video simulation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB2210228.9A GB2620591A (en) 2022-07-12 2022-07-12 System for audio and video simulation

Publications (3)

Publication Number Publication Date
GB202210228D0 GB202210228D0 (en) 2022-08-24
GB2620591A true GB2620591A (en) 2024-01-17
GB2620591A8 GB2620591A8 (en) 2024-03-20

Family

ID=84540012

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2210228.9A Pending GB2620591A (en) 2022-07-12 2022-07-12 System for audio and video simulation

Country Status (1)

Country Link
GB (1) GB2620591A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100197401A1 (en) * 2009-02-04 2010-08-05 Yaniv Altshuler Reliable, efficient and low cost method for games audio rendering
US20110081023A1 (en) * 2009-10-05 2011-04-07 Microsoft Corporation Real-time sound propagation for dynamic sources
US20150321101A1 (en) * 2014-05-08 2015-11-12 High Fidelity, Inc. Systems and methods for implementing distributed computer-generated virtual environments using user contributed computing devices
WO2019193244A1 (en) * 2018-04-04 2019-10-10 Nokia Technologies Oy An apparatus, a method and a computer program for controlling playback of spatial audio
US20210289310A1 (en) * 2017-07-14 2021-09-16 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Concept for generating an enhanced sound-field description or a modified sound field description using a multi-layer description

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100197401A1 (en) * 2009-02-04 2010-08-05 Yaniv Altshuler Reliable, efficient and low cost method for games audio rendering
US20110081023A1 (en) * 2009-10-05 2011-04-07 Microsoft Corporation Real-time sound propagation for dynamic sources
US20150321101A1 (en) * 2014-05-08 2015-11-12 High Fidelity, Inc. Systems and methods for implementing distributed computer-generated virtual environments using user contributed computing devices
US20210289310A1 (en) * 2017-07-14 2021-09-16 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Concept for generating an enhanced sound-field description or a modified sound field description using a multi-layer description
WO2019193244A1 (en) * 2018-04-04 2019-10-10 Nokia Technologies Oy An apparatus, a method and a computer program for controlling playback of spatial audio

Also Published As

Publication number Publication date
GB2620591A8 (en) 2024-03-20
GB202210228D0 (en) 2022-08-24

Similar Documents

Publication Publication Date Title
US6959094B1 (en) Apparatus and methods for synthesis of internal combustion engine vehicle sounds
CN1816375B (en) Personalized behavior of computer controlled avatars in a virtual reality environment
KR20210110620A (en) Interaction methods, devices, electronic devices and storage media
JP6270330B2 (en) Engine sound output device and engine sound output method
Farnell Behaviour, Structure and Causality in Procedural Audio1
Böttcher Current problems and future possibilities of procedural audio in computer games
Ghose et al. Foleygan: Visually guided generative adversarial network-based synchronous sound generation in silent videos
Cristani et al. Toward an automatically generated soundtrack from low-level cross-modal correlations for automotive scenarios
CN104919813A (en) Computationally generating turn-based game cinematics
GB2620591A (en) System for audio and video simulation
CN111564064A (en) Intelligent education system and method based on game interaction
Bogaers et al. Music-driven animation generation of expressive musical gestures
KR20100074225A (en) Interactive sound synthesis
Böttcher et al. Procedural audio in computer games using motion controllers: an evaluation on the effect and perception
US10515563B2 (en) Apparatus and method for providing realistic education media
US11948599B2 (en) Audio event detection with window-based prediction
Petersen et al. Suggestive Sound Design-How to use Active Interior Sound Design to improve traffic safety
Jin et al. Embedded systems feel the beat in new orleans: Highlights from the ieee signal processing cup 2017 student competition [sp competitions]
Krejsa et al. A novel lip synchronization approach for games and virtual environments
US20120215507A1 (en) Systems and methods for automated assessment within a virtual environment
Abdelnour et al. From visual to acoustic question answering
Weinel Quake Delirium revisited: system for video game ASC simulations
Af Malmborg Evaluation of Car Engine Sound Design Methods in Video Games
CN110753960A (en) Sound pressure signal output device, sound pressure signal output method, and sound pressure signal output program
Allman-Ward et al. Designing and delivering the right sound for quiet vehicles