US11950085B2 - Concept for generating an enhanced sound field description or a modified sound field description using a multi-point sound field description - Google Patents
- Publication number
- US11950085B2 (application US17/898,016)
- Authority
- US
- United States
- Prior art keywords
- sound field
- field description
- reference location
- group
- objects
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H04S7/303—Tracking of listener position or orientation
- H04S7/304—For headphones
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H04S7/303—Tracking of listener position or orientation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/033—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
- G06F3/0346—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor with detection of the device orientation or free movement in a 3D space, e.g. 3D mice, 6-DOF [six degrees of freedom] pointers using gyroscopes, accelerometers or tilt-sensors
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/008—Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/01—Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/11—Positioning of individual sound objects, e.g. moving airplane, within a sound field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/03—Application of parametric coding in stereophonic audio systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/11—Application of ambisonics in stereophonic audio systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/02—Systems employing more than two channels, e.g. quadraphonic of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/305—Electronic adaptation of stereophonic audio signals to reverberation of the listening space
- H04S7/306—For headphones
Definitions
- the present invention is related to audio processing and, particularly, audio processing in relation to sound fields that are defined with respect to a reference location such as a microphone or a virtual microphone location.
- Ambisonics signals comprise a truncated spherical harmonic decomposition of the sound field.
- Ambisonics comes in different flavors.
- There is ‘traditional’ Ambisonics [31] which today is known as ‘First-Order Ambisonics’ (FOA) and comprises four signals (i.e., one omnidirectional signal and up to three figure-of-eight directional signals).
- More recent Ambisonics variants are known as ‘Higher-Order Ambisonics’ (HOA) and provide enhanced spatial resolution and larger listener sweet-spot area at the expense of carrying more signals.
- a fully defined N-th order HOA representation consists of (N+1)^2 signals.
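As a quick check of the channel count stated above, the following minimal sketch evaluates (N+1)^2 for a few orders. The function name `hoa_channel_count` is an illustrative assumption and does not appear in the source.

```python
def hoa_channel_count(order: int) -> int:
    """Number of signals in a fully defined Ambisonics representation of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


# First-order Ambisonics (order 1) yields the four signals mentioned in the text.
for n in range(4):
    print(f"order {n}: {hoa_channel_count(n)} signals")
```

The count grows quadratically with the order, which is the cost of "carrying more signals" noted for HOA.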
- the Directional Audio Coding (DirAC) representation has been conceived to represent a FOA or HOA sound scene in a more compact, parametric style. More specifically, the spatial sound scene is represented by one (or more) transmitted audio channels which represent a downmix of the acoustic scene and associated side information of the direction and diffuseness in each time-frequency (TF) bin. More information on DirAC can be found in [32, 33].
- DirAC [32] can be used with different microphone systems and with arbitrary loudspeaker setups.
- the purpose of the DirAC system is to reproduce the spatial impression of an existing acoustical environment as precisely as possible using a multichannel/3D loudspeaker system.
- responses: continuous sound or impulse responses
- W: omnidirectional microphone
- a common method is to apply three figure-of-eight microphones (X,Y,Z) aligned with the corresponding Cartesian coordinate axes [34].
- a way to do this is to use a Sound field microphone, which directly yields all the desired responses.
- the W, X, Y, and Z signals can also be computed from a set of discrete omnidirectional microphones.
- the sound signal is first divided into frequency channels.
- the sound direction and diffuseness are measured as a function of time in each frequency channel.
- one or more audio channels are sent, together with analyzed direction and diffuseness data.
- the audio which is applied to the loudspeakers can be, for example, the omnidirectional channel W, or the sound for each loudspeaker can be computed as a weighted sum of W, X, Y, and Z, which forms a signal that has a certain directional characteristic for each loudspeaker.
- Each audio channel is divided into frequency channels, which are then optionally divided into diffuse and non-diffuse streams depending on the analyzed diffuseness.
- a diffuse stream is reproduced with a technique, which produces a diffuse perception of a sound scene, e.g., the decorrelation techniques used in Binaural Cue Coding [35-37].
- Non-diffuse sound is reproduced with a technique which aims to produce a point-like virtual source according to the direction data (e.g. VBAP [38]).
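The analysis step described above (direction and diffuseness per frequency channel) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes a plane wave with signal `s` from unit direction `u` is encoded as `W = s` and `[X, Y, Z] = s * u` (no extra scaling of W), so that the intensity-like vector computed below points toward the source. The function name `dirac_analysis` and the averaging over frames are hypothetical choices.

```python
import numpy as np


def dirac_analysis(W, X, Y, Z):
    """Estimate direction and diffuseness per frequency bin.

    W, X, Y, Z: complex STFT arrays of shape (frames, bins).
    Returns (doa, diffuseness): doa has shape (bins, 3), diffuseness has shape (bins,).
    """
    V = np.stack([X, Y, Z], axis=-1)                 # (frames, bins, 3)
    intensity = np.real(np.conj(W)[..., None] * V)   # intensity-like vector per tile
    energy = 0.5 * (np.abs(W) ** 2 + np.sum(np.abs(V) ** 2, axis=-1))

    mean_intensity = intensity.mean(axis=0)          # temporal averaging
    mean_energy = energy.mean(axis=0)
    length = np.linalg.norm(mean_intensity, axis=-1)

    doa = mean_intensity / np.maximum(length, 1e-12)[..., None]
    diffuseness = 1.0 - length / np.maximum(mean_energy, 1e-12)
    return doa, np.clip(diffuseness, 0.0, 1.0)


# Usage: a single plane wave arriving from the +y direction.
rng = np.random.default_rng(0)
s = rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))
doa, psi = dirac_analysis(s, 0 * s, s, 0 * s)
print(doa[0], psi[0])
```

For a single plane wave the averaged vector has the same length as the averaged energy, so the diffuseness estimate is close to zero; uncorrelated sound from many directions shortens the averaged vector and raises the estimate.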
- a single Ambisonics signal is computed by: 1) simulating HOA playback and listener movement within a virtual loudspeaker array, 2) computing and translating along plane waves, and 3) re-expanding the sound field about the listener.
- Binauralization is a simulation of how the human head, ears, and upper torso change the sound of a source depending on its direction and distance. This is achieved by convolution of the signals with head-related transfer functions (HRTFs) for their relative direction [1, 2]. Binauralization also makes the sound appear to be coming from the scene rather than from inside the head [3].
- HRTFs: head-related transfer functions
- the user is either wearing an HMD or holding a tablet or phone in her/his hands. By moving her/his head or the device, the user can look around in any direction.
- This is a three-degrees-of-freedom (3DoF) scenario, as the user has three rotational degrees of freedom (pitch, yaw, roll).
- 3DoF: three-degrees-of-freedom
- Audio is often recorded with a spatial microphone [6], e.g., first-order Ambisonics (FOA), close to the video camera.
- FOA: first-order Ambisonics
- the user's head rotation is adapted in a straightforward manner [7].
- the audio is then for example rendered to virtual loudspeakers placed around the user. These virtual loudspeaker signals are then binauralized.
- Modern VR applications allow for six-degrees-of-freedom (6DoF). In addition to head rotation, the user can move around, resulting in a translation of her/his position in three spatial dimensions.
- the 6DoF reproduction is limited by the overall size of the walking area. In many cases, this area is rather small, e.g., a conventional living room. 6DoF is commonly encountered in VR games.
- the whole scene is synthetic with computer-generated imagery (CGI).
- CGI: computer-generated imagery
- the audio is often generated using object-based rendering where each audio object is rendered with distance-dependent gain and relative direction from the user based on the tracking data. Realism can be enhanced by artificial reverberation and diffraction [8, 9, 10].
- Directional audio coding (DirAC) [16] is a popular method to transform the recording into a representation that consists of an audio spectrum and parametric side information on the sound direction and diffuseness. It is used for acoustic zoom [11] and virtual microphone [14] applications.
- the method proposed here enables 6DoF reproduction from the recording of a single FOA microphone. Recordings from a single spatial position have been used for 3DoF reproduction or acoustic zoom. But, to the inventors' knowledge, no method for interactive, fully 6DoF reproduction from such data has been proposed so far.
- Ambisonics sound field representations (be it as regular FOA or HOA Ambisonics or as a DirAC-style parametric sound field representation) do not provide sufficient information to allow a translational shift of the listener's position, as is needed for 6DoF applications, since neither object distance nor absolute object positions in the sound scene are determined in these formats. It should be noted that the shift in the listener's position can be translated into an equivalent shift of the sound scene in the opposite direction.
- A typical problem when moving in 6DoF is illustrated in FIG. 1b.
- the sound scene is described at Position A using Ambisonics.
- sounds from Source A and Source B arrive from the same direction, i.e., they have the same direction-of-arrival (DOA).
- DOA: direction-of-arrival
- after translation to a different position, the DOAs of Source A and Source B are different.
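The geometry behind this problem can be illustrated with a small worked example. The coordinates below are hypothetical and chosen only for illustration (the source gives no numeric positions): two sources lie on the same line as seen from the original position, so their DOAs coincide there, but they separate once the listener moves sideways.

```python
import math


def doa_degrees(listener, source):
    """Azimuth in degrees of a source as seen from a listener, both given as (x, y)."""
    return math.degrees(math.atan2(source[1] - listener[1], source[0] - listener[0]))


# Hypothetical 2D layout: both sources straight ahead of the original position.
source_a, source_b = (2.0, 0.0), (4.0, 0.0)
original_position = (0.0, 0.0)
translated_position = (0.0, -1.0)

for label, pos in (("original", original_position), ("translated", translated_position)):
    print(label, round(doa_degrees(pos, source_a), 1), round(doa_degrees(pos, source_b), 1))
```

Because the nearer source shifts by a larger angle than the farther one, the new DOAs cannot be derived from the original DOA alone; some distance information is needed.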
- an apparatus for generating a modified sound field description may have: an interface configured for receiving a sound field description and meta data relating to spatial information of the sound field description; and a sound field calculator configured for calculating the modified sound field description using the spatial information, the sound field description and a translation information indicating a translation of a reference location to a different reference location.
- a method of generating a modified sound field description may have the steps of: receiving a sound field description and meta data relating to spatial information of the sound field description; and calculating the modified sound field description using the spatial information, the sound field description and a translation information indicating a translation from a reference location to a different reference location.
- Another embodiment may have a non-transitory storage medium having stored thereon a computer program for performing, when running on a computer or a processor, a method of generating a modified sound field description having the steps of: receiving a sound field description and meta data relating to spatial information of the sound field description; and calculating the modified sound field description using the spatial information, the sound field description and a translation information indicating a translation from a reference location to a different reference location.
- the present invention is based on the finding that typical sound field descriptions that are related to a reference location need additional information before they can be processed to calculate a modified sound field description that is related not to the original reference location but to another reference location.
- meta data relating to spatial information of this sound field is generated and the meta data together with the sound field description corresponds to the enhanced sound field description that can, for example, be transmitted or stored.
- the modified sound field is calculated using this spatial information, the sound field description and a translation information indicating a translation from a reference location to a different reference location.
- the enhanced sound field description consisting of a sound field description and meta data relating to spatial information of this sound field underlying the sound field description is processed to obtain a modified sound field description that is related to a different reference location defined by additional translation information that can, for example, be provided or used at a decoder-side.
- the present invention is not only related to an encoder/decoder scenario, but can also be applied in an application where both the generation of the enhanced sound field description and the generation of the modified sound field description take place at essentially one and the same location.
- the modified sound field description may, for example, be a description of the modified sound field itself or actually the modified sound field in channel signals, binaural signals or, once again, a reference location-related sound field that, however, is now related to the new or different reference location rather than the original reference location.
- Such an application would, for example, be a virtual reality scenario where a sound field description exists together with meta data, where a listener moves away from the reference location for which the sound field is given to a different reference location, and where the sound field for the listener moving around in the virtual area is then calculated to correspond to the sound field at the different reference location to which the user has moved.
- the enhanced sound field description has a first sound field description related to the (first) reference location and a second sound field description related to a further (the second) reference location which is different from the (first) reference location, and the metadata has information on the reference location and the further reference location such as vectors pointing from a predetermined origin to these reference locations.
- the metadata can be a single vector pointing to either the reference location or the further reference location and a vector extending between the two reference locations to which the two different sound field descriptions are related.
- the sound field descriptions can be non-parametric sound field descriptions such as first-order Ambisonics or higher-order Ambisonics descriptions.
- the sound field descriptions can be DirAC descriptions or other parametric sound field descriptions, or one sound field description can, for example, be a parametric sound field description and the other sound field description can be, for example, a non-parametric sound field description.
- the sound field generator may generate, for each sound field description, a DirAC description of the sound field having one or more downmix signals and individual direction data and optionally diffuseness data for different time-frequency bins.
- the metadata generator is configured to generate geometrical metadata for both sound field descriptions so that the reference location and the additional reference location can be identified from the metadata. Then, it will be possible to extract individual sources from both sound field descriptions and to perform an additional processing for the purpose of generating an enhanced or modified sound field description.
- Ambisonics has become one of the most commonly used formats for 3D audio in the context of virtual, augmented, and mixed reality applications.
- a wide variety of audio acquisition and production tools have been developed that generate an output signal in Ambisonics format.
- the Ambisonics format is converted to a binaural signal or channels for reproduction.
- the listener is usually able to interactively change his/her orientation in the presented scene to the extent that he/she can rotate his/her head in the sound scene, enabling three-degrees-of-freedom (3DoF, i.e., pitch, yaw, and roll), and still experience an appropriate sound quality.
- First-order Ambisonics (FOA) recordings can be processed and reproduced over headphones. They can be rotated to account for the listener's head orientation.
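A rotation of an FOA signal for head orientation can be sketched for the yaw-only case as follows. This is a minimal illustration under assumptions not stated in the source: the x-axis points to the front, the y-axis to the left, a plane wave from azimuth phi is encoded with `X = s*cos(phi)` and `Y = s*sin(phi)`, and a positive yaw turns the head counter-clockwise when seen from above. The function name `rotate_foa_yaw` is hypothetical.

```python
import numpy as np


def rotate_foa_yaw(w, x, y, z, head_yaw):
    """Counter-rotate the FOA scene so it stays fixed while the head turns by head_yaw (radians)."""
    c, s = np.cos(head_yaw), np.sin(head_yaw)
    x_rot = c * x + s * y    # a source at azimuth phi moves to phi - head_yaw
    y_rot = -s * x + c * y
    return w, x_rot, y_rot, z  # W (omnidirectional) and Z (vertical) are unaffected by yaw


# Usage: a frontal source appears on the right after turning the head 90 degrees to the left.
sig = np.array([1.0, -0.5, 0.25])
w, x, y, z = rotate_foa_yaw(sig, sig, 0 * sig, 0 * sig, np.pi / 2)
print(np.round(x, 6), np.round(y, 6))
```

Only the directional components are mixed; pitch and roll would mix Z with X and Y in the same way using the corresponding rotation matrices.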
- virtual reality (VR) systems allow the listener to move in six-degrees-of-freedom (6DoF), i.e., three rotational plus three translational degrees of freedom.
- 6DoF: six-degrees-of-freedom
- a technique to facilitate 6DoF is described.
- a FOA recording is described using a parametric model, which is modified based on the listener's position and information about the distances to the sources. The method is evaluated by a listening test, comparing different binaural renderings of a synthetic sound scene in which the listener can move freely.
- the enhanced sound field description is output by an output interface for generating an output signal for transmission or storage, where the output signal comprises, for a time frame, one or more audio signals derived from the sound field and the spatial information for the time frame.
- in further embodiments, the sound field generator is adapted to derive direction data from the sound field, the direction data referring to a direction of arrival of sound for a time period or a frequency bin, and the meta data generator is configured to derive the spatial information as data items associating distance information with the direction data.
- an output interface is configured to generate the output signals so that the data items for the time frame are linked to the direction data for the different frequency bins.
- the sound field generator is also configured to generate diffuseness information for a plurality of frequency bins of a time frame of the sound field, wherein the meta data generator is configured to generate distance information for a frequency bin that is different from a predetermined value or different from infinity, or to generate a distance value for the frequency bin at all, only when the diffuseness value is lower than a predetermined or adaptive threshold.
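The diffuseness-dependent generation of distance values can be sketched as a simple gating step. This is an illustrative sketch only: the threshold of 0.5, the use of infinity as the placeholder for bins without a usable distance, and the name `gate_distance` are assumptions, since the source leaves the threshold "predetermined or adaptive".

```python
import numpy as np


def gate_distance(distance, diffuseness, threshold=0.5, placeholder=np.inf):
    """Keep a per-bin distance only where the diffuseness is below the threshold."""
    distance = np.asarray(distance, dtype=float)
    diffuseness = np.asarray(diffuseness, dtype=float)
    # Highly diffuse bins carry no reliable source distance, so they get the placeholder.
    return np.where(diffuseness < threshold, distance, placeholder)


# Usage: three frequency bins, the middle one dominated by diffuse sound.
print(gate_distance([1.0, 2.0, 3.0], [0.1, 0.9, 0.4]))
```

Bins that keep the placeholder value can be skipped when the meta data is written, which reduces the amount of side information.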
- embodiments comprise a translation interface for providing the translation information or rotation information indicating a rotation of an intended listener to the modified sound field, a meta data supplier for supplying the meta data to the sound field calculator, a sound field supplier for supplying the sound field description to the sound field calculator and, additionally, an output interface. The output interface outputs the modified sound field comprising the modified sound field description and modified meta data, the modified meta data being derived from the meta data using the translation information; or it outputs a plurality of loudspeaker channels, each loudspeaker channel being related to a predefined loudspeaker position; or it outputs a binaural representation of the modified sound field.
- the sound field description comprises a plurality of sound field components.
- the plurality of sound field components comprise an omnidirectional component and at least one directional component.
- a sound field description is, for example, a first-order Ambisonics sound field description having an omnidirectional component and three directional components X, Y, Z or such a sound field is a higher-order Ambisonics description comprising the omnidirectional component, the three directional components with respect to the X, Y, and Z directions and, additionally, further directional components that relate to other directions than the X, Y, Z directions.
- the apparatus comprises an analyzer for analyzing the sound field components to derive, for different time or frequency bins, direction of arrival information.
- the apparatus additionally has a translation transformer for calculating modified DoA information per frequency or time bin using the DoA information and the meta data, where the meta data relate to a depth map associating a distance with a source included in both sound field descriptions, as obtained, for example, by triangulation processing using two angles with respect to two different reference locations and the distance/positions of the reference locations. This may apply to a fullband representation or to different frequency bins of a time frame.
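The triangulation mentioned above can be sketched as finding the point closest to two rays, one from each reference location along its measured direction of arrival. This is one possible least-squares formulation chosen for illustration, not necessarily the one used in the patent; the function name `triangulate` and the example coordinates are hypothetical.

```python
import numpy as np


def triangulate(ref_1, doa_1, ref_2, doa_2):
    """Estimate a source position from two reference locations and two DOA vectors."""
    ref_1, ref_2 = np.asarray(ref_1, float), np.asarray(ref_2, float)
    d1 = np.asarray(doa_1, float) / np.linalg.norm(doa_1)
    d2 = np.asarray(doa_2, float) / np.linalg.norm(doa_2)

    # Solve ref_1 + t1*d1 = ref_2 + t2*d2 in the least-squares sense for (t1, t2).
    A = np.stack([d1, -d2], axis=1)
    t, *_ = np.linalg.lstsq(A, ref_2 - ref_1, rcond=None)

    # With noisy DOAs the rays may not intersect, so return the midpoint of the closest points.
    return 0.5 * ((ref_1 + t[0] * d1) + (ref_2 + t[1] * d2))


# Usage: a source at (1, 2, 0) observed from two reference locations on the x-axis.
source = triangulate([0, 0, 0], [1, 2, 0], [2, 0, 0], [-1, 2, 0])
print(np.round(source, 6), np.round(np.linalg.norm(source), 6))
```

The distance from either reference location to the estimated position is the per-source entry that a depth map would store. Nearly parallel DOAs make the problem ill-conditioned, so a practical system would need to handle that case separately.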
- the sound field calculator has a distance compensator for calculating the modified sound field using distance compensation information that depends on the distance calculated using the meta data, which is the same for each frequency or time bin of a source or different for each or some of the time/frequency bins, and on a new distance associated with the time or frequency bin, the new distance being related to the modified DoA information.
- the sound field calculator calculates a first vector pointing from the reference location to a sound source obtained by an analysis of the sound field. Furthermore, the sound field calculator calculates a second vector pointing from the different reference location to the sound source and this calculation is done using the first vector and the translation information, where the translation information defines a translation vector from the reference location to the different reference location. And, then, a distance from the different reference location to the sound source is calculated using the second vector.
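The vector calculation described above can be sketched as follows: the first vector is the original DOA scaled by the source distance, the second vector is the first vector minus the translation vector, and its length is the new distance. The gain `distance / new_distance` is an assumed inverse-distance law used here only to illustrate distance compensation; the source does not fix the compensation rule in this passage. The function name `translate_source` is hypothetical.

```python
import numpy as np


def translate_source(doa, distance, translation):
    """Return (new_doa, new_distance, gain) for a listener moved by `translation`."""
    doa = np.asarray(doa, float)
    first = distance * doa / np.linalg.norm(doa)     # reference location -> source
    second = first - np.asarray(translation, float)  # new reference location -> source
    new_distance = np.linalg.norm(second)
    if new_distance < 1e-9:
        raise ValueError("new reference location coincides with the source")
    new_doa = second / new_distance
    gain = distance / new_distance                   # assumption: 1/r amplitude law
    return new_doa, new_distance, gain


# Usage: a source 2 m straight ahead; the listener walks 1 m toward it.
print(translate_source([1.0, 0.0, 0.0], 2.0, [1.0, 0.0, 0.0]))
```

Moving toward the source halves the distance and doubles the assumed gain, while a sideways step would mainly change the direction of arrival.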
- the sound field calculator is configured to receive, in addition to the translation information, a rotation information indicating a rotation of the listener's head in one of the three rotation directions given by pitch, yaw and roll.
- the sound field calculator is then configured to perform the rotation transformation to rotate a modified direction of arrival data for a sound field using the rotation information, where the modified direction of arrival data is derived from a direction of arrival data obtained by a sound analysis of the sound field description and the translation information.
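The rotation of the modified direction-of-arrival data can be sketched with a rotation matrix built from yaw, pitch and roll. The axis convention (x front, y left, z up), the composition order `Rz(yaw) @ Ry(pitch) @ Rx(roll)`, and the function names are assumptions for illustration; the source does not specify them.

```python
import numpy as np


def head_rotation_matrix(yaw, pitch, roll):
    """Head orientation as a rotation matrix (angles in radians, assumed Z-Y-X order)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return rz @ ry @ rx


def rotate_doa(doa, yaw, pitch, roll):
    """Express a (translated) DOA vector in head coordinates via the inverse head rotation."""
    return head_rotation_matrix(yaw, pitch, roll).T @ np.asarray(doa, float)


# Usage: after turning the head 90 degrees to the left, a frontal source lies to the right.
print(np.round(rotate_doa([1.0, 0.0, 0.0], np.pi / 2, 0.0, 0.0), 6))
```

Applying the rotation after the translation step keeps the two operations independent: translation changes direction and distance, rotation changes direction only.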
- the sound field calculator is configured to determine source signals from the sound field description and directions of the source signals related to the reference location by a sound analysis.
- new directions of the sound sources are calculated that are related to the different reference location and this is done using the meta data, and then distance information of the sound sources related to the different reference location is calculated and, then, the modified sound field is synthesized using the distance information and the new directions of the sound sources.
- a sound field synthesis is performed by panning the sound source signals to a direction given by the new direction information in relation to a reproduction setup, and a scaling of the sound source signals is done using the distance information before performing the panning operation or subsequent to performing the panning operation.
- a diffuse part of the sound source signal is added to a direct part of the sound source signal, the direct part being modified by the distance information before being added to the diffuse part.
- the sound source synthesis is performed in a spectral representation, where the new direction information and the distance information are calculated for each frequency bin, and where a direct synthesis for each frequency bin is performed using an audio signal for the frequency bin, a panning gain for the frequency bin derived from the new direction information, and a scaling factor for the frequency bin derived from the distance information for the frequency bin.
- a diffuse synthesis is performed using a diffuse audio signal derived from the audio signal from the frequency bin and using a diffuseness parameter derived by the signal analysis for the frequency bin and, then, the direct signal and the diffuse signal are combined to obtain a synthesized audio signal for the time or frequency bin and, then, a frequency-time conversion is performed using audio signals for other time/frequency bins to obtain a time domain synthesized audio signal as the modified sound field.
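- The combination of direct and diffuse synthesis for one time/frequency bin might be sketched as follows. The square-root weighting by the diffuseness is a common DirAC-style choice assumed here, not a formula given in the text, and the function and parameter names are hypothetical.

```python
import math

def synthesize_bin(pressure, diffuse_signals, diffuseness, panning_gains, distance_scale):
    """Return one complex output value per loudspeaker channel for a single bin.

    pressure:        complex spectrum value of the bin.
    diffuse_signals: one (decorrelated) diffuse signal value per channel.
    diffuseness:     value between 0 (fully direct) and 1 (fully diffuse).
    panning_gains:   one panning gain per channel, from the new direction.
    distance_scale:  scaling factor derived from the distance information.
    """
    channels = len(panning_gains)
    direct_weight = math.sqrt(1.0 - diffuseness)        # assumed weighting
    diffuse_weight = math.sqrt(diffuseness / channels)  # assumed weighting
    return [
        direct_weight * distance_scale * gain * pressure + diffuse_weight * diffuse
        for gain, diffuse in zip(panning_gains, diffuse_signals)
    ]

# Hypothetical bin: fully direct sound panned to the first of two channels.
out = synthesize_bin(1 + 0j, [0j, 0j], 0.0, [1.0, 0.0], 0.5)
```

- The per-bin outputs of all channels would then be passed to the frequency-time conversion.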
- the sound field calculator is configured to synthesize, for each sound source, a sound field related to the different reference location by, for example, processing, for each source, a source signal using the new direction for the source signal to obtain a sound field description of the source signal related to the different/new reference location. Furthermore, the source signal is modified before processing the source signal or subsequent to processing the source signal using the direction information. And, finally, the sound field descriptions for the sources are added together to obtain the modified sound field related to the different reference location.
- the sound field calculator calculates the modified sound field using the spatial information on the first sound field description, using the spatial information on the second sound field description, and using the translation information indicating a translation of a reference location to a different reference location.
- the metadata may, for example, be a vector directed to the reference location of the sound field description and another vector directed from the same origin to the further reference location of the second sound field description.
- objects are generated by applying a source separation, or beamforming, or, generally, any kind of sound source analysis to the first and the second sound field description. Then, the direction of arrival information of all objects is computed, irrespective of whether these objects are broadband objects or objects for individual time/frequency bins. Then, the objects extracted from the different sound field descriptions are matched with each other in order to find at least one matched object, i.e., an object occurring both in the first and the second sound field descriptions. This matching is performed, for example, by means of a correlation or coherence calculation using the object signals and/or direction of arrival information or other information.
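- A correlation-based matching of extracted objects could be sketched as below. The zero-lag normalized correlation, the greedy pairing, and the threshold of 0.8 are illustrative assumptions; a practical system would likely also compensate for time delays between the two recording positions.

```python
import math

def normalized_correlation(x, y):
    """Zero-lag normalized correlation of two equally long signals."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator if denominator > 0 else 0.0

def match_objects(first_group, second_group, threshold=0.8):
    """Greedily pair objects whose signals correlate above the threshold."""
    matches, used = [], set()
    for i, signal_a in enumerate(first_group):
        best_j, best_score = None, threshold
        for j, signal_b in enumerate(second_group):
            if j in used:
                continue
            score = abs(normalized_correlation(signal_a, signal_b))
            if score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            used.add(best_j)
            matches.append((i, best_j))
    return matches

# Hypothetical object signals extracted from the two sound field descriptions.
first_group = [[1, 0, -1, 0], [0, 1, 0, -1]]
second_group = [[0, 2, 0, -2], [1, 1, 1, 1]]
matches = match_objects(first_group, second_group)
```

- Objects that appear in `matches` are the matched objects; all others remain non-matched.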
- the result of the procedure is that there exists, for a matched object, a first DoA information related to the reference location and a second DoA information related to the further reference location. Then, the position of the matched object and, particularly, the distance of the matched object to the reference location or the further reference location is calculated based on triangulation, using the information on the reference location and the further reference location included in the associated metadata.
- This information, and, particularly, the position information for the matched object is then used for modifying each matched object based on the estimated position and the desired position, i.e., after translation, using a distance compensation processing.
- the old DoA information from both reference locations and the translation information is used.
- this processing can be performed for both individual sound field descriptions, since each matched object occurs in both sound field descriptions.
- the sound field description having a reference location being closest to the new listener position subsequent to the translation is used.
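- Selecting the sound field description whose reference location is closest to the new listener position can be sketched as follows; the function name and the example coordinates are hypothetical.

```python
import math

def closest_reference(reference_locations, listener_position):
    """Return the index of the reference location nearest to the listener."""
    return min(
        range(len(reference_locations)),
        key=lambda i: math.dist(reference_locations[i], listener_position),
    )

# Hypothetical positions: two recording points and a listener near the second.
selected = closest_reference([(0.0, 0.0), (4.0, 0.0)], (3.0, 1.0))
```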
- the new DoA is used for calculating a new sound field description for the matched object related to the different reference location, i.e., to which the user has moved. Then, and in order to also incorporate the non-matched objects, sound field descriptions for those objects are calculated as well but using the old DoA information. And, finally, the modified sound field is generated by adding all individual sound field descriptions together.
- Any change with orientation can be realized by applying a single rotation to the virtual Ambisonics signal.
- the metadata is not used for directly providing the distance of an object to a reference location. Instead, the metadata is provided for identifying the reference location of each of two or more sound field descriptions and the distance between a reference location and a certain matched object is calculated based on, for example, triangulation processing steps.
- FIG. 1 a is an embodiment of an apparatus for generating an enhanced sound field description
- FIG. 1 b is an illustration explaining an exemplary problem underlying the present invention
- FIG. 2 is an implementation of the apparatus for generating an enhanced sound field description
- FIG. 3 a illustrates the enhanced sound field description comprising audio data, and side information for audio data
- FIG. 3 b illustrates a further illustration of an enhanced sound field comprising audio data and meta data relating to spatial information such as geometrical information for each sound field description;
- FIG. 4 a illustrates an implementation of an apparatus for generating a modified sound field description
- FIG. 4 b illustrates a further implementation of an apparatus for generating a modified sound field description
- FIG. 4 c illustrates a scenario with a reference position/location A, a further reference position/location B, and a different reference location due to translation;
- FIG. 5 illustrates the 6DoF reproduction of spatial audio in a general sense
- FIG. 6 a illustrates an embodiment for the implementation of a sound field calculator
- FIG. 6 b illustrates an implementation for calculating a new DoA and a new distance of a sound source with respect to a new/different reference location
- FIG. 6 c illustrates an embodiment of a 6DoF reproduction comprising an apparatus for generating an enhanced sound field description, for example, for each individual sound field description and an apparatus for generating a modified sound field description for the matched sources;
- FIG. 7 illustrates an embodiment for selecting the one of the first and the second sound field descriptions for the calculation of a modified sound field for a broadband or narrow band object
- FIG. 8 illustrates an exemplary device for generating a sound field description from an audio signal such as a mono-signal and direction of arrival data
- FIG. 9 illustrates a further embodiment for the sound field calculator
- FIG. 10 illustrates an implementation of the apparatus for generating a modified sound field description
- FIG. 11 illustrates a further implementation of an apparatus for generating a modified sound field description
- FIG. 12 a illustrates a known DirAC analysis implementation
- FIG. 12 b illustrates a known DirAC synthesis implementation.
- these representations may be extended in a way that provides the missing information for translational processing. It is noted that this extension could, e.g., 1) add the distance or positions of the objects to the existing scene representation, and/or 2) add information that would facilitate the process of separating the individual objects.
- the sound scene is described using two or more Ambisonics signals each describing the sound scene at a different position, or in other words from a different perspective. It is assumed that the relative positions are known.
- a modified Ambisonics signal at a desired position in the sound scene is generated from the input Ambisonics signals.
- a signal-based or parametric-based approach can be used to generate a virtual Ambisonics signal at the desired position.
- a virtual Ambisonics signal at a desired position is computed using the following steps in a signal-based translation embodiment:
- a virtual Ambisonics signal at a desired position is computed using the following steps in a parametric-based translation embodiment in accordance with a further embodiment:
- Generating multi-point Ambisonics signals is simple for computer-generated and produced content as well as in the context of natural recording via microphone arrays or spatial microphones (e.g., B-format microphone).
- one or more steps of both embodiments can also be used in the corresponding other embodiments.
- a change in orientation can be realized by applying a single rotation to the virtual Ambisonics signal.
- FIG. 1 a illustrates an apparatus for generating an enhanced sound field description comprising a sound field (description) generator 100 for generating at least one sound field description indicating a sound field with respect to at least one reference location. Furthermore, the apparatus comprises a meta data generator 110 for generating meta data relating to spatial information of the sound field. The meta data generator receives, as an input, the sound field or, alternatively or additionally, separate information on sound sources.
- the outputs of both the sound field description generator 100 and the meta data generator 110 constitute the enhanced sound field description.
- the outputs of both the sound field description generator 100 and the meta data generator 110 can be combined within a combiner 120 or output interface 120 to obtain the enhanced sound field description that includes the spatial meta data or spatial information of the sound field as generated by the meta data generator 110.
- FIG. 1 b illustrates the situation that is addressed by the present invention.
- the position A for example, is the at least one reference location and a sound field is generated by source A and source B and a certain actual or, for example, virtual microphone being located at the position A detects the sound from source A and source B.
- the sound is a superposition of the sound coming from the emitting sound sources. This represents the sound field description as generated by the sound field description generator.
- the meta data generator would, in certain implementations, derive spatial information with respect to source A and further spatial information with respect to source B, such as the distances of these sources to the reference position such as position A.
- the reference position could, alternatively, be position B.
- the actual or virtual microphone would be placed at position B and the sound field description would be a sound field, for example, represented by the first-order Ambisonics components or higher-order Ambisonics components or any other sound components having the potential to describe a sound field with respect to at least one reference location, i.e., position B.
- the meta data generator might, then, generate, as the information on the sound sources, the distance of sound source A to position B or the distance of source B to position B.
- Alternative information on sound sources could, of course, be the absolute or relative position with respect to a reference position.
- the reference position could be the origin of a general coordinate system or could be located in a defined relation to the origin of a general coordinate system.
- meta data could be the absolute position of one sound source and the relative position of another sound source with respect to the first sound source and so on.
- FIG. 2 illustrates an apparatus for generating an enhanced sound field description
- the sound field generator comprises a sound field generator 250 for the first sound field, a sound field generator 260 for the second sound field, and an arbitrary number of sound field generators for one or more further sound fields such as a third, a fourth, and so on.
- the metadata generator is configured to calculate and forward to the combiner 120 information on the first sound field and the second sound field. All this information is used by the combiner 120 in order to generate the enhanced sound field description.
- the combiner 120 is also configured as an output interface to generate the enhanced sound field description.
- FIG. 3 a illustrates an enhanced sound field description as a datastream comprising a first sound field description 330 , a second sound field description 340 and, associated thereto, the metadata 350 comprising information on the first sound field description and the second sound field description.
- the first sound field description can, for example, be a B-format description or a higher-order description or any other description that allows to determine a directional distribution of sound sources either in a full-band representation or in a frequency-selected representation.
- the first sound field description 330 and the second sound field description 340 can, for example, also be parametric sound field descriptions for the different reference locations having, for example, a downmix signal and direction of arrival data for different time/frequency bins.
- the geometrical information 350 for the first and the second sound field descriptions is the same for all sources included in the first sound field description 330 or, for the sources in the second sound field description 340 , respectively.
- this geometrical information is the same for the three sources in the first sound field description.
- the geometrical information for the second sound field included in the metadata 350 is the same for all the sources in the second sound field description.
- FIG. 3 b illustrates an exemplary construction of the metadata 350 of FIG. 3 a .
- the reference location 351 can be included in the metadata. However, this is not necessarily the case, and the reference location information 351 can also be omitted.
- a first geometrical information is given which can, for example, be an information on vector A illustrated in FIG. 4 c pointing from an origin to the reference position/location A, to which the first sound field is related.
- the second geometrical information can, for example, be an information on the vector B pointing from the origin to the second reference position/location B, to which the second sound field description is related.
- A and B are the reference locations or recording positions for both sound field descriptions.
- Alternative geometrical information can, for example, be an information on the vector D extending between reference location A and the further reference location B and/or an origin and a vector pointing from the origin to one of both points.
- the geometrical information included in the metadata may comprise vector A and vector D or may comprise vector B and vector D or may comprise vector A and vector B without vector D or may comprise other information, from which the reference location A and the reference location B can be identified in a certain three-dimensional coordinate system.
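- The alternative metadata layouts can be illustrated with a small container that resolves both reference locations from whichever vectors are present. The class name, the field names, and the assumption that vector D points from reference location A to reference location B are hypothetical choices made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec = Tuple[float, float, float]

@dataclass
class GeometryMetadata:
    vector_a: Optional[Vec] = None  # origin -> reference location A
    vector_b: Optional[Vec] = None  # origin -> reference location B
    vector_d: Optional[Vec] = None  # A -> B (assumed direction)

    def reference_locations(self) -> Tuple[Vec, Vec]:
        """Return (A, B) from any two of the three vectors."""
        a, b, d = self.vector_a, self.vector_b, self.vector_d
        if a is not None and b is not None:
            return a, b
        if a is not None and d is not None:
            return a, tuple(x + y for x, y in zip(a, d))
        if b is not None and d is not None:
            return tuple(x - y for x, y in zip(b, d)), b
        raise ValueError("at least two of vector A, vector B, vector D are needed")

# Hypothetical metadata carrying vector A and vector D only.
meta = GeometryMetadata(vector_a=(1.0, 0.0, 0.0), vector_d=(2.0, 1.0, 0.0))
location_a, location_b = meta.reference_locations()
```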
- the same considerations additionally apply to a two-dimensional sound description, as particularly illustrated in FIG. 4 c, which only shows the two-dimensional case.
- FIG. 4 a illustrates an implementation of an apparatus for generating a modified sound field description from a sound field description and meta data relating to spatial information of the sound field description.
- the apparatus comprises a sound field calculator 420 that generates the modified sound field using meta data, the sound field description and translation information indicating a translation from a reference location to a different reference location.
- the sound field calculator 420 is connected to an input interface 400 for receiving the enhanced sound field description as, for example, discussed with respect to FIG. 1 a or 2 and the input interface 400 then separates the sound field description on the one hand, i.e., what has been generated by block 100 of FIG. 1 a or block 210 of FIG. 2 .
- the input interface 400 separates the meta data from the enhanced sound field description, i.e., item 350 of FIG. 3 a or optional 351 and 352 to 354 of FIG. 3 b.
- a translation interface 410 obtains the translation information and/or additional or separate rotation information from a listener.
- An implementation of the translation interface 410 can be a head-tracking unit that not only tracks the rotation of a head in a virtual reality environment, but also a translation of the head from one position, i.e., position A in FIG. 1 b to another position, i.e., position B in FIG. 1 b.
- FIG. 4 b illustrates another implementation similar to FIG. 1 a, but not related to an encoder/decoder scenario, but related to a general scenario where the meta data supply indicated by a meta data supplier 402 , the sound field supply indicated by a sound field supplier 404 are done without a certain input interface separating an encoded or enhanced sound field description, but are all done, for example, in an actual scenario existing, for example, in a virtual reality application.
- the present invention is not limited to virtual reality applications, but can also be implemented in any other applications, where the spatial audio processing of sound fields that are related to a reference location is useful in order to transform a sound field related to a first reference location to another sound field related to a different second reference location.
- the sound field calculator 420 then generates the modified sound field description or, alternatively, generates a (virtual) loudspeaker representation or generates a binaural representation such as a two-channel representation for a headphone reproduction.
- the sound field calculator 420 can generate, as the modified sound field, a modified sound field description, being basically the same as the original sound field description, but now with respect to a new reference position.
- a virtual or actual loudspeaker representation can be generated for a predetermined loudspeaker setup such as 5.1 scheme or a loudspeaker setup having more loudspeakers and, particularly, having a three-dimensional arrangement of loudspeakers rather than only a two-dimensional arrangement, i.e., a loudspeaker arrangement having loudspeakers being elevated with respect to the user position.
- Other applications that are specifically useful for virtual reality applications are applications for binaural reproduction, i.e., for a headphone that can be applied to the virtual reality user's head.
- FIG. 6 illustrates a situation, where a DirAC synthesizer only operates on a downmix component such as the omnidirectional or pressure component, while, in a further alternative embodiment illustrated with respect to FIG. 12 b, the DirAC synthesizer operates on the whole sound field data, i.e., the full component representation having, in this embodiment in FIG. 12 b, a field description with an omnidirectional component w and three directional components x, y, z.
- FIG. 4 c illustrates the scenario underlying embodiments of the present invention.
- the Figure illustrates a first reference position/location A, a second reference position/location B and two different sound sources A and B, and a translation vector I.
- Both sound sources A and B are included in the sound field description related to reference location A and the second sound field description related to reference position B.
- both the different sound field descriptions related to A and B are subjected to a source separation procedure and, then, a matching of the sources obtained by these different sound separation procedures is obtained.
- source A for example.
- Source A is found in the source separation algorithm for the first sound field description and also for the second sound field description.
- the direction of arrival information for source A will be, when obtained from the first sound field description related to reference position A, the angle α. Additionally, the direction of arrival information for the same source A, but now obtained from the second sound field description related to the further reference position B, will be the angle β.
- the triangle defined by source A, the reference position A and the reference position B is fully defined.
- the distance from source A to reference position A or the distance from source A to reference position B or the general position of source A, i.e., the vector pointing from the origin to the actual position of source A can be calculated, for example by triangulation processing operations.
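- For the two-dimensional case, the triangulation can be sketched as the intersection of two rays. This is one possible formulation under stated assumptions: both angles are measured counter-clockwise from the x-axis of a common coordinate system, and the names `triangulate`, `alpha`, and `beta` are chosen here for illustration.

```python
import math

def triangulate(position_a, alpha, position_b, beta):
    """Intersect the DoA rays from reference positions A and B (2D).

    Returns the source position and its distances to A and B.
    """
    ua = (math.cos(alpha), math.sin(alpha))   # unit DoA vector seen from A
    ub = (math.cos(beta), math.sin(beta))     # unit DoA vector seen from B
    baseline = (position_b[0] - position_a[0], position_b[1] - position_a[1])
    denominator = ua[0] * ub[1] - ua[1] * ub[0]
    if abs(denominator) < 1e-12:
        raise ValueError("DoA rays are parallel; the position is not defined")
    # Solve A + s * ua = B + t * ub for s using 2D cross products.
    s = (baseline[0] * ub[1] - baseline[1] * ub[0]) / denominator
    source = (position_a[0] + s * ua[0], position_a[1] + s * ua[1])
    return source, math.dist(source, position_a), math.dist(source, position_b)

# Hypothetical geometry: A at the origin, B two units to the right.
source, dist_a, dist_b = triangulate(
    (0.0, 0.0), math.radians(45), (2.0, 0.0), math.radians(135)
)
```

- The parallel-ray check reflects that a source lying on the line through both reference positions cannot be located from the two angles alone.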
- the position or distance both represent information on a distance or on a position.
- a distance/position information for each matched source is calculated and, then, each matched source can be processed as if the distance/position is fully known or is, for example, given by additional metadata.
- the geometrical information for the first sound field description and the second sound field description may be used instead of any distance/depth information for each individual source.
- FIG. 8 illustrates another implementation for performing a synthesis different from the DirAC synthesizer.
- a sound field analyzer generates, for each source signal, a separate mono signal S and an original direction of arrival. When, depending on the translation information, a new direction of arrival is calculated, the Ambisonics signal generator 430 of FIG. 8, for example, would be used to generate a sound field description for the sound source signal, i.e., the mono signal S, but for the new direction of arrival (DoA) data consisting of a horizontal angle θ or an elevation angle θ and an azimuth angle φ. Then, a procedure performed by the sound field calculator 420 of FIG. 4 b would be to generate, for example, a first-order Ambisonics sound field representation for each sound source with the new direction of arrival and, then, a further modification per sound source could be performed using a scaling factor depending on the distance of the sound field to the new reference location and, then, all the sound fields from the individual sources could be superposed on each other to finally obtain the modified sound field, once again, in, for example, an Ambisonics representation related to a certain new reference location.
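- Encoding each mono source into a first-order Ambisonics representation and superposing the results could be sketched as follows. The traditional B-format weighting (W scaled by 1/√2) and the channel order W, X, Y, Z are assumed conventions; the function names are hypothetical.

```python
import math

def encode_foa(mono, azimuth, elevation, scale=1.0):
    """Encode a mono signal into W, X, Y, Z channels for a given DoA (radians)."""
    gains = (
        1.0 / math.sqrt(2.0),                     # W: omnidirectional (assumed weight)
        math.cos(azimuth) * math.cos(elevation),  # X
        math.sin(azimuth) * math.cos(elevation),  # Y
        math.sin(elevation),                      # Z
    )
    # `scale` carries the distance-dependent modification of the source signal.
    return [[scale * g * sample for sample in mono] for g in gains]

def superpose(fields):
    """Add several W, X, Y, Z sound fields sample by sample."""
    return [[sum(samples) for samples in zip(*channels)] for channels in zip(*fields)]

# Hypothetical sources: one in front, one to the left at half amplitude.
front = encode_foa([1.0, 2.0], 0.0, 0.0)
left = encode_foa([1.0, 0.0], math.pi / 2, 0.0, scale=0.5)
mixed = superpose([front, left])
```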
- each time/frequency bin processed by a DirAC analyzer 422 , 422 a, 422 b of FIG. 6 represents a certain (bandwidth limited) sound source
- the Ambisonics signal generator 430 could be used, instead of the DirAC synthesizer 425 , 425 a, 425 b to generate, for each time/frequency bin, a full Ambisonics representation using the downmix signal or pressure signal or omnidirectional component for this time/frequency bin as the “mono signal S” of FIG. 8 .
- an individual frequency-time conversion in frequency-time converter for each of the W, X, Y, Z component would then result in a sound field description different from what is illustrated in FIG. 4 c.
- the scene is recorded from the point of view (PoV) of the microphone, whose position is used as the origin of the reference coordinate system.
- the scene has to be reproduced from the PoV of the listener, who is tracked in 6DoF, cf. FIG. 5 .
- a single sound source is shown here for illustration, the relation holds for each time-frequency bin.
- FIG. 5 illustrates the 6DoF reproduction of spatial audio.
- a sound source is recorded by a microphone with the DoA $r_r$ at the distance $d_r$ relative to the microphone's position and orientation (black line and arc). It has to be reproduced relative to the moving listener with the DoA $r_l$ and distance $d_l$ (dashed). This has to consider the listener's translation I and rotation o (dotted).
- the DoA is represented as a vector with unit length pointing towards the source.
- the listener is tracked in 6DoF. At a given time, the listener is at a position $I \in \mathbb{R}^3$ relative to the microphone and has a rotation $o \in \mathbb{R}^3$ relative to the microphone's coordinate system.
- the recording position is chosen as the origin of the coordinate system to simplify the notation.
- the proposed method is based on the basic DirAC approach for parametric spatial sound encoding cf. [16]. It is assumed that there is one dominant direct source per time-frequency instance of the analyzed spectrum and these can be treated independently.
- the recording is transformed into a time-frequency representation using short time Fourier transform (STFT).
- the time frame index is denoted with n and the frequency index with k.
- the transformed recording is then analyzed, estimating directions $r_r(k,n)$ and diffuseness $\psi(k,n)$ for each time-frequency bin of the complex spectrum $P(k,n)$.
- the signal is divided into a direct and diffuse part.
- loudspeaker signals are computed by panning the direct part depending on the speaker positions and adding the diffuse part.
- the method for transforming an FOA signal according to the listener's perspective in 6DoF can be divided into five steps, cf. FIG. 6 c.
- FIG. 6 c illustrates a method of 6DoF reproduction.
- the recorded FOA signal in B-Format is processed by a DirAC encoder that computes direction and diffuseness values for each time-frequency bin of the complex spectrum.
- the direction vector is then transformed by the listener's tracked position and according to the distance information given in a distance map for each source derived by e.g. triangulation calculations.
- the resulting direction vector is then rotated according to the head rotation.
- signals for 8+4 virtual loudspeaker channels are synthesized in the DirAC decoder. These are then binauralized.
- the input signal is analyzed in the DirAC encoder 422 , the distance information is added from the distance map m(r) giving a distance for each (matched) source, then the listener's tracked translation and rotation are applied in the novel transforms 423 and 424 .
- the DirAC decoder 425 synthesizes signals for 8+4 virtual loudspeakers, which are in turn binauralized 427 for headphone playback. Note that as the rotation of the sound scene after the translation is an independent operation, it could be alternatively applied in the binaural renderer.
- the only parameter transformed for 6DoF is the direction vector.
- the diffuse part is assumed to be isotropic and homogeneous and thus is kept unchanged.
- the input to the DirAC encoder is an FOA sound signal in B-format representation. It consists of four channels, i.e., the omnidirectional sound pressure and the three first-order spatial gradients, which under certain assumptions are proportional to the particle velocity.
- This signal is encoded in a parametric way, cf. [18].
- the DirAC representation consists of the signal $P(k,n)$, the diffuseness $\psi(k,n)$ and direction $r(k,n)$ of the sound wave at each time-frequency bin.
- the diffuseness is estimated from the coefficient of variation of this vector [18].
- $$\psi(k,n) = \sqrt{1 - \frac{\lVert E\{I_a(k,n)\} \rVert}{E\{\lVert I_a(k,n) \rVert\}}} \qquad (2)$$
- E denotes the expectation operator along time frames, implemented as moving average.
- the variance of the direction estimates should be low in an optional embodiment. As the frames are typically short, this is not always the case. Therefore, a moving average is applied to obtain a smoothed direction estimate $\bar{I}_a(k,n)$.
- the DoA of the direct part of the signal is then, in an embodiment computed as unit length vector in the opposite direction:
- $$r_r(k,n) = -\frac{\bar{I}_a(k,n)}{\lVert \bar{I}_a(k,n) \rVert} \qquad (3)$$
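- Equations (2) and (3) can be tried out numerically with the sketch below. It assumes that the expectation and the smoothing are both plain averages over a short window of intensity vectors for one frequency bin; the function names and example values are hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def mean_vector(vectors):
    """Component-wise average of 3D vectors (stands in for the moving average)."""
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(3))

def diffuseness(intensity_frames):
    """Equation (2): square root of one minus ||E{I_a}|| / E{||I_a||}."""
    mean_of_norms = sum(norm(v) for v in intensity_frames) / len(intensity_frames)
    if mean_of_norms == 0.0:
        return 1.0  # no energy flow at all: treated as fully diffuse here
    ratio = norm(mean_vector(intensity_frames)) / mean_of_norms
    return math.sqrt(max(0.0, 1.0 - ratio))

def direction_of_arrival(intensity_frames):
    """Equation (3): unit vector opposite to the smoothed intensity vector."""
    smoothed = mean_vector(intensity_frames)
    length = norm(smoothed)
    return tuple(-c / length for c in smoothed)

# Hypothetical windows: a steady plane wave, then two opposing energy flows.
steady = [(-1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
opposing = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
psi_steady, psi_opposing = diffuseness(steady), diffuseness(opposing)
doa_steady = direction_of_arrival(steady)
```

- A steady intensity direction gives a diffuseness of zero, while intensity vectors that cancel on average give a diffuseness of one.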
- as the direction is encoded as a three-dimensional vector of unit length for each time-frequency bin, it is straightforward to integrate the distance information.
- the attenuation is assumed to be a function of the distance between sound source and listener [19].
- the length of the direction vectors is to encode the attenuation or amplification for reproduction.
- the distance to the recording position is encoded in $d_r(k,n)$ according to the distance map, and the distance to be reproduced is encoded in $d_l(k,n)$. If one normalizes the vectors to unit length and then multiplies by the ratio of old and new distance, one sees that the useful length is given by dividing $d_l(k,n)$ by the length of the original vector:
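- The step described above can be written out as a short derivation. This is a sketch under assumptions: $d_r(k,n)$ and $d_l(k,n)$ are read as the vectors pointing to the source from the recording position and from the listener position, $I$ is the listener's translation, and the symbol $d_v(k,n)$ for the resulting vector is a name chosen here, not one given in the text.

```latex
\begin{aligned}
d_l(k,n) &= d_r(k,n) - I, \\
d_v(k,n) &= \frac{d_l(k,n)}{\lVert d_l(k,n) \rVert} \cdot
            \frac{\lVert d_l(k,n) \rVert}{\lVert d_r(k,n) \rVert}
          = \frac{d_l(k,n)}{\lVert d_r(k,n) \rVert}.
\end{aligned}
```

- Under these assumptions, the direction of $d_v(k,n)$ is the new DoA and its length is the ratio of the new distance to the old distance, so a length below one indicates that the listener has moved closer to the source.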
- the changes for the listener's orientation are applied in the following step.
- $$r_p(k,n) = \frac{d_p(k,n)}{\lVert d_p(k,n) \rVert} \qquad (8)$$
- the transformed direction vector, the diffuseness, and the complex spectrum are used to synthesize signals for a uniformly distributed 8+4 virtual loudspeaker setup.
- Eight virtual speakers are located in 45° azimuth steps on the listener plane (elevation 0°), and four in a 90° cross formation above at 45° elevation.
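- The 8+4 layout can be listed programmatically as in the sketch below. The starting azimuth of 0° for both rings and the axis convention (x forward, y left, z up) are assumptions; the text only gives the spacing and the elevations.

```python
import math

def virtual_loudspeakers():
    """Return (azimuth, elevation) pairs in degrees for the 8+4 layout."""
    layout = [(float(az), 0.0) for az in range(0, 360, 45)]    # ring on the listener plane
    layout += [(float(az), 45.0) for az in range(0, 360, 90)]  # elevated cross formation
    return layout

def to_unit_vector(azimuth_deg, elevation_deg):
    """Convert a loudspeaker direction to a Cartesian unit vector."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el))

layout = virtual_loudspeakers()
directions = [to_unit_vector(az, el) for az, el in layout]
```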
- edge fading amplitude panning (EFAP) is applied to reproduce the sound from the right direction given the virtual loudspeaker geometry [20].
- this provides a panning gain $G_i(r)$ for each virtual loudspeaker channel i.
- the distance-dependent gain for each DoA is derived from the resulting length of the direction vector, $\lVert d_p(k,n) \rVert$.
- the pressure $P(k,n)$ is used to generate I decorrelated signals $\tilde{P}_i(k,n)$. These decorrelated signals are added to the individual loudspeaker channels as the diffuse component. This follows the standard method [16]:
- The diffuse and direct parts of each channel are added together, and the signals are transformed back into the time domain by an inverse STFT.
- These channel time domain signals are convolved with HRTFs for the left and right ear depending on the loudspeaker position to create binauralized signals.
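- The binauralization step can be sketched as a sum of per-channel convolutions. The sketch assumes time-domain impulse responses (HRIRs) of equal length for all loudspeaker positions and channel signals of equal length; the example filters are placeholders, not measured HRTF data.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def binauralize(channels, hrirs_left, hrirs_right):
    """Filter each loudspeaker channel with its ear filters and sum per ear."""
    def render(hrirs):
        parts = [convolve(c, h) for c, h in zip(channels, hrirs)]
        return [sum(samples) for samples in zip(*parts)]
    return render(hrirs_left), render(hrirs_right)

# Hypothetical two-channel example with one-tap placeholder filters.
channels = [[1.0, 0.0], [0.0, 1.0]]
left, right = binauralize(channels, [[1.0], [0.5]], [[0.0], [1.0]])
```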
- FIG. 6 a illustrates a further embodiment for calculating the modified sound field using the spatial information, and the first and the second sound field descriptions and the translation information indicating a translation of a reference location to a different reference location as, for example, discussed with respect to vector I in FIG. 4 c or FIG. 5 .
- FIG. 6 a illustrates block 700 indicating an application of a sound separation or, generally, sound analysis procedure to the first sound field description related to reference position A of FIG. 4 c and the second sound field description related to reference position B of FIG. 4 c.
- This procedure will result in a first group of one or more extracted objects and, additionally, in a second group of one or more extracted objects.
- These groups are used within block 702 for calculating the direction of arrival information for all separated sources, i.e., for the first group of extracted sources and the second group of one or more extracted sources.
- steps 700 and 702 are implemented within a single procedure providing, on the one hand, the signal for the source and on the other hand the DoA information for the source.
- This is also true for parametric procedures such as time/frequency-selective procedures like DirAC, where the source signal is the signal of the B-format representation in a time/frequency bin, or the pressure signal or omnidirectional signal of the time/frequency bin, and the DoA information is the DoA parameter for this specific bin.
- In step 704, a source matching is performed between the sources of the first group and the sources of the second group; the result of the source matching is a set of matched sources.
- The matched sources are used for computing a sound field for each matched object using the new DoA and the new distance, as illustrated in block 710 .
- The direction of arrival information of the matched objects, i.e., two directions per object (one from each reference location, such as the two angles shown for source A in FIG. 4 c), is used in block 706 to calculate the positions of the matched objects or, alternatively or additionally, the distances of the matched objects using, for example, triangulation operations.
- The result of block 706 is the position of each matched object or, alternatively or additionally, the distance of a matched object to one of the first or the second reference locations A, B illustrated, for example, in FIG. 4 c.
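- The triangulation mentioned for block 706 can be sketched as follows. This is a minimal illustration, not the implementation of the patent: it assumes that each matched object is described by one DoA vector per reference location, and it takes the midpoint of the two mutually closest points on the DoA rays, because estimated directions rarely intersect exactly. The function name `triangulate` and its arguments are hypothetical.

```python
import math

def triangulate(ref_a, doa_a, ref_b, doa_b):
    """Estimate a source position from two DoA rays (illustrative sketch).

    ref_a, ref_b: 3-D reference locations (e.g. positions A and B).
    doa_a, doa_b: vectors pointing from each reference location toward the source.
    Returns (position, distance_to_a, distance_to_b).
    """
    def unit(v):
        n = math.sqrt(sum(c * c for c in v))
        return [c / n for c in v]

    def dot(p, q):
        return sum(a * b for a, b in zip(p, q))

    u, v = unit(doa_a), unit(doa_b)
    w = [a - b for a, b in zip(ref_a, ref_b)]
    b, d, e = dot(u, v), dot(u, w), dot(v, w)
    denom = 1.0 - b * b
    if denom < 1e-12:
        raise ValueError("DoA rays are parallel; the position is undefined")
    # Ray parameters of the two mutually closest points
    s = (b * e - d) / denom
    t = (e - b * d) / denom
    on_a = [p + s * c for p, c in zip(ref_a, u)]
    on_b = [p + t * c for p, c in zip(ref_b, v)]
    # Noisy DoAs rarely intersect exactly, so use the midpoint
    position = [(p + q) / 2.0 for p, q in zip(on_a, on_b)]
    return position, math.dist(position, ref_a), math.dist(position, ref_b)
```

- Either returned distance can serve as the distance of the matched object to one of the two reference locations.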
- Although the positions of the matched objects are input into step 708 , it is to be emphasized that, for only calculating the new direction of arrival information for a matched object with respect to a new (different) reference location to which a listener has moved, the actual position of the matched object or, in other words, the distance of the matched object is not necessary.
- the distance may then be used in order to adapt the source signal to the new situation.
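- When the distance from the sound source to the new reference location is greater than the distance to the original reference location, a scaling factor lower than one is calculated.
- When the distance becomes smaller, a scaling factor higher than one is calculated as, for example, discussed with respect to FIG. 6 b.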
- In FIG. 6 a, it is not necessarily the case that explicit positions of the matched objects are calculated first, then the distances of the matched objects, and then the sound field for each matched object using the new direction of arrival and the new distance. Instead, the distance of a matched object to one of the two reference locations is generally sufficient, and a sound field for each matched object is then calculated using the new DoA and the new distance.
- block 714 illustrates the calculation of sound fields for the non-matched objects using the old DoA information obtained by block 702 .
- The sound fields for the matched objects obtained in block 710 and for the non-matched objects obtained in block 714 are combined in block 712 in order to obtain the modified sound field description. This description can, for example, be an Ambisonics description, such as a first-order or a higher-order Ambisonics description, or, alternatively, a loudspeaker channel description related to a certain loudspeaker setup. The format is, of course, the same for block 710 and block 714 , so that a simple channel-by-channel addition can be performed in block 712 .
- FIG. 6 b illustrates an implementation of the sound field calculator 420 .
- a source separation and a direction of arrival or generally direction information calculation for each source is performed.
- the direction of arrival vector is multiplied by the distance information vector, i.e., the vector from the original reference location to the sound source, i.e., the vector from item 520 to item 510 of FIG. 5 , for example.
- the translation information i.e., the vector from item 520 to item 500 of FIG. 5 is taken into account in order to calculate the new translated direction vector that is the vector from the listener position 500 to the sound source position 510 .
- the new direction of arrival vector with the correct length indicated by d v is calculated in block 1108 .
- This vector points in the same direction as d l , but has a different length, since the length of this vector reflects the fact that the sound source 510 is recorded in the original sound field with a certain volume and, therefore, the length of d v more or less indicates the loudness change.
- This is obtained by dividing vector d l by the recording distance d r , i.e., the length of vector d r from the microphone 520 to the sound source 510 .
- the length of the vector d r from the microphone 520 to the sound source 510 can be derived by triangulation calculation.
- When the microphone is in the reference location of the first sound field description, then the distance from the reference location of the first sound field description to the sound source is used. When, however, the microphone is in the further reference location of the second sound field description, then the distance from the further reference location of the second sound field description to the sound source is used.
- When the reproduced distance is greater than the recorded distance, then the length of d v will be lower than unity. This will result in an attenuation of the sound source 510 for the reproduction at the new listener position.
- When, however, the reproduced distance d l is smaller than the recorded distance, then the length of d v as calculated by block 1108 will be greater than 1 and a corresponding scaling factor will result in an amplification of the sound source.
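- The translation step can be illustrated with a short sketch built on equations (4) and (5). It is a simplified example under stated assumptions: the scaling factor is taken as the ratio of recorded to reproduced distance, which corresponds to a 1/distance law (γ = 1 in equation (10)), and the function name `translate_source` is hypothetical.

```python
import math

def translate_source(doa, recorded_distance, translation):
    """Return (new_doa, new_distance, scale) for a translated listener.

    doa: vector from the original reference location toward the source.
    recorded_distance: distance from that reference location to the source.
    translation: vector l from the original to the new reference location.
    """
    norm = math.sqrt(sum(c * c for c in doa))
    # Equation (4): scale the unit DoA vector by the recorded distance
    d_r = [recorded_distance * c / norm for c in doa]
    # Equation (5): vector from the new listener position to the source
    d_l = [a - b for a, b in zip(d_r, translation)]
    new_distance = math.sqrt(sum(c * c for c in d_l))
    if new_distance == 0.0:
        raise ValueError("listener position coincides with the source")
    new_doa = [c / new_distance for c in d_l]
    # Assumed 1/distance law: below one when moving away, above one when approaching
    scale = recorded_distance / new_distance
    return new_doa, new_distance, scale
```

- For a source recorded at 2 m straight ahead, moving 1 m toward it halves the distance and gives a scaling factor of 2, while moving 2 m away doubles the distance and gives 0.5.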
- item 710 indicates that the sound field for each matched object is calculated using the new direction of arrival information and the new distance.
- the object signals obtained from either the first group of one or more extracted sources or the second group of one or more extracted sources can be used in general. In an embodiment, however, a specific selection illustrated in FIG. 7 is performed in order to determine which sound field description is used for performing the sound field computation in block 710 .
- the first distance of the new listener position to the first reference location of the first sound field description is determined. With respect to FIG. 4 c, this is the distance between the different reference location and reference position A.
- In step 722 , the second distance of the new listener position to the second reference location of the second sound field description is determined. In the embodiment of FIG. 4 c, this would be the distance between the different reference location (due to translation) and reference position B.
- In step 724 , the object signal is selected from the group derived from the sound field description with the smaller distance.
- When the second distance is the smaller one, the sound source signals derived from the second sound field description related to the further reference position B would be used.
- When, instead, the smaller distance is the one from the different reference location to the reference position A, the first sound field description would be used for finally computing the sound field for each matched object in block 710 of FIG. 6 b.
- the selection would be performed by the procedure illustrated in FIG. 7 .
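- A minimal sketch of the selection in FIG. 7 is given below. The function name `select_description` and the tie-breaking rule (the first description is preferred when both distances are equal) are assumptions made for illustration only.

```python
import math

def select_description(listener_pos, ref_a, ref_b):
    """Pick the sound field description whose reference location is closer.

    Returns ("first", distance) or ("second", distance).
    """
    dist_a = math.dist(listener_pos, ref_a)  # first distance, to reference position A
    dist_b = math.dist(listener_pos, ref_b)  # second distance (step 722)
    # Step 724: choose the description with the smaller distance (ties go to A)
    if dist_a <= dist_b:
        return "first", dist_a
    return "second", dist_b
```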
- FIG. 9 illustrates a further embodiment.
- a sound field analysis on the first sound field description is performed, for example, a parametric sound field analysis in the form of a DirAC analysis illustrated in block 422 of FIG. 6 c.
- each set of parameters comprises a DoA parameter and, optionally, a diffuseness parameter.
- In step 741 , a sound field analysis is performed on the second sound field description and, again, a DirAC analysis is performed as in block 740 and as, for example, discussed with respect to block 422 of FIG. 6 c.
- a position for each parameter pair can be determined using the corresponding DoA parameter from the first time/frequency bin and the DoA parameter from the same time/frequency bin from the second set of parameters. This will result in a position for each parameter pair. However, the position will be more useful the lower the diffuseness for the corresponding time/frequency bin is in the first set of parameters and/or the second set of parameters.
- the “source matching” of step 704 in FIG. 6 a can, for example, be fully avoided and be replaced by a determination of matched sources/matched time/frequency bins based on the diffuseness parameters or the matching can be performed additionally using the corresponding signal in the time/frequency bin from the B-format components for example, or from the pressure signal or object signal output by block 422 of FIG. 6 c.
- block 746 will result in certain positions for certain (selected) time/frequency bins that correspond to the “matched objects” found in block 704 of FIG. 6 a.
- In block 748 , modified parameters and/or signals are calculated for the positions obtained by block 746 and/or the corresponding translation/rotation as obtained, for example, by a head tracker, and the output of block 748 represents modified parameters and/or modified signals for different time/frequency bins.
- block 748 may correspond to the translation transform 423 and rotation transform of block 424 for the purpose of calculating modified parameters and the calculation of modified signals would, for example, be performed by block 425 of FIG. 6 c also under the consideration of a certain scaling factor derived from the positions for the corresponding time/frequency bins.
- a synthesis of the sound field description is performed in block 750 using the modified data. This can, for example, be done by a DirAC synthesis using either the first or the second sound field description, or can be performed by an Ambisonics signal generator as illustrated in block 425 , and the result will be the new sound field description for transmission/storage/rendering.
- FIG. 10 illustrates a further implementation of the sound field calculator 420 . At least parts of the procedure illustrated in FIG. 10 are performed for each matched source separately.
- the block 1120 determines the distance for a matched source e.g. by triangulation calculation.
- a full band direction of arrival or a per band direction of arrival is determined in 1100 .
- This direction of arrival information represents the direction of arrival data of the sound field.
- a translation transformation is performed in block 1110 .
- block 1120 calculates the distance for each matched source.
- block 1110 generates the new direction of arrival data for the sound field that, in this implementation, only depends on the translation from the reference location to the different reference location.
- block 1110 receives the translation information generated, for example, by a tracking in the context of a virtual reality implementation.
- rotation data is used as well.
- block 1130 performs a rotation transformation using the rotation information.
- the new sound field description is generated.
- the original sound field description can be used or, alternatively, source signals that have been separated from the sound field description by a source separation algorithm can be used or any other applications can be used.
- the new sound field description can be, for example, a directional sound field description as obtained by the Ambisonics generator 430 or as generated by a DirAC synthesizer 425 or can be a binaural representation generated from a virtual speaker representation in the subsequent binaural rendering.
- the distance per direction of arrival is also used in generating the new sound field description in order to adapt the volume or loudness of a certain sound source to the new location, i.e., the new or different reference location.
- Although FIG. 10 illustrates a situation where the rotation transformation is performed subsequent to the translation transformation, it is to be noted that the order can be different.
- the rotation transformation can be applied to the DoAs of the sound field as generated by block 1100 and, then, the additional translation transformation is applied that is due to the translation of a subject from the reference location to the different reference location.
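- The interchangeable order of translation and rotation can be illustrated with a small sketch based on equation (7), which composes the rotation as R_Y · R_Z · R_X. The axis and sign conventions of the individual rotation matrices below are assumptions, as are all function names. The sketch also makes one point explicit that follows from linearity: when the rotation is applied first, the translation vector has to be expressed in the rotated coordinate system for both orders to agree.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(r, v):
    return [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]

def rotation(o_x, o_y, o_z):
    # Composition order of equation (7): R_Y * R_Z * R_X
    return matmul(rot_y(o_y), matmul(rot_z(o_z), rot_x(o_x)))

def translate_then_rotate(d_r, l, r):
    # Order shown in FIG. 10: translation first, rotation second
    return apply(r, [a - b for a, b in zip(d_r, l)])

def rotate_then_translate(d_r, l, r):
    # Alternative order: rotate the source vector, then subtract the
    # translation expressed in the rotated frame
    rd, rl = apply(r, d_r), apply(r, l)
    return [a - b for a, b in zip(rd, rl)]
```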
- The distance information is acquired from the meta data using block 1120 , and this distance information is then used in generating the new sound field description in block 1140 in order to account for a changed distance and, therefore, a changed loudness of the certain source with respect to a certain reference location.
- When the distance becomes larger, the specific sound source signal is attenuated, while, when the distance becomes shorter, the sound source signal is amplified.
- the attenuation or amplification of the certain sound source depending on the distance is made in proportion to the distance change, but, in other embodiments, less complex operations can be applied to this amplification or attenuation of sound source signals in quite coarse increments. Even such a less complex implementation provides superior results compared to a situation where any distance change is fully neglected.
- FIG. 11 illustrates a further implementation of the sound field calculator.
- the individual sources from the sound field are determined, for example, per band or full band. When a determination per frame and band is performed, then this can be done by a DirAC analysis. If a full band or subband determination is performed, then this can be done by any kind of a full band or subband source separation algorithm.
- a translation and/or a rotation of a listener is determined, for example, by head tracking.
- an old distance for each source is determined by using the meta data and, for example, by using the meta data for the triangulation calculation.
- each band is considered to be a certain source (provided that the diffuseness is lower than a certain threshold), and then, a certain distance for each time/frequency bin having a low diffuseness value is determined.
- a new distance per source is obtained, for example, by a vector calculation per band that is, for example, discussed in the context of FIG. 6 b.
- an old direction per source is determined, for example, by a DoA calculation obtained in a DirAC analysis or by a direction of arrival or direction information analysis in a source separation algorithm, for example.
- a new direction per source is determined, for example by performing a vector calculation per band or full band.
- a new sound field is generated for the translated and rotated listener. This can be done, for example, by scaling the direct portion per channel in the DirAC synthesis.
- the distance modification can be done in blocks 1270 a, 1270 b or 1270 c in addition or alternatively to performing the distance modification in block 1260 .
- the distance modification can already be performed in block 1270 a.
- the distance modification can be performed for the individual sources in block 1270 b, before the actual new sound field is generated in block 1260 .
- the distance modification can also be performed subsequent to the generation in block 1260 , which means in block 1270 c.
- a distance modification can also be distributed to several modifiers so that, in the end, a certain sound source has a certain loudness that is determined by the difference between the original distance between the sound source and the reference location and the new distance between the sound source and the different reference location.
- FIG. 12 a illustrates a DirAC analyzer as originally disclosed, for example, in the earlier cited reference “Directional Audio Coding” from IWPASH of 2009.
- the DirAC analyzer comprises a bank of band filters 1310 , an energy analyzer 1320 , an intensity analyzer 1330 , a temporal averaging block 1340 , a diffuseness calculator 1350 and a direction calculator 1360 .
- the energetic analysis can be performed, when the pressure signal and velocity signals in one, two or three dimensions are captured from a single position.
- the omnidirectional signal is called W-signal, which has been scaled down by the square root of two.
- the vector estimates the sound field velocity vector, and is also expressed in STFT domain.
- the energy E of the sound field is computed.
- the capturing of B-format signals can be obtained with either coincident positioning of directional microphones, or with a closely-spaced set of omnidirectional microphones. In some applications, the microphone signals may be formed in a computational domain, i.e., simulated.
- the direction of sound is defined to be the opposite direction of the intensity vector I.
- the direction is denoted as corresponding angular azimuth and elevation values in the transmitted meta data.
- the diffuseness of sound field is also computed using an expectation operator of the intensity vector and the energy. The outcome of this equation is a real-valued number between zero and one, characterizing if the sound energy is arriving from a single direction (diffuseness is zero), or from all directions (diffuseness is one). This procedure is appropriate in the case when the full 3D or less dimensional velocity information is available.
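- The energetic analysis described above can be sketched for a single frequency bin as follows. The sketch uses equation (1) for the intensity, a plain mean over the supplied frames in place of the moving average, and the common form ψ = 1 − ‖E{I}‖ / E{E} for the diffuseness. It assumes that the velocity components have been normalized so that their magnitude equals the pressure magnitude for a plane wave; this normalization, the function name `dirac_parameters` and the clamping to [0, 1] are assumptions, not details taken from the text.

```python
import math

def dirac_parameters(pressure, velocity):
    """Estimate DoA and diffuseness for one frequency bin (illustrative sketch).

    pressure: complex STFT values P(k, n), one per time frame.
    velocity: (Ux, Uy, Uz) complex tuples, one per time frame, normalized
              so that |U| = |P| for a single plane wave (assumption).
    Returns (doa_unit_vector_or_None, diffuseness).
    """
    n = len(pressure)
    mean_i = [0.0, 0.0, 0.0]
    mean_e = 0.0
    for p, u in zip(pressure, velocity):
        for axis in range(3):
            # Equation (1): active intensity, averaged over frames
            mean_i[axis] += 0.5 * (p * u[axis].conjugate()).real / n
        # Energy with the same normalization as the intensity
        mean_e += 0.25 * (abs(p) ** 2 + sum(abs(c) ** 2 for c in u)) / n
    norm_i = math.sqrt(sum(c * c for c in mean_i))
    diffuseness = 1.0 - norm_i / mean_e if mean_e > 0.0 else 1.0
    diffuseness = min(max(diffuseness, 0.0), 1.0)
    # The direction of sound is opposite to the intensity vector
    doa = [-c / norm_i for c in mean_i] if norm_i > 0.0 else None
    return doa, diffuseness
```

- A single plane wave yields a diffuseness near zero and a DoA opposite to its propagation direction; frames whose intensity vectors cancel yield a diffuseness of one.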
- FIG. 12 b illustrates a DirAC synthesis, once again having a bank of band filters 1370 , a virtual microphone block 1400 , a direct/diffuse synthesizer block 1450 , and a certain loudspeaker setup or a virtual intended loudspeaker setup 1460 . Additionally, a diffuseness-gain transformer 1380 , a vector based amplitude panning (VBAP) gain table block 1390 , a microphone compensation block 1420 , a loudspeaker gain averaging block 1430 and a distributor 1440 for other channels are used.
- the high quality version of DirAC synthesis shown in FIG. 12 b receives all B-format signals, for which a virtual microphone signal is computed for each loudspeaker direction of the loudspeaker setup 1460 .
- the utilized directional pattern is typically a dipole.
- the virtual microphone signals are then modified in non-linear fashion, depending on the meta data.
- the low bitrate version of DirAC is not shown in FIG. 12 b, however, in this situation, only one channel of audio is transmitted as illustrated in FIG. 6 .
- the difference in processing is that all virtual microphone signals would be replaced by the single channel of audio received.
- the virtual microphone signals are divided into two streams: the diffuse and the non-diffuse streams, which are processed separately.
- the non-diffuse sound is reproduced as point sources by using vector base amplitude panning (VBAP).
- In panning, a monophonic sound signal is applied to a subset of loudspeakers after multiplication with loudspeaker-specific gain factors.
- the gain factors are computed using the information of a loudspeaker setup, and specified panning direction.
- the input signal is simply panned to the directions implied by the meta data.
- each virtual microphone signal is multiplied with the corresponding gain factor, which produces the same effect as panning; however, it is less prone to any non-linear artifacts.
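- How gain factors follow from a loudspeaker setup and a panning direction can be illustrated with a two-dimensional, pairwise VBAP sketch. It is a simplified example: the setup is horizontal only, adjacent loudspeakers are paired after sorting by azimuth, and the gains are normalized to unit energy. The function name `vbap_2d` and the degree-based interface are hypothetical.

```python
import math

def vbap_2d(pan_deg, speaker_deg):
    """Return one gain per loudspeaker for a horizontal panning direction."""
    def unit(deg):
        rad = math.radians(deg)
        return math.cos(rad), math.sin(rad)

    p = unit(pan_deg)
    order = sorted(range(len(speaker_deg)), key=lambda i: speaker_deg[i] % 360.0)
    gains = [0.0] * len(speaker_deg)
    for k in range(len(order)):
        i, j = order[k], order[(k + 1) % len(order)]
        l1, l2 = unit(speaker_deg[i]), unit(speaker_deg[j])
        det = l1[0] * l2[1] - l1[1] * l2[0]
        if abs(det) < 1e-9:
            continue  # collinear pair cannot span the plane
        # Solve g1 * l1 + g2 * l2 = p by Cramer's rule
        g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
        g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
        if g1 >= -1e-9 and g2 >= -1e-9:
            g1, g2 = max(g1, 0.0), max(g2, 0.0)
            norm = math.hypot(g1, g2)  # unit-energy normalization
            gains[i], gains[j] = g1 / norm, g2 / norm
            return gains
    raise ValueError("panning direction is not covered by any loudspeaker pair")
```

- With loudspeakers at −30° and +30°, a panning direction of 0° gives equal gains, and a panning direction of +30° routes the signal to that loudspeaker alone.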
- the directional meta data is subject to abrupt temporal changes.
- the gain factors for loudspeakers computed with VBAP are smoothed by temporal integration with frequency-dependent time constants equal to about 50 cycle periods at each band. This effectively removes the artifacts; however, the changes in direction are not perceived to be slower than without averaging in most of the cases.
- the aim of the synthesis of the diffuse sound is to create perception of sound that surrounds the listener.
- the diffuse stream is reproduced by decorrelating the input signal and reproducing it from every loudspeaker.
- the virtual microphone signals of the diffuse stream are already incoherent to some degree, and they need to be decorrelated only mildly. This approach provides better spatial quality for surround reverberation and ambient sound than the low bit-rate version.
- For the DirAC synthesis with headphones, DirAC is formulated with a certain number of virtual loudspeakers around the listener for the non-diffuse stream and a certain number of loudspeakers for the diffuse stream.
- the virtual loudspeakers are implemented as convolution of input signals with measured head-related transfer functions (HRTFs).
- Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
- the inventive enhanced sound field description can be stored on a digital storage medium or a non-transitory storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
- embodiments of the invention can be implemented in hardware or in software.
- the implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
- Some embodiments according to the invention comprise a non-transitory data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
- embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer.
- the program code may for example be stored on a machine readable carrier.
- Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
- an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
- a further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
- a further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein.
- the data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
- a further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
- a further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
- a programmable logic device for example a field programmable gate array
- a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein.
- the methods are performed by any hardware apparatus.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Stereophonic System (AREA)
- Algebra (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Soundproofing, Sound Blocking, And Sound Damping (AREA)
Abstract
Description
-
- the extended representations can be converted into the existing non-extended ones (e.g. for rendering), and
- allow re-use of existing software and hardware implementations when working with the extended representation.
-
- 1. Objects are generated by applying source separation to each traditional Ambisonics signal.
- 2. The DOA of all objects are computed for each traditional Ambisonics signal.
- 3. The objects extracted from one traditional Ambisonics signal are matched to the objects extracted from the other traditional Ambisonics signals. The matching is performed based on the corresponding DOAs and/or the signals (e.g., by means of correlation/coherence).
- 4. The positions of the matched objects are estimated based on triangulation.
- 5. Each matched object (single-channel input) is modified based on the estimated position and the desired position (i.e., after translation) using a distance compensation filter.
- 6. The DOA at the desired position (i.e., after translation) is computed for each matched object. This DOA is represented by DOA′.
- 7. An Ambisonics object signal is computed for each matched object. The Ambisonics object signal is generated such that the matched object has a direction-of-arrival DOA′.
- 8. An Ambisonics object signal is computed for each non-matched object. The Ambisonics object signal is generated such that the non-matched object has a direction-of-arrival DOA.
- 9. The virtual Ambisonics signal is obtained by adding all Ambisonics object signals together.
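- Steps 7 to 9 above can be sketched for first-order Ambisonics as follows. The sketch assumes traditional B-format with the omnidirectional W channel scaled down by the square root of two and real-valued time-domain object signals. The function names `encode_foa` and `mix`, the angle convention (azimuth and elevation in radians) and the optional `gain` argument standing in for the distance compensation of step 5 are assumptions.

```python
import math

def encode_foa(signal, azimuth, elevation, gain=1.0):
    """Encode a mono object signal into B-format channels [W, X, Y, Z]."""
    coeffs = (
        1.0 / math.sqrt(2.0),                     # W, scaled down by sqrt(2)
        math.cos(azimuth) * math.cos(elevation),  # X
        math.sin(azimuth) * math.cos(elevation),  # Y
        math.sin(elevation),                      # Z
    )
    return [[gain * c * s for s in signal] for c in coeffs]

def mix(fields):
    """Add several B-format object signals channel by channel (step 9)."""
    return [[sum(frame) for frame in zip(*channel)] for channel in zip(*fields)]
```

- Matched objects would be encoded with DOA′ and their distance-dependent gain, non-matched objects with their original DOA and unit gain, and `mix` then yields the virtual Ambisonics signal.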
-
- 1. A sound field model is assumed. The sound field can be decomposed into one or more direct sound components and diffuse sound components. The direct sound components consist of a signal and position information (e.g., in polar or Cartesian coordinates). Alternatively, the sound field can be decomposed into one or more direct/principal sound components and a residual sound component (single- or multi-channel).
- 2. The signal components and parameters of the assumed sound field model are estimated using the input Ambisonics signals.
- 3. The signal components and/or parameters are modified depending on the desired translation, or desired position, in the sound scene.
- 4. The virtual Ambisonics signal is generated using the modified signal components and modified parameters.
$$\mathbf{I}_a(k,n) = \tfrac{1}{2}\,\mathrm{Re}\left(P(k,n)\,\mathbf{U}^*(k,n)\right) \tag{1}$$
where E denotes the expectation operator along time frames, implemented as a moving average.
$$\mathbf{d}_r(k,n) = \mathbf{r}_r(k,n)\,d_r(k,n) = \mathbf{r}_r(k,n)\,m(\mathbf{r}_r(k,n)) \tag{4}$$
where $\mathbf{d}_r(k,n)$ is a vector pointing from the recording position of the microphone to the sound source active at time n and frequency bin k.
$$\mathbf{d}_l(k,n) = \mathbf{d}_r(k,n) - \mathbf{l}(n) \tag{5}$$
$$\mathbf{d}_p(k,n) = \mathbf{R}_Y(o_Y(n))\,\mathbf{R}_Z(o_Z(n))\,\mathbf{R}_X(o_X(n))\,\mathbf{d}_v(k,n) \tag{7}$$
$$Y_i(k,n) = Y_{i,S}(k,n) + Y_{i,D}(k,n) \tag{9}$$
$$Y_{i,S}(k,n) = \sqrt{1-\psi(k,n)}\;P(k,n)\,G_i(\mathbf{r}_p(k,n))\,\left(\lVert \mathbf{d}_p(k,n) \rVert\right)^{-\gamma} \tag{10}$$
where the exponent γ is a tuning factor that is typically set to about 1 [19]. Note that with γ=0 the distance-dependent gain is turned off.
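The direct part of one loudspeaker channel in equation (10) can be evaluated as in the following sketch. The panning gain G_i is passed in as a precomputed number, since its computation is not part of equation (10); the function name `direct_channel` and the argument names are hypothetical.

```python
import math

def direct_channel(p, diffuseness, panning_gain, d_p, gamma=1.0):
    """Evaluate equation (10) for one time/frequency bin and one channel.

    p: complex pressure P(k, n).
    diffuseness: psi(k, n) in [0, 1].
    panning_gain: G_i evaluated for the new direction r_p(k, n).
    d_p: translated and rotated vector whose length encodes the distance change.
    gamma: tuning exponent; 0 switches the distance-dependent gain off.
    """
    length = math.sqrt(sum(c * c for c in d_p))
    distance_gain = length ** (-gamma)
    return math.sqrt(1.0 - diffuseness) * p * panning_gain * distance_gain
```

With γ = 1, a vector d_p of length 2 halves the direct sound, and with γ = 0 the same vector leaves it unchanged.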
- [1] Liitola, T., Headphone sound externalization, Ph.D. thesis, Helsinki University of Technology, Department of Electrical and Communications Engineering, Laboratory of Acoustics and Audio Signal Processing, 2006.
- [2] Blauert, J., Spatial Hearing—Revised Edition: The Psychophysics of Human Sound Localization, The MIT Press, 1996, ISBN 0262024136.
- [3] Zhang, W., Samarasinghe, P. N., Chen, H., and Abhayapala, T. D., “Surround by Sound: A Review of Spatial Audio Recording and Reproduction,” Applied Sciences, 7(5), p. 532, 2017.
- [4] Bates, E. and Boland, F., “Spatial Music, Virtual Reality, and 360 Media,” in Audio Eng. Soc. Int. Conf. on Audio for Virtual and Augmented Reality, Los Angeles, CA, U.S.A., 2016.
- [5] Anderson, R., Gallup, D., Barron, J. T., Kontkanen, J., Snavely, N., Esteban, C. H., Agarwal, S., and Seitz, S. M., “Jump: Virtual Reality Video,” ACM Transactions on Graphics, 35(6), p. 198, 2016.
- [6] Merimaa, J., Analysis, Synthesis, and Perception of Spatial Sound: Binaural Localization Modeling and Multichannel Loudspeaker Reproduction, Ph.D. thesis, Helsinki University of Technology, 2006.
- [7] Kronlachner, M. and Zotter, F., “Spatial Transformations for the Enhancement of Ambisonics Recordings,” in 2nd International Conference on Spatial Audio, Erlangen, Germany, 2014.
- [8] Tsingos, N., Gallo, E., and Drettakis, G., “Perceptual Audio Rendering of Complex Virtual Environments,” ACM Transactions on Graphics, 23(3), pp. 249-258, 2004.
- [9] Taylor, M., Chandak, A., Mo, Q., Lauterbach, C., Schissler, C., and Manocha, D., “Guided multi-view ray tracing for fast auralization,” IEEE Trans. Visualization & Comp. Graphics, 18, pp. 1797-1810, 2012.
- [10] Rungta, A., Schissler, C., Rewkowski, N., Mehra, R., and Manocha, D., “Diffraction Kernels for Interactive Sound Propagation in Dynamic Environments,” IEEE Trans. Visualization & Comp. Graphics, 24(4), pp. 1613-1622, 2018.
- [11] Thiergart, O., Kowalczyk, K., and Habets, E. A. P., “An Acoustical Zoom based on Informed Spatial Filtering,” in Int. Workshop on Acoustic Signal Enhancement, pp. 109-113, 2014.
- [12] Khaddour, H., Schimmel, J., and Rund, F., “A Novel Combined System of Direction Estimation and Sound Zooming of Multiple Speakers,” Radioengineering, 24(2), 2015.
- [13] Ziegler, M., Keinert, J., Holzer, N., Wolf, T., Jaschke, T., op het Veld, R., Zakeri, F. S., and Foessel, S., “Immersive Virtual Reality for Live-Action Video using Camera Arrays,” in IBC, Amsterdam, Netherlands, 2017.
- [14] Thiergart, O., Galdo, G. D., Taseska, M., and Habets, E. A. P., “Geometry-Based Spatial Sound Acquisition using Distributed Microphone Arrays,” IEEE Trans. Audio, Speech, Language Process., 21(12), pp. 2583-2594, 2013.
- [15] Kowalczyk, K., Thiergart, O., Taseska, M., Del Galdo, G., Pulkki, V., and Habets, E. A. P., “Parametric Spatial Sound Processing: A Flexible and Efficient Solution to Sound Scene Acquisition, Modification, and Reproduction,” IEEE Signal Process. Mag., 32(2), pp. 31-42, 2015.
- [16] Pulkki, V., “Spatial Sound Reproduction with Directional Audio Coding,” J. Audio Eng. Soc., 55(6), pp. 503-516, 2007.
- [17] International Telecommunication Union, “ITU-R BS.1534-3, Method for the subjective assessment of intermediate quality level of audio systems,” 2015.
- [18] Thiergart, O., Del Galdo, G., Kuech, F., and Prus, M., “Three-Dimensional Sound Field Analysis with Directional Audio Coding Based on Signal Adaptive Parameter Estimators,” in Audio Eng. Soc. Conv. Spatial Audio: Sense the Sound of Space, 2010.
- [19] Kuttruff, H., Room Acoustics, Taylor & Francis, 4th edition, 2000.
- [20] Borß, C., “A polygon-based panning method for 3D loudspeaker setups,” in Audio Eng. Soc. Conv., pp. 343-352, Los Angeles, CA, USA, 2014.
- [21] Rummukainen, O., Schlecht, S., Plinge, A., and Habets, E. A. P., “Evaluating Binaural Reproduction Systems from Behavioral Patterns in a Virtual Reality—A Case Study with Impaired Binaural Cues and Tracking Latency,” in Audio Eng. Soc. Conv. 143, New York, NY, USA, 2017.
- [22] Engelke, U., Darcy, D. P., Mulliken, G. H., Bosse, S., Martini, M. G., Arndt, S., Antons, J.-N., Chan, K. Y., Ramzan, N., and Brunnström, K., “Psychophysiology-Based QoE Assessment: A Survey,” IEEE Selected Topics in Signal Processing, 11(1), pp. 6-21, 2017.
- [23] Schlecht, S. J. and Habets, E. A. P., “Sign-Agnostic Matrix Design for Spatial Artificial Reverberation with Feedback Delay Networks,” in Proc. Audio Eng. Soc. Conf., pp. 1-10 (accepted), Tokyo, Japan, 2018.
- [31] M. A. Gerzon, “Periphony: With-height sound reproduction,” J. Acoust. Soc. Am., vol. 21, no. 1, pp. 2-10, 1973.
- [32] V. Pulkki, “Directional audio coding in spatial sound reproduction and stereo upmixing,” in Proc. of the 28th AES International Conference, 2006.
- [33] V. Pulkki, “Spatial sound reproduction with directional audio coding,” Journal Audio Eng. Soc, vol. 55, no. 6, pp. 503-516, June 2007.
- [34] C. G. and G. M., “Coincident microphone simulation covering three dimensional space and yielding various directional outputs,” U.S. Pat. No. 4,042,779, 1977.
- [35] C. Faller and F. Baumgarte, “Binaural cue coding—part ii: Schemes and applications,” IEEE Trans. Speech Audio Process, vol. 11, no. 6, November 2003.
- [36] C. Faller, “Parametric multichannel audio coding: Synthesis of coherence cues,” IEEE Trans. Speech Audio Process., vol. 14, no. 1, January 2006.
- [37] H. P. J. E. E. Schuijers, J. Breebaart, “Low complexity parametric stereo coding,” in Proc. of the 116th AES Convention, Berlin, Germany, 2004.
- [38] V. Pulkki, “Virtual sound source positioning using vector base amplitude panning,” J. Acoust. Soc. Am., vol. 45, no. 6, pp. 456-466, June 1997.
- [39] J. G. Tylka and E. Y. Choueiri, “Comparison of techniques for binaural navigation of higher-order ambisonics sound fields,” in Proc. of the AES International Conference on Audio for Virtual and Augmented Reality, New York, September 2016.
Claims (21)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/898,016 US11950085B2 (en) | 2017-07-14 | 2022-08-29 | Concept for generating an enhanced sound field description or a modified sound field description using a multi-point sound field description |
US18/513,090 US20240098445A1 (en) | 2017-07-14 | 2023-11-17 | Concept for generating an enhanced sound field description or a modified sound field description using a multi-point sound field description |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP17181488 | 2017-07-14 | ||
EP17181488.2 | 2017-07-14 | ||
PCT/EP2018/069140 WO2019012131A1 (en) | 2017-07-14 | 2018-07-13 | Concept for generating an enhanced sound field description or a modified sound field description using a multi-point sound field description |
US16/740,272 US11463834B2 (en) | 2017-07-14 | 2020-01-10 | Concept for generating an enhanced sound field description or a modified sound field description using a multi-point sound field description |
US17/898,016 US11950085B2 (en) | 2017-07-14 | 2022-08-29 | Concept for generating an enhanced sound field description or a modified sound field description using a multi-point sound field description |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/740,272 Continuation US11463834B2 (en) | 2017-07-14 | 2020-01-10 | Concept for generating an enhanced sound field description or a modified sound field description using a multi-point sound field description |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/513,090 Division US20240098445A1 (en) | 2017-07-14 | 2023-11-17 | Concept for generating an enhanced sound field description or a modified sound field description using a multi-point sound field description |
Publications (2)
Publication Number | Publication Date |
---|---|
US20220417695A1 US20220417695A1 (en) | 2022-12-29 |
US11950085B2 US11950085B2 (en) | 2024-04-02 |
Family
ID=59631530
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/740,272 Active 2038-08-20 US11463834B2 (en) | 2017-07-14 | 2020-01-10 | Concept for generating an enhanced sound field description or a modified sound field description using a multi-point sound field description |
US17/898,016 Active US11950085B2 (en) | 2017-07-14 | 2022-08-29 | Concept for generating an enhanced sound field description or a modified sound field description using a multi-point sound field description |
US18/513,090 Pending US20240098445A1 (en) | 2017-07-14 | 2023-11-17 | Concept for generating an enhanced sound field description or a modified sound field description using a multi-point sound field description |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/740,272 Active 2038-08-20 US11463834B2 (en) | 2017-07-14 | 2020-01-10 | Concept for generating an enhanced sound field description or a modified sound field description using a multi-point sound field description |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/513,090 Pending US20240098445A1 (en) | 2017-07-14 | 2023-11-17 | Concept for generating an enhanced sound field description or a modified sound field description using a multi-point sound field description |
Country Status (14)
Country | Link |
---|---|
US (3) | US11463834B2 (en) |
EP (1) | EP3652735A1 (en) |
JP (2) | JP7119060B2 (en) |
KR (2) | KR102654507B1 (en) |
CN (2) | CN111149155B (en) |
AR (1) | AR112451A1 (en) |
AU (1) | AU2018298874C1 (en) |
BR (1) | BR112020000775A2 (en) |
CA (1) | CA3069241C (en) |
RU (1) | RU2736418C1 (en) |
SG (1) | SG11202000330XA (en) |
TW (1) | TWI713866B (en) |
WO (1) | WO2019012131A1 (en) |
ZA (1) | ZA202000020B (en) |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
BR112020015835A2 (en) | 2018-04-11 | 2020-12-15 | Dolby International Ab | METHODS, APPARATUS AND SYSTEMS FOR 6DOF AUDIO RENDERIZATION AND DATA REPRESENTATIONS AND BIT FLOW STRUCTURES FOR 6DOF AUDIO RENDERIZATION |
US10735882B2 (en) * | 2018-05-31 | 2020-08-04 | At&T Intellectual Property I, L.P. | Method of audio-assisted field of view prediction for spherical video streaming |
BR112020026728A2 (en) * | 2018-07-04 | 2021-03-23 | Sony Corporation | DEVICE AND METHOD OF PROCESSING INFORMATION, AND, LEGIBLE STORAGE MEDIA BY COMPUTER |
US12102420B2 (en) | 2018-10-03 | 2024-10-01 | Arizona Board Of Regents On Behalf Of Arizona State University | Direct RF signal processing for heart-rate monitoring using UWB impulse radar |
US11019449B2 (en) * | 2018-10-06 | 2021-05-25 | Qualcomm Incorporated | Six degrees of freedom and three degrees of freedom backward compatibility |
GB2582748A (en) * | 2019-03-27 | 2020-10-07 | Nokia Technologies Oy | Sound field related rendering |
WO2021018378A1 (en) * | 2019-07-29 | 2021-02-04 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus, method or computer program for processing a sound field representation in a spatial transform domain |
US11341952B2 (en) * | 2019-08-06 | 2022-05-24 | Insoundz, Ltd. | System and method for generating audio featuring spatial representations of sound sources |
CN110544486B (en) * | 2019-09-02 | 2021-11-02 | 上海其高电子科技有限公司 | Speech enhancement method and system based on microphone array |
WO2021086809A1 (en) | 2019-10-28 | 2021-05-06 | Arizona Board Of Regents On Behalf Of Arizona State University | Methods and systems for remote sleep monitoring |
EP4052067A4 (en) * | 2019-11-01 | 2022-12-21 | Arizona Board of Regents on behalf of Arizona State University | Remote recovery of acoustic signals from passive sources |
DE112020005550T5 (en) * | 2019-11-13 | 2022-09-01 | Sony Group Corporation | SIGNAL PROCESSING DEVICE, METHOD AND PROGRAM |
CN112153538B (en) * | 2020-09-24 | 2022-02-22 | 京东方科技集团股份有限公司 | Display device, panoramic sound implementation method thereof and nonvolatile storage medium |
FR3115103B1 (en) * | 2020-10-12 | 2023-05-12 | Renault Sas | Device and method for measuring and displaying a sound field |
KR102500694B1 (en) * | 2020-11-24 | 2023-02-16 | 네이버 주식회사 | Computer system for producing audio content for realzing customized being-there and method thereof |
JP7536735B2 (en) | 2020-11-24 | 2024-08-20 | ネイバー コーポレーション | Computer system and method for producing audio content for realizing user-customized realistic sensation |
JP7536733B2 (en) | 2020-11-24 | 2024-08-20 | ネイバー コーポレーション | Computer system and method for achieving user-customized realism in connection with audio |
CN114584913B (en) * | 2020-11-30 | 2023-05-16 | 华为技术有限公司 | FOA signal and binaural signal acquisition method, sound field acquisition device and processing device |
US11653166B2 (en) | 2021-05-27 | 2023-05-16 | Qualcomm Incorporated | Directional audio generation with multiple arrangements of sound sources |
JP2024531541A (en) * | 2021-09-03 | 2024-08-29 | ドルビー ラボラトリーズ ライセンシング コーポレイション | Musical synthesizer with spatial metadata output |
WO2024044113A2 (en) * | 2022-08-24 | 2024-02-29 | Dolby Laboratories Licensing Corporation | Rendering audio captured with multiple devices |
Citations (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4042779A (en) | 1974-07-12 | 1977-08-16 | National Research Development Corporation | Coincident microphone simulation covering three dimensional space and yielding various directional outputs |
JPH08107600A (en) | 1994-10-04 | 1996-04-23 | Yamaha Corp | Sound image localization device |
WO1997041711A1 (en) | 1996-04-30 | 1997-11-06 | Srs Labs, Inc. | Audio enhancement system for use in a surround sound environment |
JP2006074589A (en) | 2004-09-03 | 2006-03-16 | Matsushita Electric Ind Co Ltd | Acoustic processing device |
EP2346028A1 (en) | 2009-12-17 | 2011-07-20 | Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. | An apparatus and a method for converting a first parametric spatial audio signal into a second parametric spatial audio signal |
US20110305344A1 (en) | 2008-12-30 | 2011-12-15 | Fundacio Barcelona Media Universitat Pompeu Fabra | Method and apparatus for three-dimensional acoustic field encoding and optimal reconstruction |
WO2012072804A1 (en) | 2010-12-03 | 2012-06-07 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for geometry-based spatial audio coding |
US20120155653A1 (en) | 2010-12-21 | 2012-06-21 | Thomson Licensing | Method and apparatus for encoding and decoding successive frames of an ambisonics representation of a 2- or 3-dimensional sound field |
US20130142341A1 (en) | 2011-12-02 | 2013-06-06 | Giovanni Del Galdo | Apparatus and method for merging geometry-based spatial audio coding streams |
WO2013079568A1 (en) | 2011-12-02 | 2013-06-06 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for microphone positioning based on a spatial power density |
CN103250207A (en) | 2010-11-05 | 2013-08-14 | 汤姆逊许可公司 | Data structure for higher order ambisonics audio data |
US20140023197A1 (en) | 2012-07-20 | 2014-01-23 | Qualcomm Incorporated | Scalable downmix design for object-based surround codec with cluster analysis by synthesis |
WO2014015914A1 (en) | 2012-07-27 | 2014-01-30 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for providing a loudspeaker-enclosure-microphone system description |
CN103635964A (en) | 2011-06-30 | 2014-03-12 | 汤姆逊许可公司 | Method and apparatus for changing relative positions of sound objects contained within higher-order ambisonics representation |
US20140241528A1 (en) | 2013-02-28 | 2014-08-28 | Dolby Laboratories Licensing Corporation | Sound Field Analysis System |
US20140249827A1 (en) | 2013-03-01 | 2014-09-04 | Qualcomm Incorporated | Specifying spherical harmonic and/or higher order ambisonics coefficients in bitstreams |
US20140358557A1 (en) | 2013-05-29 | 2014-12-04 | Qualcomm Incorporated | Performing positional analysis to code spherical harmonic coefficients |
US20140358567A1 (en) | 2012-01-19 | 2014-12-04 | Koninklijke Philips N.V. | Spatial audio rendering and encoding |
WO2014194075A1 (en) | 2013-05-29 | 2014-12-04 | Qualcomm Incorporated | Compensating for error in decomposed representations of sound fields |
US20140355766A1 (en) | 2013-05-29 | 2014-12-04 | Qualcomm Incorporated | Binauralization of rotated higher order ambisonics |
CN104244164A (en) | 2013-06-18 | 2014-12-24 | 杜比实验室特许公司 | Method, device and computer program product for generating surround sound field |
US20150127354A1 (en) | 2013-10-03 | 2015-05-07 | Qualcomm Incorporated | Near field compensation for decomposed representations of a sound field |
WO2015086337A1 (en) | 2013-12-13 | 2015-06-18 | Robert Bosch Gmbh | Swashplate machine, swashplate and method for relieving the hydrostatic pressure on a control part connection of a swashplate machine and for reducing the pressure of a working medium during a process of reversing the swashplate machine |
WO2015107926A1 (en) | 2014-01-16 | 2015-07-23 | ソニー株式会社 | Sound processing device and method, and program |
US20150223002A1 (en) | 2012-08-31 | 2015-08-06 | Dolby Laboratories Licensing Corporation | System for Rendering and Playback of Object Based Audio in Various Listening Environments |
US20150271621A1 (en) | 2014-03-21 | 2015-09-24 | Qualcomm Incorporated | Inserting audio channels into descriptions of soundfields |
US20150296319A1 (en) | 2012-11-20 | 2015-10-15 | Nokia Corporation | Spatial audio enhancement apparatus |
TW201614638A (en) | 2014-10-10 | 2016-04-16 | Thomson Licensing | Method and apparatus for low bit rate compression of a higher order ambisonics HOA signal representation of a sound field |
WO2016081655A1 (en) | 2014-11-19 | 2016-05-26 | Dolby Laboratories Licensing Corporation | Adjusting spatial congruency in a video conferencing system |
CN105637902A (en) | 2013-10-23 | 2016-06-01 | 汤姆逊许可公司 | Method and apparatus for decoding an ambisonics audio soundfield representation for audio playback using 2D settings |
WO2017098949A1 (en) | 2015-12-10 | 2017-06-15 | ソニー株式会社 | Speech processing device, method, and program |
US20180206057A1 (en) | 2017-01-13 | 2018-07-19 | Qualcomm Incorporated | Audio parallax for virtual reality, augmented reality, and mixed reality |
WO2019013924A1 (en) | 2017-07-12 | 2019-01-17 | Google Llc | Ambisonics sound field navigation using directional decomposition and path distance estimation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2884491A1 (en) * | 2013-12-11 | 2015-06-17 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Extraction of reverberant sound using microphone arrays |
2018
- 2018-07-13 RU RU2020106725A patent/RU2736418C1/en active
- 2018-07-13 CN CN201880060064.7A patent/CN111149155B/en active Active
- 2018-07-13 JP JP2020500728A patent/JP7119060B2/en active Active
- 2018-07-13 CN CN202311248978.5A patent/CN117319917A/en active Pending
- 2018-07-13 KR KR1020227021791A patent/KR102654507B1/en active IP Right Grant
- 2018-07-13 WO PCT/EP2018/069140 patent/WO2019012131A1/en active Search and Examination
- 2018-07-13 CA CA3069241A patent/CA3069241C/en active Active
- 2018-07-13 KR KR1020207001183A patent/KR102491818B1/en active IP Right Grant
- 2018-07-13 BR BR112020000775-7A patent/BR112020000775A2/en unknown
- 2018-07-13 SG SG11202000330XA patent/SG11202000330XA/en unknown
- 2018-07-13 EP EP18737640.5A patent/EP3652735A1/en active Pending
- 2018-07-13 AU AU2018298874A patent/AU2018298874C1/en active Active
- 2018-07-13 AR ARP180101958 patent/AR112451A1/en active IP Right Grant
- 2018-07-16 TW TW107124520A patent/TWI713866B/en active
2020
- 2020-01-02 ZA ZA2020/00020A patent/ZA202000020B/en unknown
- 2020-01-10 US US16/740,272 patent/US11463834B2/en active Active
2022
- 2022-08-03 JP JP2022124044A patent/JP2022153626A/en active Pending
- 2022-08-29 US US17/898,016 patent/US11950085B2/en active Active
2023
- 2023-11-17 US US18/513,090 patent/US20240098445A1/en active Pending
Patent Citations (60)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4042779A (en) | 1974-07-12 | 1977-08-16 | National Research Development Corporation | Coincident microphone simulation covering three dimensional space and yielding various directional outputs |
JPH08107600A (en) | 1994-10-04 | 1996-04-23 | Yamaha Corp | Sound image localization device |
WO1997041711A1 (en) | 1996-04-30 | 1997-11-06 | Srs Labs, Inc. | Audio enhancement system for use in a surround sound environment |
JP2006074589A (en) | 2004-09-03 | 2006-03-16 | Matsushita Electric Ind Co Ltd | Acoustic processing device |
US20070274528A1 (en) | 2004-09-03 | 2007-11-29 | Matsushita Electric Industrial Co., Ltd. | Acoustic Processing Device |
CN102326417A (en) | 2008-12-30 | 2012-01-18 | 庞培法布拉大学巴塞隆纳媒体基金会 | Method and apparatus for three-dimensional acoustic field encoding and optimal reconstruction |
US20110305344A1 (en) | 2008-12-30 | 2011-12-15 | Fundacio Barcelona Media Universitat Pompeu Fabra | Method and apparatus for three-dimensional acoustic field encoding and optimal reconstruction |
EP2346028A1 (en) | 2009-12-17 | 2011-07-20 | Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. | An apparatus and a method for converting a first parametric spatial audio signal into a second parametric spatial audio signal |
US9196257B2 (en) | 2009-12-17 | 2015-11-24 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and a method for converting a first parametric spatial audio signal into a second parametric spatial audio signal |
JP2013514696A (en) | 2009-12-17 | 2013-04-25 | フラウンホッファー−ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ | Apparatus and method for converting a first parametric spatial audio signal to a second parametric spatial audio signal |
US20130016842A1 (en) | 2009-12-17 | 2013-01-17 | Richard Schultz-Amling | Apparatus and a method for converting a first parametric spatial audio signal into a second parametric spatial audio signal |
CN102859584A (en) | 2009-12-17 | 2013-01-02 | 弗劳恩霍弗实用研究促进协会 | An apparatus and a method for converting a first parametric spatial audio signal into a second parametric spatial audio signal |
KR20140000240A (en) | 2010-11-05 | 2014-01-02 | 톰슨 라이센싱 | Data structure for higher order ambisonics audio data |
CN103250207A (en) | 2010-11-05 | 2013-08-14 | 汤姆逊许可公司 | Data structure for higher order ambisonics audio data |
US20130216070A1 (en) | 2010-11-05 | 2013-08-22 | Florian Keiler | Data structure for higher order ambisonics audio data |
JP2013545391A (en) | 2010-11-05 | 2013-12-19 | トムソン ライセンシング | Data structure for higher-order ambisonics audio data |
TW201237849A (en) | 2010-12-03 | 2012-09-16 | Fraunhofer Ges Forschung | Apparatus and method for geometry-based spatial audio coding |
WO2012072804A1 (en) | 2010-12-03 | 2012-06-07 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for geometry-based spatial audio coding |
CN103460285A (en) | 2010-12-03 | 2013-12-18 | 弗兰霍菲尔运输应用研究公司 | Apparatus and method for geometry-based spatial audio coding |
US20120155653A1 (en) | 2010-12-21 | 2012-06-21 | Thomson Licensing | Method and apparatus for encoding and decoding successive frames of an ambisonics representation of a 2- or 3-dimensional sound field |
US20140133660A1 (en) | 2011-06-30 | 2014-05-15 | Thomson Licensing | Method and apparatus for changing the relative positions of sound objects contained within a higher-order ambisonics representation |
CN103635964A (en) | 2011-06-30 | 2014-03-12 | 汤姆逊许可公司 | Method and apparatus for changing relative positions of sound objects contained within higher-order ambisonics representation |
TW201334580A (en) | 2011-12-02 | 2013-08-16 | Fraunhofer Ges Forschung | Apparatus and method for merging geometry-based spatial audio coding streams |
US20130142341A1 (en) | 2011-12-02 | 2013-06-06 | Giovanni Del Galdo | Apparatus and method for merging geometry-based spatial audio coding streams |
JP2015502573A (en) | 2011-12-02 | 2015-01-22 | フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン | Apparatus and method for integrating spatial audio encoded streams based on geometry |
RU2609102C2 (en) | 2011-12-02 | 2017-01-30 | Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. | Device and method of spatial audio encoding streams combining based on geometry |
WO2013079568A1 (en) | 2011-12-02 | 2013-06-06 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for microphone positioning based on a spatial power density |
KR20140097555A (en) | 2011-12-02 | 2014-08-06 | 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. | Apparatus and method for merging geometry-based spatial audio coding streams |
CN104185869A (en) | 2011-12-02 | 2014-12-03 | 弗兰霍菲尔运输应用研究公司 | Apparatus and method for merging geometry-based spatial audio coding streams |
WO2013079663A2 (en) | 2011-12-02 | 2013-06-06 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for merging geometry-based spatial audio coding streams |
US20140358567A1 (en) | 2012-01-19 | 2014-12-04 | Koninklijke Philips N.V. | Spatial audio rendering and encoding |
US20170125030A1 (en) | 2012-01-19 | 2017-05-04 | Koninklijke Philips N.V. | Spatial audio rendering and encoding |
JP2015509212A (en) | 2012-01-19 | 2015-03-26 | コーニンクレッカ フィリップス エヌ ヴェ | Spatial audio rendering and encoding |
US20140023197A1 (en) | 2012-07-20 | 2014-01-23 | Qualcomm Incorporated | Scalable downmix design for object-based surround codec with cluster analysis by synthesis |
WO2014015914A1 (en) | 2012-07-27 | 2014-01-30 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for providing a loudspeaker-enclosure-microphone system description |
US20150223002A1 (en) | 2012-08-31 | 2015-08-06 | Dolby Laboratories Licensing Corporation | System for Rendering and Playback of Object Based Audio in Various Listening Environments |
US9769588B2 (en) | 2012-11-20 | 2017-09-19 | Nokia Technologies Oy | Spatial audio enhancement apparatus |
US20150296319A1 (en) | 2012-11-20 | 2015-10-15 | Nokia Corporation | Spatial audio enhancement apparatus |
US20140241528A1 (en) | 2013-02-28 | 2014-08-28 | Dolby Laboratories Licensing Corporation | Sound Field Analysis System |
US20140249827A1 (en) | 2013-03-01 | 2014-09-04 | Qualcomm Incorporated | Specifying spherical harmonic and/or higher order ambisonics coefficients in bitstreams |
WO2014194075A1 (en) | 2013-05-29 | 2014-12-04 | Qualcomm Incorporated | Compensating for error in decomposed representations of sound fields |
US20140355766A1 (en) | 2013-05-29 | 2014-12-04 | Qualcomm Incorporated | Binauralization of rotated higher order ambisonics |
RU2015151021A (en) | 2013-05-29 | 2017-07-04 | Квэлкомм Инкорпорейтед | COMPRESSING SOUND FIELD REPRESENTATIONS |
US20140358557A1 (en) | 2013-05-29 | 2014-12-04 | Qualcomm Incorporated | Performing positional analysis to code spherical harmonic coefficients |
US20160142851A1 (en) | 2013-06-18 | 2016-05-19 | Dolby Laboratories Licensing Corporation | Method for Generating a Surround Sound Field, Apparatus and Computer Program Product Thereof |
CN104244164A (en) | 2013-06-18 | 2014-12-24 | 杜比实验室特许公司 | Method, device and computer program product for generating surround sound field |
US20150127354A1 (en) | 2013-10-03 | 2015-05-07 | Qualcomm Incorporated | Near field compensation for decomposed representations of a sound field |
CN105637902A (en) | 2013-10-23 | 2016-06-01 | 汤姆逊许可公司 | Method and apparatus for decoding an ambisonics audio soundfield representation for audio playback using 2D settings |
US20160309273A1 (en) | 2013-10-23 | 2016-10-20 | Thomson Licensing | Method for and apparatus for decoding an ambisonics audio soundfield representation for audio playback using 2d setups |
WO2015086337A1 (en) | 2013-12-13 | 2015-06-18 | Robert Bosch Gmbh | Swashplate machine, swashplate and method for relieving the hydrostatic pressure on a control part connection of a swashplate machine and for reducing the pressure of a working medium during a process of reversing the swashplate machine |
WO2015107926A1 (en) | 2014-01-16 | 2015-07-23 | ソニー株式会社 | Sound processing device and method, and program |
CN106104680A (en) | 2014-03-21 | 2016-11-09 | 高通股份有限公司 | It is inserted into voice-grade channel in the description of sound field |
US20150271621A1 (en) | 2014-03-21 | 2015-09-24 | Qualcomm Incorporated | Inserting audio channels into descriptions of soundfields |
TW201614638A (en) | 2014-10-10 | 2016-04-16 | Thomson Licensing | Method and apparatus for low bit rate compression of a higher order ambisonics HOA signal representation of a sound field |
US20170243589A1 (en) | 2014-10-10 | 2017-08-24 | Dolby International Ab | Method and apparatus for low bit rate compression of a higher order ambisonics hoa signal representation of a sound field |
WO2016081655A1 (en) | 2014-11-19 | 2016-05-26 | Dolby Laboratories Licensing Corporation | Adjusting spatial congruency in a video conferencing system |
WO2017098949A1 (en) | 2015-12-10 | 2017-06-15 | ソニー株式会社 | Speech processing device, method, and program |
US20180359594A1 (en) | 2015-12-10 | 2018-12-13 | Sony Corporation | Sound processing apparatus, method, and program |
US20180206057A1 (en) | 2017-01-13 | 2018-07-19 | Qualcomm Incorporated | Audio parallax for virtual reality, augmented reality, and mixed reality |
WO2019013924A1 (en) | 2017-07-12 | 2019-01-17 | Google Llc | Ambisonics sound field navigation using directional decomposition and path distance estimation |
Non-Patent Citations (45)
Title |
---|
"ITU-R BS.1534-3, Method for the subjective assessment of intermediate quality level of audio systems", International Telecommunication Union, 2015, Oct. 2015, 36 pp. |
Altman, M, et al., "Immersive Audio for VR", Audio Engineering Society, Oct. 1, 2016, Conference on Audio for Virtual and Augmented Reality, 8 pp. |
Anderson, Robert, et al., "Jump: Virtual Reality Video", ACM Transactions on Graphics, 35(6), p. 198, 2016, pp. 198-198:13. |
Bates, Enda, et al., "Spatial Music, Virtual Reality, and 360 Media", Audio Eng. Soc. Int. Conf. on Audio for Virtual and Augmented Reality, Los Angeles, CA, U.S.A., 2016, 8 pp. |
Blauert, Jens, "[Uploaded in 2 parts] —Spatial Hearing—Revised Edition: The Psychophysics of Human Sound Localization", The MIT Press, 1996, ISBN 0262024136, pp. 93-128. |
Bleidt, Robert L, et al., "Development of the MPEG-H TV Audio System for ATSC 3.0", IEEE Transactions on Broadcasting; vol. 63, No. 1, Mar. 2017, pp. 202-236, XP055484143, US; ISSN: 0018-9316, Mar. 2017, pp. 202-236. |
Bleidt, Robert L., et al., "Development of the MPEG-H TV Audio System for ATSC 3.0", IEEE Transactions on Broadcasting, vol. 63, No. 1, Mar. 2017 (Mar. 2017). |
Boehm, Johannes, et al., "Scalable Decoding Mode for MPEG-H 3D Audio HOA", 108. MPEG Meeting; Mar. 31, 2014-Apr. 4, 2014; Valencia; (Motion Picture Expert Group or ISO/IEC JTC1/SC29/WG11), No. m33195, Mar. 26, 2014 (Mar. 26, 2014), XP030061647, p. 7, paragraph 2.4 - p. 11, paragraph 2.5, pp. 7, 11. |
Borß, Christian, "A polygon-based panning method for 3D loudspeaker setups", Audio Eng. Soc. Conv., pp. 343-352, Los Angeles, CA, USA, 2014, pp. 343-352. |
Engelke, Ulrich, et al., "Psychophysiology-Based QoE Assessment: A Survey", IEEE Selected Topics in Signal Processing, 11(1), pp. 6-21, 2017, pp. 6-21. |
Faller, Christof, "Parametric multichannel audio coding: Synthesis of coherence cues", IEEE Trans. Speech Audio Process., vol. 14, No. 1, Jan. 2006, pp. 299-310. |
Faller, Christof, et al., "Binaural cue coding—Part II: schemes and applications", IEEE Trans on Speech and Audio Proc., vol. 11, No. 6, pp. 299-310. |
Gerzon, Michael A., "Periphony: With-height sound reproduction", J. Acoust. Soc. Am., vol. 21, no. 1, pp. 2-10, 1973, 1973, pp. 2-10. |
Johannes, Boehm, et al., "Scalable Decoding Mode for MPEG-H 3D Audio HOA", 108. MPEG Meeting; Mar. 31, 2014-Apr. 4, 2014; Valencia; (Motion Picture Expert Group or ISO/IEC JTC1/SC29/WG11), No. m33195, Mar. 26, 2014, XP03006164, Mar. 26, 2014, 12 pages. |
Khaddour, Hasan, et al., "A Novel Combined System of Direction Estimation and Sound Zooming of Multiple Speakers", Radioengineering, 24(2), 2015, pp. 583-592. |
Kowalczyk, Konrad, et al., "Parametric Spatial Sound Processing: A Flexible and Efficient Solution to Sound Scene Acquisition, Modification, and Reproduction", IEEE Signal Process. Mag., 32(2), pp. 31-42, 2015, pp. 31-42. |
Kronlachner, Matthias, et al., "Spatial Transformations for the Enhancement of Ambisonics Recordings", 2nd International Conference on Spatial Audio, Erlangen, Germany, 2014, 5 pp. |
Kuttruff, Heinrich, "Room Acoustics", Taylor & Francis, 4 edition, 2000, 369 pp. |
Liitola, Toni, "Headphone sound externalization", Ph.D. thesis, Helsinki University of Technology. Department of Electrical and Communications Engineering Laboratory of Acoustics and Audio Signal Processing., 2006, 83 pp. |
Merimaa, Juha, "Analysis, Synthesis and Perception of Spatial Sound: Binaural Localization Modeling and Multichannel Loudspeaker Reproduction", Ph.D. thesis, Helsinki University of Technology, 2006, 196 pp. |
Pihlajamaki, Tapani, et al., "Synthesis of Complex Sound Scenes with Transformation of Recorded Spatial Sound in Virtual Reality", JAES, AES, 60 East 42nd Street, Room 2520 New York 10165-2520, USA, vol. 63, No. 7/8, Aug. 18, 2015, pp. 542-551, XP040672137, pp. 542-551. |
Pihlajamäki, Tapani, et al., "Synthesis of Complex Sound Scenes with Transformation of Recorded Spatial Sound in Virtual Reality", Journal of the Audio Engineering Society PAPERS vol. 63, No. 7/8, Jul./Aug. 2015, Aug. 2015. |
Plinge, Axel, et al., "Six-Degrees-of-Freedom Binaural Audio Reproduction of First-Order Ambisonics with Distance Information", Aug. 22, 2018, Conference on Audio for Virtual and Augmented Reality, 10 pp. |
Pulkki, Ville, "Directional audio coding in spatial sound reproduction and stereo upmixing", Proc. of the 28th AES International Conference, 2006, 8 pp. |
Pulkki, Ville, "Spatial Sound Reproduction with Directional Audio Coding", J. Audio Eng. Soc., 55(6), pp. 503-516, 2007, pp. 503-516. |
Pulkki, Ville, "Virtual sound source positioning using vector base amplitude panning", J. Acoust. Soc. Am., vol. 45, No. 6, pp. 456-466, Jun. 1997, pp. 456-466. |
Rummukainen, Olli, et al., "Evaluating Binaural Reproduction Systems from Behavioral Patterns in a Virtual Reality—A Case Study with Impaired Binaural Cues and Tracking Latency", Audio Eng. Soc. Conv. 143, New York, NY, USA, 2017, 8 pp. |
Rungta, Atul, et al., "Diffraction Kernels for Interactive Sound Propagation in Dynamic Environments", IEEE Trans. Visualization & Comp. Graphics, 24(4), pp. 1613-1622, 2018, pp. 1613-1622. |
Schlecht, Sebastian J, et al., "Sign-Agnostic Matrix Design for Spatial Artificial Reverberation with Feedback Delay Networks", Proc. Audio Eng. Soc. Conf., pp. 1-10—accepted, Tokyo, Japan, 2018, Aug. 6, 2018, pp. 1-10. |
Schuijers, Erik, et al., "Low complexity parametric stereo coding", Proc. of the 116th AES Convention, Berlin, Germany, 2004. |
Tapani, Pihlajamaki, "Synthesis of Complex Sound Scenes with Transformation of Recorded Spatial Sound in Virtual Reality", JAES, AES, 60 East 42nd Street, Room 2520 New York 10165-2520, USA, vol. 63, No. 7/8, Aug. 18, 2015 (Aug. 18, 2015), pp. 542-551, XP040672137, DOI: 10.17743/JAES.2015.0059, p. 542, paragraph 1. - p. 547, paragraph 3.3, pp. 542, 547. |
Taylor, Micah, et al., "Guided multi-view ray tracing for fast auralization", IEEE Trans. Visualization & Comp. Graphics, 18, pp. 1797-1810, 2012, pp. 1797-1810. |
Thiergart, Oliver, et al., "An Acoustical Zoom based on Informed Spatial Filtering", Int. Workshop on Acoustic Signal Enhancement, pp. 109-113, 2014, pp. 109-113. |
Thiergart, Oliver, et al., "Geometry-Based Spatial Sound Acquisition using Distributed Microphone Arrays", IEEE Trans. Audio, Speech, Language Process., 21(12), pp. 2583-2594, 2013, pp. 2583-2594. |
Thiergart, Oliver, et al., "Three-Dimensional Sound Field Analysis with Directional Audio Coding Based on Signal Adaptive Parameter Estimators", Audio Eng. Soc. Conv. Spatial Audio: Sense the Sound of Space, 2010, 9 pp. |
Tsingos, Nicolas, et al., "Perceptual Audio Rendering of Complex Virtual Environments", ACM Transactions on Graphics, 23(3), pp. 249-258, 2004. |
Tsukada, Manabu, et al., "SDM3602: An interactive audio-visual service for music event", "SDM3602: Interactive playback of free-view-listen point video and audio for music events", Multimedia, Distributed, Cooperative, and Mobile Symposium (DICOMO2017), vol. 2017, No. 1, pp. 1460-1467, 9 pp. |
Tylka, J., et al., "Comparison of techniques for binaural navigation of higher order ambisonics sound fields", Proc. of the AES International Conference on Audio for Virtual and Augmented Reality, New York, Sep. 2016. |
Tylka, Joseph G, et al., "Performance of Linear Extrapolation Methods of Virtual Sound Field Navigation", J. Audio Eng. Soc., vol. 68, No. 3, Mar. 2020, pp. 138-156. |
Tylka, Joseph G, et al., "Soundfield Navigation using an Array of Higher-Order Ambisonics Microphones", CONFERENCE: 2016 AES International Conference on Audio for Virtual and Augmented Reality; Sep. 2016, AES, 60 East 42nd Street, Room 2520 New York 10165-2520, USA, Sep. 21, 2016, XP040681032, 10 pp. |
Tylka, Joseph G., et al., "Soundfield Navigation using an Array of Higher-Order Ambisonics Microphones", CONFERENCE: 2016 AES International Conference on Audio for Virtual and Augmented Reality; Sep. 2016, AES, 60 East 42nd Street, Room 2520 New York, 10165-2520, USA, Sep. 21, 2016 (Sep. 21, 2016). |
Yang, Cheng, et al., "A 3D Audio Coding Technique Based on Extracting the Distance Parameter", Jul. 18, 2014, 2014 IEEE International Conference on Multimedia and Expo (ICME), 6 pp. |
Yulin, Peng, "[Uploaded in 4 parts] Study on Several Algorithms for Three-Dimension Audio", China's outstanding master's academic dissertation full text database (Information Technology), Aug. 15, 2013 (62 pp.), 15 pp. |
Zhang, Wen, et al., "Surround by Sound: A Review of Spatial Audio Recording and Reproduction", Applied Sciences, 7(5), p. 532, 2017. |
Ziegler, Matthias, et al., "Immersive Virtual Reality for Live-Action Video using Camera Arrays", IBC, Amsterdam, Netherlands, 2017, 8 pp. |
Also Published As
Publication number | Publication date |
---|---|
SG11202000330XA (en) | 2020-02-27 |
CA3069241A1 (en) | 2019-01-17 |
CN117319917A (en) | 2023-12-29 |
US20240098445A1 (en) | 2024-03-21 |
KR102491818B1 (en) | 2023-01-26 |
KR20200040745A (en) | 2020-04-20 |
US20220417695A1 (en) | 2022-12-29 |
AU2018298874A1 (en) | 2020-02-20 |
US20200228913A1 (en) | 2020-07-16 |
CN111149155B (en) | 2023-10-10 |
KR102654507B1 (en) | 2024-04-05 |
JP2020527746A (en) | 2020-09-10 |
TWI713866B (en) | 2020-12-21 |
BR112020000775A2 (en) | 2020-07-14 |
KR20220098261A (en) | 2022-07-11 |
ZA202000020B (en) | 2021-10-27 |
JP2022153626A (en) | 2022-10-12 |
AR112451A1 (en) | 2019-10-30 |
TW201909657A (en) | 2019-03-01 |
AU2018298874C1 (en) | 2023-10-19 |
AU2018298874B2 (en) | 2021-08-19 |
CN111149155A (en) | 2020-05-12 |
RU2736418C1 (en) | 2020-11-17 |
WO2019012131A1 (en) | 2019-01-17 |
EP3652735A1 (en) | 2020-05-20 |
JP7119060B2 (en) | 2022-08-16 |
CA3069241C (en) | 2023-10-17 |
US11463834B2 (en) | 2022-10-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11950085B2 (en) | | Concept for generating an enhanced sound field description or a modified sound field description using a multi-point sound field description |
US11477594B2 (en) | | Concept for generating an enhanced sound-field description or a modified sound field description using a depth-extended DirAC technique or other techniques |
US11863962B2 (en) | | Concept for generating an enhanced sound-field description or a modified sound field description using a multi-layer description |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | AS | Assignment | Owner name: FRAUNHOFER-GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG E.V., GERMANY; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: HERRE, JUERGEN; HABETS, EMANUEL; SIGNING DATES FROM 20221018 TO 20221026; REEL/FRAME: 062679/0597 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |