WO2009046223A2 - Spatial audio analysis and synthesis for binaural reproduction and format conversion - Google Patents

Spatial audio analysis and synthesis for binaural reproduction and format conversion

Info

Publication number
WO2009046223A2
WO2009046223A2 (PCT/US2008/078632)
Authority
WO
WIPO (PCT)
Prior art keywords
frequency
signal
audio
time
channel
Prior art date
Application number
PCT/US2008/078632
Other languages
English (en)
Other versions
WO2009046223A3 (fr)
Inventor
Michael M. Goodwin
Jean-Marc Jot
Mark Dolson
Original Assignee
Creative Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/243,963 external-priority patent/US8374365B2/en
Application filed by Creative Technology Ltd filed Critical Creative Technology Ltd
Priority to GB1006665A priority Critical patent/GB2467668B/en
Priority to CN200880119120.6A priority patent/CN101884065B/zh
Publication of WO2009046223A2 publication Critical patent/WO2009046223A2/fr
Publication of WO2009046223A3 publication Critical patent/WO2009046223A3/fr

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing

Definitions

  • the present invention relates to audio processing techniques. More particularly, the present invention relates to methods for providing spatial cues in audio signals.
  • Virtual 3D audio reproduction of a 2-channel or multi-channel recording traditionally aims at reproducing over headphones the auditory sensation of listening to the recording over loudspeakers.
  • the conventional method consists of "virtualizing" each of the source channels by use of HRTF (Head Related Transfer Function) filters or BRIR (Binaural Room Impulse Response) filters.
  • What is desired is an improved method for reproducing over headphones the directional cues of a two-channel or multi-channel audio signal.
  • the present invention provides an apparatus and method for binaural rendering of a signal based on a frequency-domain spatial analysis-synthesis.
  • the nature of the signal may be, for instance, a music or movie soundtrack recording, the audio output of an interactive gaming system, or an audio stream received from a communication network or the internet. It may also be an impulse response recorded in a room or any acoustic environment, and intended for reproducing the acoustics of this environment by convolution with an arbitrary source signal.
  • a method for binaural rendering of an audio signal having at least two channels each assigned respective spatial directions is provided.
  • the original signal may be provided in any multi-channel or spatial audio recording format, including the Ambisonic B format or a higher-order Ambisonic format; Dolby Surround, Dolby Pro Logic or any other phase-amplitude matrix stereo format; Dolby Digital, DTS or any discrete multi-channel format; and conventional 2-channel or multi-channel recording obtained by use of an array of 2 or more microphones (including binaural recordings).
  • the method includes converting the signal to a frequency-domain or subband representation, deriving in a spatial analysis a direction for each time-frequency component, and generating left and right frequency-domain signals such that, for each time and frequency, the inter-channel amplitude and phase differences between these two signals match the inter-channel amplitude and phase differences present in the HRTF corresponding to the direction angle derived from the spatial analysis.
  • an audio output signal is generated which has at least first and second audio output channels.
  • the output channels are generated from a time-frequency signal representation of an audio input signal having at least one audio input channel and at least one spatial information input channel.
  • a spatial audio output format is selected.
  • Directional information corresponding to each of a plurality of frames of the time-frequency signal representation is received.
  • First and second frequency domain signals are generated from the time frequency signal representation that, at each time and frequency, have inter-channel amplitude and phase differences between the at least first and second output channels, the amplitude and phase differences characterizing a direction in the selected spatial audio output format.
  • a method of generating audio output signals is provided.
  • An input audio signal preferably having at least two channels is provided.
  • the input audio signal is converted to a frequency domain representation.
  • a directional vector corresponding to the localization direction of each of a plurality of time frequency components is derived from the frequency domain representation.
  • First and second frequency domain signals are generated from the time frequency signal representation that, at each time and frequency, have inter-channel amplitude and phase differences that characterize the direction that corresponds to the directional vector.
  • An inverse transform is performed to convert the frequency domain signals to the time domain.
  • While the present invention has a particularly advantageous application for improved binaural reproduction over headphones, it applies more generally to spatial audio reproduction over headphones or loudspeakers using any 2-channel or multi-channel audio recording or transmission format where the direction angle can be encoded in the output signal by frequency-dependent or frequency-independent inter-channel amplitude and/or phase differences, including an Ambisonic format; a phase-amplitude matrix stereo format; a discrete multi-channel format; conventional 2-channel or multi-channel recording obtained by use of an array of 2 or more microphones; 2-channel or multi-channel loudspeaker 3D audio using HRTF-based (or "transaural") virtualization techniques; and sound field reproduction using loudspeaker arrays, including Wave Field Synthesis.
  • the present invention can be used to convert a signal from any 2-channel or multi-channel spatial audio recording or transmission format to any other 2-channel or multi-channel spatial audio format. Furthermore, the method allows including in the format conversion an angular transformation of the sound scene such as a rotation or warping applied to the direction angle of sound components in the sound scene.
  • FIG. 1 is a flowchart illustrating a stereo virtualization method in accordance with one embodiment of the present invention.
  • FIG. 2 is a flowchart illustrating a binaural synthesis method for multichannel audio signals in accordance with another embodiment of the present invention.
  • FIG. 3 is a block diagram of standard time-domain virtualization based on HRTFs or BRTFs.
  • FIG. 4A is a block diagram of a time-domain virtualization process for one of the input channels illustrated in FIG. 3.
  • FIG. 4B is a block diagram of the time-domain virtualization process illustrated in FIG. 4A.
  • FIG. 5 is a block diagram of a generic frequency-domain virtualization system.
  • FIG. 6A depicts format vectors for a standard 5-channel audio format and the corresponding encoding locus of the Gerzon vector in accordance with one embodiment of the present invention.
  • FIG. 6B depicts format vectors for an arbitrary 6-channel loudspeaker layout and the corresponding encoding locus of the Gerzon vector in accordance with one embodiment of the present invention.
  • FIG. 7 is a block diagram of a high-resolution frequency-domain virtualization algorithm in accordance with one embodiment of the present invention.
  • FIG. 8 is a block diagram of a high-resolution frequency-domain virtualization system with primary-ambient signal decomposition in accordance with one embodiment of the present invention.
  • the present invention provides frequency-domain methods for headphone reproduction of 2-channel or multi-channel recordings based on spatial analysis of directional cues in the recording and conversion of these cues into binaural cues or inter-channel amplitude and/or phase difference cues in the frequency domain.
  • This invention incorporates by reference the details provided in the disclosure of the invention described in the US patent application serial number 11/750,300, docket no. CLIP159, and entitled “Spatial Audio Coding Based on Universal Spatial Cues", filed on May 17, 2007, which claims priority from Application 60/747,532, the entire disclosures of which are incorporated by reference in their entirety.
  • Binaural rendering includes generating left and right frequency-domain signals such that, for each time and frequency, the binaural amplitude and phase differences between these two signals match the binaural amplitude and phase differences present in the HRTF corresponding to the direction angle derived from the spatial analysis. It is straightforward to extend the method to any 2-channel or multi-channel spatial rendering method where the due direction of sound is characterized by prescribed inter-channel amplitude and/or phase differences.
  • headphone listening has become increasingly common; in both mobile and non-mobile listening scenarios, providing a high-fidelity listening experience over headphones is thus a key value-add (or arguably even a necessary feature) for modern consumer electronic products.
  • This enhanced headphone reproduction is relevant for stereo content such as legacy music recordings as well as multichannel music and movie soundtracks.
  • algorithms for improved headphone listening might incorporate dynamics processing and/or transducer compensation.
  • the described embodiments of the invention are concerned with spatial enhancement, for which the goal is ultimately to provide the headphone listener with an immersive experience.
  • some "spatially enhanced" headphones incorporating multiple transducers have become commercially available.
  • the preferred embodiments of the invention are directed to the more common case of headphone presentation wherein a single transducer is used to render the signal to a given ear: the headphone reproduction simply constitutes presenting a left-channel signal to the listener's left ear and likewise a right-channel signal to the right ear.
  • stereo music recordings are still the predominant format.
  • In-the-head localization, though commonly experienced by headphone listeners, is certainly a physically unnatural percept and is, as mentioned, contrary to the goal of listener immersion, for which a sense of externalization of the sound sources is critical.
  • a technique known as virtualization is commonly used to attempt to mitigate in-the-head localization and to enhance the sense of externalization.
  • the goal of virtualization is generally to recreate over headphones the sensation of listening to the original audio content over loudspeakers at some pre-established locations dictated by the audio format, e.g. +/- 30° azimuth (in the horizontal plane) for a typical stereo format.
  • the binaural signals for the various input channels are mixed into a two-channel signal for presentation over headphones, as illustrated in FIG. 3. Standard virtualization methods have been applied to music and movie listening as well as interactive scenarios such as games.
  • a positionally accurate set of head-related transfer functions can be applied to each source to create an effective binaural rendering of multiple spatially distinct sources.
  • the channel signals consist of a mixture of the various sound sources.
  • the SASC (spatial audio scene coding) spatial analysis derives a direction angle and a radius representative of a position relative to the center of a listening circle (or sphere); the angle and radius correspond to the perceived location of that time-frequency component (for a listener situated at the center). Then, left and right frequency-domain signals are generated based on these directional cues such that, at each time and frequency, the binaural magnitude and phase differences between the synthesized signals match those of the HRTFs corresponding to the direction angle derived by the SASC analysis - such that a source panned between channels will indeed be processed by the correct HRTFs.
  • STANDARD VIRTUALIZATION METHODS: In the following sections, we review standard methods of headphone virtualization, including time-domain and frequency-domain processing architectures and performance limitations.
  • Virtual 3-D audio reproduction of a two-channel or multi-channel recording traditionally aims at reproducing over headphones the auditory sensation of listening to the recording over loudspeakers.
  • the conventional method, depicted in FIG. 3, consists of "virtualizing" each of the input channels (301-303) via HRTF filters (306, 308) or BRIR/BRTF (binaural room impulse response / transfer function) filters and then summing the results (310, 312).
  • the two output channels are formed by filtering each input channel with the corresponding HRIR and summing:
    $$y_L[t] = \sum_m h_{mL}[t] * x_m[t] \qquad (1)$$
    $$y_R[t] = \sum_m h_{mR}[t] * x_m[t] \qquad (2)$$
  • where m is a channel index and $x_m[t]$ is the m-th channel signal.
  • the filters $h_{mL}[t]$ and $h_{mR}[t]$ for channel m are dictated by the defined spatial position of that channel, e.g. ±30° azimuth for a typical stereo format; the filter $h_{mL}[t]$ represents the impulse response (transfer function) from the m-th input position to the left ear, and $h_{mR}[t]$ the response to the right ear.
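A minimal sketch of this standard time-domain virtualization of Equations (1, 2), assuming numpy/scipy and placeholder HRIR arrays:

```python
import numpy as np
from scipy.signal import fftconvolve

def virtualize_time_domain(x, h_left, h_right):
    """Standard virtualization of Eqs. (1, 2): y_ear[t] = sum_m (h_m,ear * x_m)[t].

    x       : (M, N) array, one row per input channel signal x_m[t]
    h_left  : (M, Nh) array of HRIRs h_mL[t] to the left ear
    h_right : (M, Nh) array of HRIRs h_mR[t] to the right ear
    """
    y_left = sum(fftconvolve(xm, hm) for xm, hm in zip(x, h_left))
    y_right = sum(fftconvolve(xm, hm) for xm, hm in zip(x, h_right))
    return y_left, y_right
```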
  • FIG. 4A is a block diagram of a time-domain virtualization process for one of the input channels.
  • the HRTF filtering shown in FIG. 4A can be decomposed into an interaural time difference (ITD) and an interaural level difference (ILD), where the ITD essentially captures the different propagation delays of the two acoustic paths to the ears and the ILD represents the spectral filtering caused by the listener's presence.
  • Virtualization based on the ILD/ITD decomposition is depicted in FIG. 4B; this binaural synthesis achieves the virtualization effect by imposing interaural time and level differences on the signals to be rendered, where the ITDs and ILDs are determined from the desired virtual positions.
  • the depiction is given generically to reflect that in practice the processing is often carried out differently based on the virtualization geometry: for example, for a given virtual source, the signal to the ipsilateral ear (closest to the virtual source) may be presented without any delay while the full ITD is applied to the contralateral ear signal.
  • the ILD and ITD can both be thought of as being frequency-dependent.
  • in the frequency domain, the virtualization of Equations (1, 2) becomes
    $$Y_L(\omega) = \sum_m H_{mL}(\omega)\, X_m(\omega) \qquad (3)$$
    $$Y_R(\omega) = \sum_m H_{mR}(\omega)\, X_m(\omega) \qquad (4)$$
  • where $H(\omega)$ denotes the discrete-time Fourier transform (DTFT) of $h[t]$, and $X_m(\omega)$ the DTFT of the m-th channel signal.
  • the (unwrapped) interaural phase difference can be thought of as representing the (frequency-dependent) ITD information:
    $$\tau_m(\omega) = \frac{\angle H_{mL}(\omega) - \angle H_{mR}(\omega)}{\omega}$$
  • each HRTF is decomposed into its minimum-phase component and an allpass component:
    $$H_{mL}(\omega) = F_{mL}(\omega)\, e^{j\phi_{mL}(\omega)} \qquad (8)$$
    $$H_{mR}(\omega) = F_{mR}(\omega)\, e^{j\phi_{mR}(\omega)} \qquad (9)$$
  • where $F(\omega)$ is the minimum-phase component and $\phi(\omega)$ is the excess-phase function.
  • the ITD is then obtained from the excess-phase difference:
    $$\tau_m = \frac{\phi_{mL}(\omega) - \phi_{mR}(\omega)}{\omega} \qquad (10)$$
  • FIG. 5 is a block diagram of a generic frequency-domain virtualization system.
  • the STFT consists of a sliding window and an FFT, while the inverse STFT comprises an inverse FFT and overlap-add.
  • frequency-domain formulations are idealized; in practice, frequency-domain implementations are typically based on a short-time Fourier transform (STFT) framework such as that shown in FIG. 5, where the input signal is windowed and the discrete Fourier transform (DFT) is applied to each windowed segment:
    $$X_m[k,l] = \sum_t w[t - lT]\, x_m[t]\, e^{-j2\pi k t / K} \qquad (11)$$
  • where $w[t]$ is the analysis window of length N, l the frame index, T the hop size, and K the DFT size.
  • the virtualization filters are then applied frame by frame in the DFT domain:
    $$Y_L[k,l] = \sum_m H_{mL}[k]\, X_m[k,l] \qquad (12)$$
    $$Y_R[k,l] = \sum_m H_{mR}[k]\, X_m[k,l] \qquad (13)$$
  • where $H[k]$ denotes the DFT of $h[t]$.
  • achieving filtering equivalent to the time-domain approach requires that the DFT size be sufficiently large to avoid time-domain aliasing: $K \geq N + N_h - 1$, where $N_h$ is the length of the HRIR.
  • the frequency-domain processing can still be implemented with a computationally practical FFT size.
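The block-wise filtering of Equations (12, 13) might be sketched as follows; the power-of-two rounding of K and the function names are illustrative choices, not taken from the text:

```python
import numpy as np

def virtualize_frame(x_frames, h_left, h_right):
    """Frequency-domain virtualization of one windowed segment, Eqs. (12, 13).

    x_frames : (M, N) windowed channel segments; h_* : (M, Nh) HRIRs.
    """
    N, Nh = x_frames.shape[1], h_left.shape[1]
    K = 1 << int(np.ceil(np.log2(N + Nh - 1)))   # K >= N + Nh - 1 avoids time aliasing
    X = np.fft.rfft(x_frames, K)                 # X_m[k]
    Y_L = np.sum(np.fft.rfft(h_left, K) * X, axis=0)    # Eq. (12)
    Y_R = np.sum(np.fft.rfft(h_right, K) * X, axis=0)   # Eq. (13)
    return np.fft.irfft(Y_L, K), np.fft.irfft(Y_R, K)   # overlap-add across frames follows
```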
  • Frequency-domain processing architectures are of interest for several reasons.
  • HRTF data can be more flexibly and meaningfully parameterized and modeled in a frequency-domain representation than in the time domain.
  • the source s[t] is thus rendered through a combination of HRTFs for multiple different directions instead of via the correct HRTFs for the actual desired source direction, i.e. the due source location in a loudspeaker reproduction compatible with the input format. Unless the combined HRTFs correspond to closely spaced channels, this combination of HRTFs will significantly degrade the spatial image.
  • the methods of various embodiments of the present invention overcome this drawback, as described further in the following section.
  • Embodiments of the present invention use a novel frequency-domain approach to binaural rendering wherein the input audio scene is analyzed for spatial information, which is then used in the synthesis algorithm to render a faithful and compelling reproduction of the input scene.
  • a frequency-domain representation provides an effective means to distill a complex acoustic scene into separate sound events so that appropriate spatial processing can be applied to each such event.
  • FIG. 1 is a flowchart illustrating a generalized stereo virtualization method in accordance with one embodiment of the present invention.
  • the input signal is first converted to a frequency-domain representation using a short-term Fourier transform (STFT); the STFT may comprise a sliding window and an FFT.
  • a panning analysis is performed to extract directional information.
  • the spatial analysis derives a directional angle representative of the position of the source audio relative to the listener's head and may perform a separation of the input signal into several spatial components (for instance directional and non-directional components).
  • panning-dependent filtering is performed using left and right HRTF filters designed for virtualization at the determined direction angle.
  • time-domain signals for presentation to the listener are generated by an inverse transform and an overlap-add procedure in operation 110.
  • FIG. 2 is a flowchart illustrating a method for binaural synthesis of multichannel audio in accordance with one embodiment of the present invention.
  • the input signal is converted to a frequency-domain representation using a short-term Fourier transform (STFT); the STFT may comprise a sliding window and an FFT.
  • a spatial analysis is performed to extract directional information. For each time and frequency, the spatial analysis derives a direction vector representative of the position of the source audio relative to the listener's head.
  • each time-frequency component is filtered preferably based on phase and amplitude differences that would be present in left and right head-related transfer function (HRTF) filters derived from the corresponding time-frequency direction vector (provided by block 204). More particularly, at least first and second frequency domain output signals are generated that at each time and frequency component have relative inter-channel phase and amplitude values that characterize a direction in a selected output format. After the at least two output channel signals are generated for all frequencies in a given time frame, time-domain signals for presentation to the listener are generated by an inverse transform and an overlap-add procedure in operation 208.
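To make the flow of FIG. 2 concrete, the sketch below strings the operations together in Python. The callables `analyze` and `hrtf_pair`, the array shapes, and the per-bin downmix (which anticipates the equivalent synthesis form discussed around Equations (27, 28) below) are assumptions of this illustration, not components named in the patent:

```python
import numpy as np

def binaural_synthesis(X, analyze, hrtf_pair):
    """Orchestration of the FIG. 2 flow for an (M, K, L) STFT array X.

    analyze(x_bin)      -> direction for one bin across channels (analysis block 204)
    hrtf_pair(theta, k) -> (H_L, H_R) complex responses at bin k
    """
    M, K, L = X.shape
    Y = np.zeros((2, K, L), dtype=complex)
    for l in range(L):
        for k in range(K):
            theta = analyze(X[:, k, l])        # direction for this time-frequency component
            H_L, H_R = hrtf_pair(theta, k)     # HRTF amplitude/phase for that direction
            s = X[:, k, l].sum()               # per-bin downmix of the input channels
            Y[0, k, l], Y[1, k, l] = H_L * s, H_R * s
    return Y   # inverse transform + overlap-add (operation 208) then yields time-domain output
```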
  • the spatial analysis method includes extracting directional information from the input signals in the time-frequency domain. For each time and frequency, the spatial analysis derives a direction angle representative of a position relative to the listener's head; for the multichannel case, it furthermore derives a distance cue that describes the radial position relative to the center of a listening circle - so as to enable parameterization of fly-by and fly-through sound events.
  • for each time and frequency, a Gerzon vector is computed as a weighted sum of the format vectors:
    $$\vec{g}[k,l] = \sum_m \alpha_m[k,l]\, \vec{e}_m \qquad (18)$$
  • where $\vec{e}_m$ is a unit vector in the direction of the m-th input channel and $\alpha_m[k,l]$ is a weight derived from the m-th channel signal at that time and frequency.
  • FIG. 6A depicts format vectors (601-605) for a standard 5-channel audio format
  • FIG. 6B depicts the same for an arbitrary loudspeaker layout.
  • the Gerzon vector 608 and the localization vector 609 are illustrated in FIG. 6A.
  • a localization vector d[k,l] is then computed from the Gerzon vector, with the steps of Eqs. (19)-(22) carried out for each bin k at each time l.
  • this vector is encoded in polar form as the radius r[k,l] and an azimuth angle $\theta[k,l]$.
  • the localization vector given in Eq. (22) is in the same direction as the Gerzon vector.
  • the vector length is modified by the projection operation in Eq. (21) such that the encoding locus of the localization vector is expanded to include the entire listening circle; pairwise-panned components are encoded on the circumference instead of on the inscribed polygon as for the unmodified Gerzon vector.
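Under the stated assumptions (a 2-D layout, with the weights taken as normalized channel magnitudes; the exact weighting is defined in the referenced application 11/750,300), the Gerzon vector and its rescaling to the listening circle might look like this sketch:

```python
import numpy as np

def localization_vector(alpha, angles):
    """Sketch of the SASC localization vector for one (k, l) bin, 2-D layout.

    alpha  : nonnegative per-channel weights alpha_m[k,l] (assumed here to be
             normalized channel magnitudes)
    angles : format-vector azimuths in radians, sorted ascending
    """
    e = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # unit format vectors e_m
    g = alpha @ e                                           # Gerzon vector, Eq. (18)
    norm_g = np.linalg.norm(g)
    if norm_g < 1e-12:
        return 0.0, 0.0                                     # center of the listening circle
    ghat = g / norm_g
    theta = np.arctan2(g[1], g[0])
    # adjacent format-vector pair whose sector contains the Gerzon direction
    j = np.searchsorted(angles, theta) % len(angles)
    i = j - 1                                               # wraps to the last channel when j == 0
    # distance from the origin to the chord e_i--e_j (the pairwise-panning locus) along
    # ghat; dividing by it expands the encoding locus to the full listening circle
    n = np.array([e[i, 1] - e[j, 1], e[j, 0] - e[i, 0]])    # chord normal
    s = (e[i] @ n) / (ghat @ n)
    r = norm_g / s                                          # radius r[k,l] in [0, 1]
    return r, theta                                         # polar cues (r[k,l], theta[k,l])
```

With this rescaling, a component pairwise-panned between two adjacent channels lands exactly on the circumference, matching the behavior described above.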
  • the spatial analysis described above was initially developed to provide "universal spatial cues" for use in a format-independent spatial audio coding scheme.
  • a variety of new spatial audio algorithms have been enabled by this robust and flexible parameterization of audio scenes, which we refer to hereafter as spatial audio scene coding (SASC); for example, this spatial parameterization has been used for high-fidelity conversion between arbitrary multichannel audio formats.
  • the application of SASC is provided in the frequency-domain virtualization algorithm depicted in FIG. 5.
  • the SASC spatial analysis is used to determine the perceived direction of each time-frequency component in the input audio scene. Then, each such component is rendered with the appropriate binaural processing for virtualization at that direction; this binaural spatial synthesis is discussed in the following section.
  • while the analysis was described above based on an STFT representation of the input signals, the SASC method can be equally applied to other frequency-domain transforms and subband signal representations. Furthermore, it is straightforward to extend the analysis (and synthesis) to include elevation in addition to the azimuth and radial positional information.
  • $X_m[k,l]$ and the spatial localization vector d[k,l] are both provided to the binaural synthesis engine as shown in FIG. 7.
  • frequency-domain signals $Y_L[k,l]$ and $Y_R[k,l]$ are generated based on the cues d[k,l] such that, at each time and frequency, the correct HRTF magnitudes and phases are applied for virtualization at the direction indicated by the angle of d[k,l].
  • the processing steps in the synthesis algorithm are as follows and are carried out for each frequency bin k at each time l:
  • the HRTFs for the direction indicated by d[k,l] are expressed in minimum-phase/delay form:
    $$H_L[k,l] = F_L[k,l]\, e^{-j\omega_k \tau_L[k,l]} \qquad (23)$$
    $$H_R[k,l] = F_R[k,l]\, e^{-j\omega_k \tau_R[k,l]} \qquad (24)$$
  • where the HRTF phases are expressed here using time delays $\tau_L[k,l]$ and $\tau_R[k,l]$.
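A per-bin sketch of this synthesis step, combining Equations (23, 24) with the downmix form discussed below; the names and shapes are illustrative:

```python
import numpy as np

def synthesize_bin(X_bin, F_L, F_R, tau_L, tau_R, omega_k):
    """Per-bin high-resolution synthesis: apply the minimum-phase HRTF magnitudes
    and delay-based phases of Eqs. (23, 24) to the downmixed component.

    X_bin   : values X_m[k,l] of all input channels at one (k, l)
    F_L/F_R : minimum-phase HRTF responses at bin k for the analyzed direction
    tau_*   : ear delays allocated from the ITD; omega_k : bin frequency (rad/sample)
    """
    H_L = F_L * np.exp(-1j * omega_k * tau_L)    # Eq. (23)
    H_R = F_R * np.exp(-1j * omega_k * tau_R)    # Eq. (24)
    s = np.sum(X_bin)                            # all channels share the analyzed direction
    return H_L * s, H_R * s                      # Y_L[k,l], Y_R[k,l]
```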
  • the radial cue r[k,l] can also be incorporated in the derivation of these HRTFs as an elevation or proximity effect, as described below.
  • FIG. 7 is a block diagram of a high-resolution frequency-domain virtualization algorithm where Spatial Audio Scene Coding is used to determine the virtualization directions for each time-frequency component in the input audio scene.
  • Input signals 702 are converted to the frequency domain representation 706, preferably but not necessarily using a Short Term Fourier Transform 704.
  • the frequency-domain signals are preferably analyzed in spatial analysis block 708 to generate at least a directional vector 709 for each time-frequency component.
  • embodiments of the present invention are not limited to methods where spatial analysis is performed, or, even in method embodiments where spatial analysis is performed, to a particular spatial analysis technique.
  • One preferred method for spatial analysis is described in further detail in copending application No. 11/750,300, filed May 17, 2007, titled "Spatial Audio Coding Based on Universal Spatial Cues" (incorporated by reference).
  • the time-frequency signal representation (frequency-domain representation) 706 is further processed in the high resolution virtualization block 710.
  • This block achieves a virtualization effect for the selected output format channels 718 by generating at least first and second frequency domain signals 712 from the time frequency signal representation 706 that, for each time and frequency component, have inter-channel amplitude and phase differences that characterize the direction that corresponds to the directional vector 709.
  • the first and second frequency domain channels are then converted to the time domain, preferably by using an inverse Short Term Fourier Transform 714 along with conventional overlap and add techniques to yield the output format channels 718.
  • in Equations (25, 26), each time-frequency component $X_m[k,l]$ is independently virtualized by the HRTFs. It is straightforward to manipulate the final synthesis expressions given in Equations (27, 28) into an equivalent form in which the direction-dependent filters are applied to the sum of the channel signals.
  • the frequency-domain multiplications by F L [k,l] and F R [k,l] correspond to filtering operations, but here, as opposed to the cases discussed earlier, the filter impulse responses are of length K; due to the nonlinear construction of the filters in the frequency domain (based on the different spatial analysis results for different frequency bins), the lengths of the corresponding filter impulse responses are not constrained.
  • the frequency-domain multiplication by filters constructed in this way always introduces some time-domain aliasing since the filter length and the DFT size are equal, i.e. there is no zero padding for the convolution. Listening tests indicate that this aliasing is inaudible and thus not problematic, but, if desired, it could be reduced by time-limiting the filters $H_L[k,l]$ and $H_R[k,l]$ at each time l, e.g. by a frequency-domain convolution with the spectrum of a sufficiently short time-domain window.
  • This convolution can be implemented approximately (as a simple spectral smoothing operation) to save computation.
  • the time-limiting spectral correction alters the filters $H_L[k,l]$ and $H_R[k,l]$ at each bin k and therefore reduces the accuracy of the resulting spatial synthesis.
  • Finding appropriate filters $H_L[k,l]$ and $H_R[k,l]$ in step 1 of the spatial synthesis algorithm corresponds to determining HRTFs for an arbitrary direction $\theta[k,l]$.
  • This problem is also encountered in interactive 3-D positional audio systems.
  • the magnitude (or minimum-phase) component of $H_L[k,l]$ and $H_R[k,l]$ is derived by spatial interpolation at each frequency from a database of HRTF measurements obtained at a set of discrete directions. A simple linear interpolation is usually sufficient.
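A hedged sketch of such a linear interpolation over a measured azimuth grid (the database layout and wrap-around handling are assumptions of this illustration):

```python
import numpy as np

def interpolate_hrtf(theta, grid, F_L_meas, F_R_meas):
    """Linear interpolation of minimum-phase HRTF magnitudes at each frequency.

    grid     : measurement azimuths in radians, sorted ascending, shape (D,)
    F_*_meas : magnitude responses, shape (D, K), one row per measured direction
    """
    j = np.searchsorted(grid, theta) % len(grid)
    i = j - 1                                        # adjacent measured directions
    span = (grid[j] - grid[i]) % (2 * np.pi)
    w = ((theta - grid[i]) % (2 * np.pi)) / span     # angular interpolation weight
    F_L = (1 - w) * F_L_meas[i] + w * F_L_meas[j]    # per-frequency linear blend
    F_R = (1 - w) * F_R_meas[i] + w * F_R_meas[j]
    return F_L, F_R
```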
  • the ITD is reconstructed separately, either by a similar interpolation from measured ITD values or by an approximate formula. For instance, the spherical head model with diametrically opposite ears and radius b yields
    $$\Delta[k,l] = \frac{b}{c}\left(\theta[k,l] + \sin\theta[k,l]\right) \qquad (31)$$
  • where c denotes the speed of sound, and the azimuth angle $\theta[k,l]$ is in radians referenced to the front direction.
  • This separate interpolation or computation of the ITD is critical for high-fidelity virtualization at arbitrary directions.
  • the delays $\tau_L[k,l]$ and $\tau_R[k,l]$ needed in Equations (23, 24) are derived by allocating the ITD between the left and right signals, e.g. by splitting it between the two ears.
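For example, the spherical-head formula of Equation (31) and one possible even split of the resulting ITD (the sign convention and the default head radius are assumptions of this sketch):

```python
import numpy as np

def spherical_head_itd(theta, b=0.0875, c=343.0):
    """Woodworth spherical-head ITD of Eq. (31): Delta = (b/c) * (theta + sin(theta)).

    theta : azimuth in radians, referenced to the front direction
    b     : head radius in meters (0.0875 m is an assumed default)
    c     : speed of sound in m/s
    """
    return (b / c) * (theta + np.sin(theta))

def allocate_itd(delta):
    """One possible allocation of the ITD between the ears (even split; the sign
    convention -- left ear leading for positive delta -- is an assumption)."""
    return -delta / 2.0, delta / 2.0                 # tau_L[k,l], tau_R[k,l]
```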
  • a transient detector can be incorporated; if a frame contains a broadband transient, the phase modification can be changed from a per-bin phase shift to a broadband delay such that the appropriate ITD is realized for the transient structure. This assumes the use of sufficient oversampling in the DFT to allow for such a signal delay. Furthermore, the broadband delay can be confined to the bins exhibiting the most transient behavior - such that the high-resolution virtualization is maintained for stationary sources that persist during the transient.
  • when r[k,l] = 0, the localization of the sound event coincides with the reference listening position.
  • in loudspeaker reproduction of a multichannel recording in a horizontal-only (or "pantophonic") format such as the 5.1 format illustrated in FIG. 6A, a listener located at the reference position (or "sweet spot") would perceive a sound located above the head (assuming that all channels contain scaled copies of a common source signal).
  • the SASC localization vector d[k,l] is the projection onto the horizontal plane of a virtual source position (defined by the azimuth and elevation angles $\theta[k,l]$ and $\phi[k,l]$) that spans a 3-D encoding surface coinciding with the upper half of a sphere centered on the listener.
  • a more general solution is defined as any 3-D encoding surface that preserves symmetry around the vertical axis and includes the circumference of the unit circle as its edge.
  • an additional enhancement for r[k,l] < 1 consists of synthesizing a binaural near-field effect so as to produce a more compelling illusion for sound events localized in proximity to the listener's head (approximately 1 meter or less).
  • this may include mapping r[k,l] (or the virtual 3-D source position defined by the azimuth and elevation angles $\theta[k,l]$ and $\phi[k,l]$) to a physical distance measure, and extending the HRTF database used in the binaural synthesis described earlier to include near-field HRTF data.
  • an approximate near-field HRTF correction can be obtained by appropriately adjusting the interaural level difference for laterally localized sound sources.
  • the gain factors $\beta_L$ and $\beta_R$ to be applied at the two ears may be derived by splitting the interaural path length difference for a given ITD value:
  • $$\Delta[k,l] = \frac{b}{c}\left[\arcsin\big(\cos\phi[k,l]\,\sin\theta[k,l]\big) + \cos\phi[k,l]\,\sin\theta[k,l]\right] \qquad (38)$$
  • where $\phi[k,l]$ denotes the elevation angle and c the speed of sound.
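The patent does not spell out the gain formula here, so the following sketch only illustrates one plausible reading: split the path-length difference implied by Equation (38) between the ears and take each gain inversely proportional to its path length.

```python
import numpy as np

def near_field_gains(distance, theta, phi, b=0.0875, c=343.0):
    """Illustrative near-field ILD correction (an assumption of this sketch, not a
    formula quoted from the patent): split the interaural path-length difference
    between the ears and set each gain inversely proportional to its path length.
    """
    delta = (b / c) * (np.arcsin(np.cos(phi) * np.sin(theta))
                       + np.cos(phi) * np.sin(theta))      # ITD, Eq. (38)
    half_path = c * delta / 2.0                            # half the path-length difference (m)
    beta_L = 1.0 / max(distance - half_path, b)            # nearer ear: shorter path, larger gain
    beta_R = 1.0 / max(distance + half_path, b)
    norm = np.sqrt(2.0 / (beta_L ** 2 + beta_R ** 2))      # keep the overall power unchanged
    return beta_L * norm, beta_R * norm
```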
  • FIG. 8 is a block diagram of a high-resolution frequency-domain virtualization system with primary-ambient signal decomposition, where the input and output time-frequency transforms are not depicted.
  • the frequency domain input signals 806 are processed in primary-ambient decomposition block 808 to yield primary components 810 and ambient components 811.
  • spatial analysis 812 is performed on the primary components to yield a directional vector 814.
  • in one embodiment, the spatial analysis is performed in accordance with the methods described in copending application US No. 11/750,300.
  • alternatively, the spatial analysis is performed by any suitable technique that generates a directional vector from the input signals.
  • the primary component signals 810 are processed in high resolution virtualization block 816, in conjunction with the directional vector information 814 to generate frequency domain signals 817 that, for each time and frequency component, have inter-channel amplitude and phase differences that characterize the direction that corresponds to the directional vector 814.
  • Ambience virtualization of the ambience components 811 takes place in the ambience virtualization block 818 to generate virtualized ambience components 819, also a frequency domain signal. Since undesirable signal cancellation can occur in a downmix, relative normalization is introduced in a preferred embodiment of the invention to ensure that the power of the downmix matches that of the multichannel input signal at each time and frequency.
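The relative normalization described above might be sketched as a per-bin equalizing gain on the downmix; the numerical floor is an implementation choice of this illustration:

```python
import numpy as np

def normalized_downmix(X):
    """Scale the summed bin so its power matches the total power of the
    multichannel input at each time and frequency, compensating for signal
    cancellation in the sum.

    X : (M, K, L) time-frequency representation of the input channels
    """
    downmix = X.sum(axis=0)                                   # plain sum over channels
    power_in = (np.abs(X) ** 2).sum(axis=0)                   # per-bin input power
    power_dm = np.abs(downmix) ** 2
    gain = np.sqrt(power_in / np.maximum(power_dm, 1e-20))    # equalizing gain per bin
    return gain * downmix
```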
  • the signals 817 and 819 are then combined.
  • the spatial analysis and synthesis scheme described previously is applied to the primary components $P_m[k,l]$.
  • the ambient components $A_m[k,l]$ may be suitably rendered by the standard multichannel virtualization method described earlier, especially if the input signal is a multichannel surround recording, e.g. in 5.1 format.
  • the ambient signal components $A_L[k,l]$ and $A_R[k,l]$ are directly added into the binaural output signal ($Y_L[k,l]$ and $Y_R[k,l]$) without modification, or with some decorrelation filtering for an enhanced effect.
  • An alternative method consists of "upmixing" this pair of ambient signal components into a multichannel surround ambience signal and then virtualizing this multichannel signal with the standard techniques described earlier. This ambient upmixing process preferably includes applying decorrelating filters to the synthetic surround ambience signals.
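One simple decorrelating filter, given purely as an illustrative sketch (the patent does not prescribe a particular design), is a frequency-domain allpass that applies a fixed random phase to each bin:

```python
import numpy as np

def decorrelate(A, seed=0):
    """Randomize the phase of an ambient time-frequency signal while leaving its
    magnitude untouched -- one of many possible decorrelating filters.

    A : (K, L) complex STFT of one ambience channel
    """
    rng = np.random.default_rng(seed)
    phase = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=A.shape[0]))  # fixed per bin
    return A * phase[:, None]
```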
  • the proposed SASC-based rendering method has obvious applications in a variety of consumer electronic devices where improved headphone reproduction of music or movie soundtracks is desired, either in the home or in mobile scenarios.
  • the combination of the spatial analysis method described in U.S. patent application No. 11/750,300 (docket CLIP159, "Spatial Audio Coding Based on Universal Spatial Cues", incorporated by reference herein) with binaural synthesis performed in the frequency domain provides an improvement in the spatial quality of reproduction of music and movie soundtracks over headphones.
  • the resulting listening experience is a closer approximation of the experience of listening to a true binaural recording of the recorded sound scene (or of a given loudspeaker reproduction system in an established listening room).
  • this reproduction technique readily supports head-tracking compensation because it allows simulating a rotation of the sound scene with respect to the listener, as described below. While not intended to limit the scope of the present invention, several additional applications of the invention are described below.
  • Spatial audio coding formats: The SASC-based binaural rendering embodiments described herein are particularly efficient if the input signal is already provided in the frequency domain, and even more so if it is composed of more than two channels - since the virtualization then has the effect of reducing the number of channels requiring an inverse transform for conversion to the time domain.
  • the input signals in standard audio coding schemes are provided to the decoder in a frequency-domain representation; similarly, this situation occurs in the binaural rendering of a multichannel signal represented in a spatial audio coding format.
  • the encoder already provides the spatial analysis (described earlier), the downmix signal, and the primary-ambient decomposition.
  • the spatial synthesis methods described above thus form the core of a computationally efficient and perceptually accurate headphone decoder for the SASC format.
  • Non-discrete multichannel formats: The SASC-based binaural rendering method can be applied to audio content other than standard discrete multichannel recordings. For instance, it can be used with Ambisonic-encoded or matrix-encoded material.
  • the binaural rendering method proposed here provides a compatible and effective approach for headphone reproduction of two-channel matrix-encoded surround content. Similarly, it can be readily combined with the SIRR or DirAC techniques for high-resolution reproduction of ambisonic recordings over headphones or for the conversion of room impulse responses from an ambisonic format to a binaural format.
  • the SASC-based binaural rendering method has many applications beyond the initial motivation of improved headphone listening.
  • the use of the SASC analysis framework to parameterize the spatial aspects of the original content enables flexible and robust modification of the rendered scene.
  • One example is a "wraparound" enhancement effect created by warping the angle cues so as to spatially widen the audio scene prior to the high-resolution virtualization. Given that spatial separation is well known to be an important factor in speech intelligibility, such spatial widening may prove useful in improving the listening assistance provided by hearing aids.
  • SASC-based binaural rendering enables improved head-tracked binaural virtualization compared to standard channel-centric virtualization methods because all primary sound components are reproduced with accurate HRTF cues, avoiding any attempt to virtualize "phantom image" illusions of sounds panned between two or more channels.
  • the SASC-based binaural rendering method can be incorporated in a loudspeaker reproduction scenario by introducing appropriate crosstalk cancellation filters applied to the binaural output signal.
  • while the SASC-based binaural rendering method assumes reproduction using a left output channel and a right output channel, it is straightforward to apply the principles of the present invention more generally to spatial audio reproduction over headphones or loudspeakers using any 2-channel or multichannel audio recording or transmission format where the direction angle can be encoded in the output signal by prescribed frequency-dependent or frequency-independent inter-channel amplitude and/or phase differences.
  • the present invention allows accurate reproduction of the spatial audio scene in, for instance, an Ambisonic format; a phase-amplitude matrix stereo format; a discrete multi-channel format; a conventional 2-channel or multi-channel recording format associated with an array of two or more microphones; a 2-channel or multi-channel loudspeaker 3D audio format using HRTF-based (or "transaural") virtualization techniques; or a sound field reproduction method using loudspeaker arrays, such as Wave Field Synthesis.
  • the present invention can be used to convert a signal from any 2-channel or multi-channel spatial audio recording or transmission format to any other 2-channel or multi-channel spatial audio recording or transmission format.
  • the method allows including in the format conversion an angular transformation of the sound scene such as a rotation or warping applied to the direction angle of sound components in the sound scene.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Stereophonic System (AREA)

Abstract

The invention concerns a frequency-domain method for the format conversion or reproduction of 2-channel or multi-channel audio signals such as recordings. The reproduction is based on a spatial analysis of directional cues in the input audio signal and the conversion of these cues into audio output signals for at least two channels in the frequency domain.
PCT/US2008/078632 2007-10-03 2008-10-02 Spatial audio analysis and synthesis for binaural reproduction and format conversion WO2009046223A2 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB1006665A GB2467668B (en) 2007-10-03 2008-10-02 Spatial audio analysis and synthesis for binaural reproduction and format conversion
CN200880119120.6A CN101884065B (zh) 2007-10-03 2008-10-02 用于双耳再现和格式转换的空间音频分析和合成的方法

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US97734507P 2007-10-03 2007-10-03
US60/977,345 2007-10-03
US97743207P 2007-10-04 2007-10-04
US60/977,432 2007-10-04
US10200208P 2008-10-01 2008-10-01
US12/243,963 2008-10-01
US12/243,963 US8374365B2 (en) 2006-05-17 2008-10-01 Spatial audio analysis and synthesis for binaural reproduction and format conversion
US61/102,002 2008-10-01

Publications (2)

Publication Number Publication Date
WO2009046223A2 true WO2009046223A2 (fr) 2009-04-09
WO2009046223A3 WO2009046223A3 (fr) 2009-06-11

Family

ID=40526952

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/078632 WO2009046223A2 (fr) 2007-10-03 2008-10-02 Spatial audio analysis and synthesis for binaural reproduction and format conversion

Country Status (3)

Country Link
CN (1) CN101884065B (fr)
GB (1) GB2467668B (fr)
WO (1) WO2009046223A2 (fr)

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012025283A1 (fr) * 2010-08-25 2012-03-01 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus for generating a decorrelated signal using transmitted phase information
WO2013104529A1 (fr) * 2012-01-13 2013-07-18 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Device and method for calculating loudspeaker signals for a plurality of loudspeakers using a delay in the frequency domain
EP2665208A1 (fr) * 2012-05-14 2013-11-20 Thomson Licensing Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
EP2738962A1 (fr) * 2012-11-29 2014-06-04 Thomson Licensing Method and apparatus for determining dominant sound source directions in a higher order ambisonics representation of a sound field
WO2014177202A1 (fr) * 2013-04-30 2014-11-06 Huawei Technologies Co., Ltd. Audio signal processing apparatus
WO2014193993A1 (fr) * 2013-05-29 2014-12-04 Qualcomm Incorporated Filtering with binaural room impulse responses
WO2014194003A1 (fr) * 2013-05-29 2014-12-04 Qualcomm Incorporated Performing positional analysis to code spherical harmonic coefficients
WO2014194088A3 (fr) * 2013-05-29 2015-03-19 Qualcomm Incorporated Binauralization of rotated higher order ambisonics
KR20150095660A (ko) * 2012-12-12 2015-08-21 톰슨 라이센싱 Method and apparatus for compressing and decompressing a higher order ambisonics representation for a sound field
KR20150114874A (ko) * 2014-04-02 2015-10-13 주식회사 윌러스표준기술연구소 Audio signal processing method and apparatus
RU2570359C2 (ru) * 2010-12-03 2015-12-10 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Sound acquisition via the extraction of geometrical information from direction of arrival estimates
US9282417B2 (en) 2010-02-02 2016-03-08 Koninklijke N.V. Spatial sound reproduction
US9489955B2 (en) 2014-01-30 2016-11-08 Qualcomm Incorporated Indicating frame parameter reusability for coding vectors
US9495968B2 (en) 2013-05-29 2016-11-15 Qualcomm Incorporated Identifying sources from which higher order ambisonic audio data is generated
US9584235B2 (en) 2009-12-16 2017-02-28 Nokia Technologies Oy Multi-channel audio processing
US9620137B2 (en) 2014-05-16 2017-04-11 Qualcomm Incorporated Determining between scalar and vector quantization in higher order ambisonic coefficients
EP3062535A4 (fr) 2013-10-22 2017-07-05 Industry-Academic Cooperation Foundation, Yonsei University Method and apparatus for processing an audio signal
US9747910B2 (en) 2014-09-26 2017-08-29 Qualcomm Incorporated Switching between predictive and non-predictive quantization techniques in a higher order ambisonics (HOA) framework
US9832585B2 (en) 2014-03-19 2017-11-28 Wilus Institute Of Standards And Technology Inc. Audio signal processing method and apparatus
US9832589B2 (en) 2013-12-23 2017-11-28 Wilus Institute Of Standards And Technology Inc. Method for generating filter for audio signal, and parameterization device for same
US9848275B2 (en) 2014-04-02 2017-12-19 Wilus Institute Of Standards And Technology Inc. Audio signal processing method and device
WO2017218973A1 (fr) 2016-06-17 2017-12-21 Edward Stein Distance panning using near/far-field rendering
US9852737B2 (en) 2014-05-16 2017-12-26 Qualcomm Incorporated Coding vectors decomposed from higher-order ambisonics audio signals
US9922656B2 (en) 2014-01-30 2018-03-20 Qualcomm Incorporated Transitioning of ambient higher-order ambisonic coefficients
CN107851444A (zh) * 2015-07-24 2018-03-27 声音对象技术股份有限公司 Method and system for decomposing an acoustic signal into sound objects, a sound object and its use
WO2018059742A1 (fr) 2016-09-30 2018-04-05 Benjamin Bernard Method for conversion, stereophonic encoding, decoding and transcoding of a three-dimensional audio signal
US9961469B2 (en) 2013-09-17 2018-05-01 Wilus Institute Of Standards And Technology Inc. Method and device for audio signal processing
US10531215B2 (en) 2010-07-07 2020-01-07 Samsung Electronics Co., Ltd. 3D sound reproducing method and apparatus
US10609503B2 (en) 2018-04-08 2020-03-31 Dts, Inc. Ambisonic depth extraction
US10652683B2 (en) 2014-01-10 2020-05-12 Samsung Electronics Co., Ltd. Method and apparatus for reproducing three-dimensional audio
US10770087B2 (en) 2014-05-16 2020-09-08 Qualcomm Incorporated Selecting codebooks for coding vectors decomposed from higher-order ambisonic audio signals
US10854210B2 (en) 2016-09-16 2020-12-01 Coronal Audio S.A.S. Device and method for capturing and processing a three-dimensional acoustic field
US10932082B2 (en) 2016-06-21 2021-02-23 Dolby Laboratories Licensing Corporation Headtracking for pre-rendered binaural audio
GB2598960A (en) * 2020-09-22 2022-03-23 Nokia Technologies Oy Parametric spatial audio rendering with near-field effect

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2733964A1 (fr) * 2012-11-15 2014-05-21 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Segment-wise adjustment of a spatial audio signal to different playback loudspeaker setups
EP2866475A1 (fr) 2013-10-23 2015-04-29 Thomson Licensing Method and apparatus for decoding an audio sound field representation for audio playback using 2D setups
US9875745B2 (en) * 2014-10-07 2018-01-23 Qualcomm Incorporated Normalization of ambient higher order ambisonic audio data
US9826297B2 (en) 2014-10-29 2017-11-21 At&T Intellectual Property I, L.P. Accessory device that provides sensor input to a media device
KR102516625B1 (ko) * 2015-01-30 2023-03-30 디티에스, 인코포레이티드 System and method for capturing, encoding, distributing, and decoding immersive audio
HUE056176T2 (hu) 2015-02-12 2022-02-28 Dolby Laboratories Licensing Corp Headphone virtualization
US9980055B2 (en) * 2015-10-12 2018-05-22 Oticon A/S Hearing device and a hearing system configured to localize a sound source
CN105376690A (zh) * 2015-11-04 2016-03-02 北京时代拓灵科技有限公司 Method and apparatus for generating virtual surround sound
NZ750171A (en) * 2016-01-18 2022-04-29 Boomcloud 360 Inc Subband spatial and crosstalk cancellation for audio reproduction
US10225657B2 (en) 2016-01-18 2019-03-05 Boomcloud 360, Inc. Subband spatial and crosstalk cancellation for audio reproduction
CN108781331B (zh) 2016-01-19 2020-11-06 云加速360公司 Audio enhancement for head-mounted speakers
CN105792090B (zh) * 2016-04-27 2018-06-26 华为技术有限公司 Method and apparatus for adding reverberation
CN107358960B (zh) * 2016-05-10 2021-10-26 华为技术有限公司 Encoding method and encoder for a multi-channel signal
CN107968984B (zh) * 2016-10-20 2019-08-20 中国科学院声学研究所 A 5-to-2-channel audio conversion optimization method
CN107182003B (zh) * 2017-06-01 2019-09-27 西南电子技术研究所(中国电子科技集团公司第十研究所) Airborne three-dimensional call virtual auditory processing method
US10313820B2 (en) 2017-07-11 2019-06-04 Boomcloud 360, Inc. Sub-band spatial audio enhancement
CN107920303B (zh) * 2017-11-21 2019-12-24 北京时代拓灵科技有限公司 A method and apparatus for audio capture
US10764704B2 (en) 2018-03-22 2020-09-01 Boomcloud 360, Inc. Multi-channel subband spatial processing for loudspeakers
CN113302692A (zh) 2018-10-26 2021-08-24 弗劳恩霍夫应用研究促进协会 Audio processing based on directional loudness maps
CN111757240B (zh) 2019-03-26 2021-08-20 瑞昱半导体股份有限公司 Audio processing method and audio processing system
CN111757239B (zh) 2019-03-28 2021-11-19 瑞昱半导体股份有限公司 Audio processing method and audio processing system
JP2022543121A (ja) * 2019-08-08 2022-10-07 ジーエヌ ヒアリング エー/エス Bilateral hearing aid system and method for enhancing the speech of one or more desired speakers
US10841728B1 (en) 2019-10-10 2020-11-17 Boomcloud 360, Inc. Multi-channel crosstalk processing
CN114173256B (zh) * 2021-12-10 2024-04-19 中国电影科学技术研究所 Method, apparatus and device for restoring sound field space and attitude tracking

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006072270A1 (fr) * 2005-01-10 2006-07-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Compact side information for parametric coding of spatial audio
WO2007031896A1 (fr) * 2005-09-13 2007-03-22 Koninklijke Philips Electronics N.V. Audio coding
WO2007096808A1 (fr) * 2006-02-21 2007-08-30 Koninklijke Philips Electronics N.V. Audio encoding and decoding

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4637725B2 (ja) * 2005-11-11 2011-02-23 ソニー株式会社 Audio signal processing device, audio signal processing method, and program

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006072270A1 (fr) * 2005-01-10 2006-07-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Compact side information for parametric coding of spatial audio
WO2007031896A1 (fr) * 2005-09-13 2007-03-22 Koninklijke Philips Electronics N.V. Audio coding
WO2007096808A1 (fr) * 2006-02-21 2007-08-30 Koninklijke Philips Electronics N.V. Audio encoding and decoding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FALLER, C., 'Parametric coding of spatial audio', Proc. of the 7th Int. Conf. DAFx'04, Naples, Italy, October 5-8, 2004 *

Cited By (142)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9584235B2 (en) 2009-12-16 2017-02-28 Nokia Technologies Oy Multi-channel audio processing
US9282417B2 (en) 2010-02-02 2016-03-08 Koninklijke N.V. Spatial sound reproduction
RU2719283C1 (ru) * 2010-07-07 2020-04-17 Самсунг Электроникс Ко., Лтд. Method and apparatus for reproducing three-dimensional sound
US10531215B2 (en) 2010-07-07 2020-01-07 Samsung Electronics Co., Ltd. 3D sound reproducing method and apparatus
US8831931B2 (en) 2010-08-25 2014-09-09 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus for generating a decorrelated signal using transmitted phase information
US9431019B2 (en) 2010-08-25 2016-08-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus for decoding a signal comprising transients using a combining unit and a mixer
US9368122B2 (en) 2010-08-25 2016-06-14 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus for generating a decorrelated signal using transmitted phase information
WO2012025283A1 (fr) * 2010-08-25 2012-03-01 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus for generating a decorrelated signal using transmitted phase information
US10109282B2 (en) 2010-12-03 2018-10-23 Friedrich-Alexander-Universitaet Erlangen-Nuernberg Apparatus and method for geometry-based spatial audio coding
RU2570359C2 (ru) * 2010-12-03 2015-12-10 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Sound acquisition via the extraction of geometrical information from direction of arrival estimates
US9396731B2 (en) 2010-12-03 2016-07-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Sound acquisition via the extraction of geometrical information from direction of arrival estimates
US9666203B2 (en) 2012-01-13 2017-05-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Device and method for calculating loudspeaker signals for a plurality of loudspeakers while using a delay in the frequency domain
WO2013104529A1 (fr) * 2012-01-13 2013-07-18 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Device and method for calculating loudspeaker signals for a plurality of loudspeakers using a delay in the frequency domain
EP4012703A1 (fr) * 2012-05-14 2022-06-15 Dolby International AB Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
AU2013261933B2 (en) * 2012-05-14 2017-02-02 Dolby International Ab Method and apparatus for compressing and decompressing a Higher Order Ambisonics signal representation
KR20230058548A (ko) * 2012-05-14 2023-05-03 돌비 인터네셔널 에이비 Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
AU2021203791B2 (en) * 2012-05-14 2022-09-01 Dolby International Ab Method and apparatus for compressing and decompressing a Higher Order Ambisonics signal representation
KR20150010727A (ko) * 2012-05-14 2015-01-28 톰슨 라이센싱 Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
CN104285390A (zh) * 2012-05-14 2015-01-14 汤姆逊许可公司 Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
KR102121939B1 (ko) * 2012-05-14 2020-06-11 돌비 인터네셔널 에이비 Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
EP3564952A1 (fr) * 2012-05-14 2019-11-06 Dolby International AB Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
KR20200067954A (ko) * 2012-05-14 2020-06-12 돌비 인터네셔널 에이비 Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
KR20220112856A (ko) * 2012-05-14 2022-08-11 돌비 인터네셔널 에이비 Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
KR102231498B1 (ko) * 2012-05-14 2021-03-24 돌비 인터네셔널 에이비 Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
KR102427245B1 (ko) * 2012-05-14 2022-07-29 돌비 인터네셔널 에이비 Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
EP4246511A3 (fr) * 2012-05-14 2023-09-27 Dolby International AB Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
US9454971B2 (en) 2012-05-14 2016-09-27 Dolby Laboratories Licensing Corporation Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
US10390164B2 (en) 2012-05-14 2019-08-20 Dolby Laboratories Licensing Corporation Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
KR20210034101A (ko) * 2012-05-14 2021-03-29 돌비 인터네셔널 에이비 Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
US9980073B2 (en) 2012-05-14 2018-05-22 Dolby Laboratories Licensing Corporation Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
KR102651455B1 (ko) * 2012-05-14 2024-03-27 돌비 인터네셔널 에이비 Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
AU2016262783B2 (en) * 2012-05-14 2018-12-06 Dolby International Ab Method and apparatus for compressing and decompressing a Higher Order Ambisonics signal representation
KR102526449B1 (ko) * 2012-05-14 2023-04-28 돌비 인터네셔널 에이비 Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
US11234091B2 (en) 2012-05-14 2022-01-25 Dolby Laboratories Licensing Corporation Method and apparatus for compressing and decompressing a Higher Order Ambisonics signal representation
EP2665208A1 (fr) * 2012-05-14 2013-11-20 Thomson Licensing Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
US11792591B2 (en) 2012-05-14 2023-10-17 Dolby Laboratories Licensing Corporation Method and apparatus for compressing and decompressing a higher order Ambisonics signal representation
WO2013171083A1 (fr) * 2012-05-14 2013-11-21 Thomson Licensing Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
US9445199B2 (en) 2012-11-29 2016-09-13 Dolby Laboratories Licensing Corporation Method and apparatus for determining dominant sound source directions in a higher order Ambisonics representation of a sound field
WO2014082883A1 (fr) * 2012-11-29 2014-06-05 Thomson Licensing Method and apparatus for determining dominant sound source directions in a higher order Ambisonics representation of a sound field
TWI633792B (zh) * 2012-11-29 2018-08-21 Dolby International AB Method and apparatus for determining dominant sound source directions in a higher order Ambisonics representation of a sound field
EP2738962A1 (fr) * 2012-11-29 2014-06-04 Thomson Licensing Method and apparatus for determining dominant sound source directions in a higher order Ambisonics representation of a sound field
KR20230098355A (ko) * 2012-12-12 2023-07-03 Dolby International AB Method and apparatus for compressing and decompressing a higher order Ambisonics representation for a sound field
US11184730B2 (en) 2012-12-12 2021-11-23 Dolby Laboratories Licensing Corporation Method and apparatus for compressing and decompressing a higher order ambisonics representation for a sound field
KR102664626B1 (ko) 2012-12-12 2024-05-10 Dolby International AB Method and apparatus for compressing and decompressing a higher order Ambisonics representation for a sound field
KR102428842B1 (ko) 2012-12-12 2022-08-04 Dolby International AB Method and apparatus for compressing and decompressing a higher order Ambisonics representation for a sound field
KR102546541B1 (ko) 2012-12-12 2023-06-23 Dolby International AB Method and apparatus for compressing and decompressing a higher order Ambisonics representation for a sound field
KR20210007036A (ko) * 2012-12-12 2021-01-19 Dolby International AB Method and apparatus for compressing and decompressing a higher order Ambisonics representation for a sound field
KR20220113839A (ko) * 2012-12-12 2022-08-16 Dolby International AB Method and apparatus for compressing and decompressing a higher order Ambisonics representation for a sound field
KR102202973B1 (ko) 2012-12-12 2021-01-14 Dolby International AB Method and apparatus for compressing and decompressing a higher order Ambisonics representation for a sound field
US11546712B2 (en) 2012-12-12 2023-01-03 Dolby Laboratories Licensing Corporation Method and apparatus for compressing and decompressing a higher order ambisonics representation for a sound field
KR20150095660A (ko) * 2012-12-12 2015-08-21 Thomson Licensing Method and apparatus for compressing and decompressing a higher order Ambisonics representation for a sound field
WO2014177202A1 (fr) * 2013-04-30 2014-11-06 Huawei Technologies Co., Ltd. Audio signal processing apparatus
US9769586B2 (en) 2013-05-29 2017-09-19 Qualcomm Incorporated Performing order reduction with respect to higher order ambisonic coefficients
US9384741B2 (en) 2013-05-29 2016-07-05 Qualcomm Incorporated Binauralization of rotated higher order ambisonics
WO2014193993A1 (fr) * 2013-05-29 2014-12-04 Qualcomm Incorporated Filtering with binaural room impulse responses
KR101788954B1 (ko) 2013-05-29 2017-10-20 Qualcomm Incorporated Filtering with binaural room impulse responses
US9854377B2 (en) 2013-05-29 2017-12-26 Qualcomm Incorporated Interpolation for decomposed representations of a sound field
US9774977B2 (en) 2013-05-29 2017-09-26 Qualcomm Incorporated Extracting decomposed representations of a sound field based on a second configuration mode
US9883312B2 (en) 2013-05-29 2018-01-30 Qualcomm Incorporated Transformed higher order ambisonics audio data
WO2014194003A1 (fr) * 2013-05-29 2014-12-04 Qualcomm Incorporated Performing positional analysis to code spherical harmonic coefficients
US9763019B2 (en) 2013-05-29 2017-09-12 Qualcomm Incorporated Analysis of decomposed representations of a sound field
US9749768B2 (en) 2013-05-29 2017-08-29 Qualcomm Incorporated Extracting decomposed representations of a sound field based on a first configuration mode
WO2014194088A3 (fr) * 2013-05-29 2015-03-19 Qualcomm Incorporated Binauralization of rotated higher order ambisonics
KR20160015284A (ko) * 2013-05-29 2016-02-12 Binauralization of rotated higher order ambisonics
US9716959B2 (en) 2013-05-29 2017-07-25 Qualcomm Incorporated Compensating for error in decomposed representations of sound fields
US9980074B2 (en) 2013-05-29 2018-05-22 Qualcomm Incorporated Quantization step sizes for compression of spatial components of a sound field
US9369818B2 (en) 2013-05-29 2016-06-14 Qualcomm Incorporated Filtering with binaural room impulse responses with content analysis and weighting
US10499176B2 (en) 2013-05-29 2019-12-03 Qualcomm Incorporated Identifying codebooks to use when coding spatial components of a sound field
US9674632B2 (en) 2013-05-29 2017-06-06 Qualcomm Incorporated Filtering with binaural room impulse responses
JP2016523467A (ja) * 2013-05-29 2016-08-08 Qualcomm Incorporated Binauralization of rotated higher order ambisonics
KR101723332B1 (ko) 2013-05-29 2017-04-04 Binauralization of rotated higher order ambisonics
US9420393B2 (en) 2013-05-29 2016-08-16 Qualcomm Incorporated Binaural rendering of spherical harmonic coefficients
US9502044B2 (en) 2013-05-29 2016-11-22 Qualcomm Incorporated Compression of decomposed representations of a sound field
US9466305B2 (en) 2013-05-29 2016-10-11 Qualcomm Incorporated Performing positional analysis to code spherical harmonic coefficients
US9495968B2 (en) 2013-05-29 2016-11-15 Qualcomm Incorporated Identifying sources from which higher order ambisonic audio data is generated
US11962990B2 (en) 2013-05-29 2024-04-16 Qualcomm Incorporated Reordering of foreground audio objects in the ambisonics domain
US11146903B2 (en) 2013-05-29 2021-10-12 Qualcomm Incorporated Compression of decomposed representations of a sound field
US10455346B2 (en) 2013-09-17 2019-10-22 Wilus Institute Of Standards And Technology Inc. Method and device for audio signal processing
US11096000B2 (en) 2013-09-17 2021-08-17 Wilus Institute Of Standards And Technology Inc. Method and apparatus for processing multimedia signals
US11622218B2 (en) 2013-09-17 2023-04-04 Wilus Institute Of Standards And Technology Inc. Method and apparatus for processing multimedia signals
US9961469B2 (en) 2013-09-17 2018-05-01 Wilus Institute Of Standards And Technology Inc. Method and device for audio signal processing
US10469969B2 (en) 2013-09-17 2019-11-05 Wilus Institute Of Standards And Technology Inc. Method and apparatus for processing multimedia signals
EP3062535A4 (fr) * 2013-10-22 2017-07-05 Method and apparatus for processing an audio signal
EP3062534A4 (fr) * 2013-10-22 2017-07-05 Method for generating a filter for an audio signal, and corresponding parameterizing device
US11195537B2 (en) 2013-10-22 2021-12-07 Industry-Academic Cooperation Foundation, Yonsei University Method and apparatus for binaural rendering audio signal using variable order filtering in frequency domain
US10692508B2 (en) 2013-10-22 2020-06-23 Electronics And Telecommunications Research Institute Method for generating filter for audio signal and parameterizing device therefor
US10204630B2 (en) 2013-10-22 2019-02-12 Electronics And Telecommunications Research Institute Method for generating filter for audio signal and parameterizing device therefor
US10580417B2 (en) 2013-10-22 2020-03-03 Industry-Academic Cooperation Foundation, Yonsei University Method and apparatus for binaural rendering audio signal using variable order filtering in frequency domain
US9832589B2 (en) 2013-12-23 2017-11-28 Wilus Institute Of Standards And Technology Inc. Method for generating filter for audio signal, and parameterization device for same
US11109180B2 (en) 2013-12-23 2021-08-31 Wilus Institute Of Standards And Technology Inc. Method for generating filter for audio signal, and parameterization device for same
US11689879B2 (en) 2013-12-23 2023-06-27 Wilus Institute Of Standards And Technology Inc. Method for generating filter for audio signal, and parameterization device for same
US10158965B2 (en) 2013-12-23 2018-12-18 Wilus Institute Of Standards And Technology Inc. Method for generating filter for audio signal, and parameterization device for same
US10701511B2 (en) 2013-12-23 2020-06-30 Wilus Institute Of Standards And Technology Inc. Method for generating filter for audio signal, and parameterization device for same
US10433099B2 (en) 2013-12-23 2019-10-01 Wilus Institute Of Standards And Technology Inc. Method for generating filter for audio signal, and parameterization device for same
CN108597528A (zh) * 2013-12-23 2018-09-28 Method for generating a filter for an audio signal and parameterizing device therefor
US10863298B2 (en) 2014-01-10 2020-12-08 Samsung Electronics Co., Ltd. Method and apparatus for reproducing three-dimensional audio
US10652683B2 (en) 2014-01-10 2020-05-12 Samsung Electronics Co., Ltd. Method and apparatus for reproducing three-dimensional audio
US9653086B2 (en) 2014-01-30 2017-05-16 Qualcomm Incorporated Coding numbers of code vectors for independent frames of higher-order ambisonic coefficients
US9922656B2 (en) 2014-01-30 2018-03-20 Qualcomm Incorporated Transitioning of ambient higher-order ambisonic coefficients
US9489955B2 (en) 2014-01-30 2016-11-08 Qualcomm Incorporated Indicating frame parameter reusability for coding vectors
US9502045B2 (en) 2014-01-30 2016-11-22 Qualcomm Incorporated Coding independent frames of ambient higher-order ambisonic coefficients
US9747911B2 (en) 2014-01-30 2017-08-29 Qualcomm Incorporated Reuse of syntax element indicating vector quantization codebook used in compressing vectors
US9747912B2 (en) 2014-01-30 2017-08-29 Qualcomm Incorporated Reuse of syntax element indicating quantization mode used in compressing vectors
US9754600B2 (en) 2014-01-30 2017-09-05 Qualcomm Incorporated Reuse of index of huffman codebook for coding vectors
US11343630B2 (en) 2014-03-19 2022-05-24 Wilus Institute Of Standards And Technology Inc. Audio signal processing method and apparatus
US10321254B2 (en) 2014-03-19 2019-06-11 Wilus Institute Of Standards And Technology Inc. Audio signal processing method and apparatus
US10070241B2 (en) 2014-03-19 2018-09-04 Wilus Institute Of Standards And Technology Inc. Audio signal processing method and apparatus
US10771910B2 (en) 2014-03-19 2020-09-08 Wilus Institute Of Standards And Technology Inc. Audio signal processing method and apparatus
US9832585B2 (en) 2014-03-19 2017-11-28 Wilus Institute Of Standards And Technology Inc. Audio signal processing method and apparatus
US10999689B2 (en) 2014-03-19 2021-05-04 Wilus Institute Of Standards And Technology Inc. Audio signal processing method and apparatus
US9848275B2 (en) 2014-04-02 2017-12-19 Wilus Institute Of Standards And Technology Inc. Audio signal processing method and device
KR102216657B1 (ko) * 2014-04-02 2021-02-17 Wilus Institute of Standards and Technology Inc. Audio signal processing method and apparatus
US10129685B2 (en) 2014-04-02 2018-11-13 Wilus Institute Of Standards And Technology Inc. Audio signal processing method and device
KR20150114874A (ko) * 2014-04-02 2015-10-13 Wilus Institute of Standards and Technology Inc. Audio signal processing method and apparatus
US9860668B2 (en) 2014-04-02 2018-01-02 Wilus Institute Of Standards And Technology Inc. Audio signal processing method and device
US10469978B2 (en) 2014-04-02 2019-11-05 Wilus Institute Of Standards And Technology Inc. Audio signal processing method and device
US9986365B2 (en) 2014-04-02 2018-05-29 Wilus Institute Of Standards And Technology Inc. Audio signal processing method and device
US9620137B2 (en) 2014-05-16 2017-04-11 Qualcomm Incorporated Determining between scalar and vector quantization in higher order ambisonic coefficients
US9852737B2 (en) 2014-05-16 2017-12-26 Qualcomm Incorporated Coding vectors decomposed from higher-order ambisonics audio signals
US10770087B2 (en) 2014-05-16 2020-09-08 Qualcomm Incorporated Selecting codebooks for coding vectors decomposed from higher-order ambisonic audio signals
US9747910B2 (en) 2014-09-26 2017-08-29 Qualcomm Incorporated Switching between predictive and non-predictive quantization techniques in a higher order ambisonics (HOA) framework
CN107851444A (zh) * 2015-07-24 2018-03-27 Sound Object Technologies S.A. Method and system for decomposing an acoustic signal into sound objects, a sound object and its use
US10200806B2 (en) 2016-06-17 2019-02-05 Dts, Inc. Near-field binaural rendering
TWI744341B (zh) * 2016-06-17 2021-11-01 DTS, Inc. Distance panning using near/far-field rendering
US9973874B2 (en) 2016-06-17 2018-05-15 Dts, Inc. Audio rendering using 6-DOF tracking
JP2019523913A (ja) * 2016-06-17 2019-08-29 DTS, Inc. Distance panning using near/far-field rendering
JP7039494B2 (ja) 2016-06-17 2022-03-22 DTS, Inc. Distance panning using near/far-field rendering
KR102483042B1 (ko) * 2016-06-17 2022-12-29 DTS, Inc. Distance panning using near/far-field rendering
US10820134B2 (en) 2016-06-17 2020-10-27 Dts, Inc. Near-field binaural rendering
CN109891502A (zh) * 2016-06-17 2019-06-14 DTS, Inc. Distance panning using near/far-field rendering
WO2017218973A1 (fr) 2016-06-17 2017-12-21 Edward Stein Distance panning using near/far-field rendering
CN109891502B (zh) * 2016-06-17 2023-07-25 DTS, Inc. Near-field binaural rendering method, system and readable storage medium
US10231073B2 (en) 2016-06-17 2019-03-12 Dts, Inc. Ambisonic audio rendering with depth decoding
EP3472832A4 (fr) * 2016-06-17 2020-03-11 DTS, Inc. Distance panning using near/far-field rendering
KR20190028706A (ko) * 2016-06-17 2019-03-19 DTS, Inc. Distance panning using near/far-field rendering
US11553296B2 (en) 2016-06-21 2023-01-10 Dolby Laboratories Licensing Corporation Headtracking for pre-rendered binaural audio
US10932082B2 (en) 2016-06-21 2021-02-23 Dolby Laboratories Licensing Corporation Headtracking for pre-rendered binaural audio
US10854210B2 (en) 2016-09-16 2020-12-01 Coronal Audio S.A.S. Device and method for capturing and processing a three-dimensional acoustic field
WO2018059742A1 (fr) 2016-09-30 2018-04-05 Benjamin Bernard Method for conversion, stereophonic encoding, decoding and transcoding of a three-dimensional audio signal
US11232802B2 (en) 2016-09-30 2022-01-25 Coronal Encoding S.A.S. Method for conversion, stereophonic encoding, decoding and transcoding of a three-dimensional audio signal
US10609503B2 (en) 2018-04-08 2020-03-31 Dts, Inc. Ambisonic depth extraction
GB2598960A (en) * 2020-09-22 2022-03-23 Nokia Technologies Oy Parametric spatial audio rendering with near-field effect

Also Published As

Publication number Publication date
CN101884065B (zh) 2013-07-10
GB2467668B (en) 2011-12-07
GB2467668A (en) 2010-08-11
WO2009046223A3 (fr) 2009-06-11
GB201006665D0 (en) 2010-06-09
CN101884065A (zh) 2010-11-10

Similar Documents

Publication Publication Date Title
US8374365B2 (en) Spatial audio analysis and synthesis for binaural reproduction and format conversion
WO2009046223A2 (fr) Spatial audio analysis and synthesis for binaural reproduction and format conversion
US10820134B2 (en) Near-field binaural rendering
US10609503B2 (en) Ambisonic depth extraction
CN107925815B (zh) 空间音频处理装置
US9154896B2 (en) Audio spatialization and environment simulation
US10349197B2 (en) Method and device for generating and playing back audio signal
JP4944902B2 (ja) Binaural audio signal decoding control
RU2752600C2 (ru) Method and apparatus for rendering an acoustic signal, and machine-readable recording medium
US20120039477A1 (en) Audio signal synthesizing
JP2009530916A (ja) Binaural representation using subfilters
CN110418274B (zh) Method and apparatus for rendering an acoustic signal, and computer-readable recording medium
CN113170271B (zh) Method and apparatus for processing a stereo signal
CN110326310B (zh) Dynamic equalization for crosstalk cancellation
EP1989920A1 (fr) Audio encoding and decoding
EP2130204A1 (fr) Method and apparatus for conversion between multi-channel audio formats
Goodwin et al. Binaural 3-D audio rendering based on spatial audio scene coding
Jot et al. Binaural simulation of complex acoustic scenes for interactive audio
Nagel et al. Dynamic binaural cue adaptation
Floros et al. Spatial enhancement for immersive stereo audio applications
He et al. Literature review on spatial audio
JP2024502732A (ja) Post-processing of binaural signals
CN114762040A (zh) Converting a binaural signal into a stereo audio signal

Legal Events

Date Code Title Description

WWE Wipo information: entry into national phase
Ref document number: 200880119120.6
Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 08836621
Country of ref document: EP
Kind code of ref document: A2

NENP Non-entry into the national phase in:
Ref country code: DE

ENP Entry into the national phase in:
Ref document number: 1006665
Country of ref document: GB
Kind code of ref document: A
Free format text: PCT FILING DATE = 20081002

WWE Wipo information: entry into national phase
Ref document number: 1006665.2
Country of ref document: GB

122 Ep: pct application non-entry in european phase
Ref document number: 08836621
Country of ref document: EP
Kind code of ref document: A2