EP3980993B1 - Hybrid spatial audio decoder

Hybrid spatial audio decoder

Info

Publication number
EP3980993B1
Authority
EP
European Patent Office
Prior art keywords
channel subset
signals
decoder
channel
active
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
EP20711440.6A
Other languages
English (en)
French (fr)
Other versions
EP3980993A1 (de)
EP3980993C0 (de)
Inventor
Michael M. Goodwin
Zoran Fejzo
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DTS Inc
Original Assignee
DTS Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DTS Inc filed Critical DTS Inc
Publication of EP3980993A1
Application granted
Publication of EP3980993C0
Publication of EP3980993B1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S3/00: Systems employing more than two channels, e.g. quadraphonic
    • H04S3/008: Systems employing more than two channels, e.g. quadraphonic, in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S2420/00: Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/03: Application of parametric coding in stereophonic audio systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S2420/00: Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/11: Application of ambisonics in stereophonic audio systems

Definitions

  • a spatial audio signal decoder typically performs one or more operations to convert spatial audio signals from an input spatial audio format to an output spatial audio format.
  • An exemplary spatial audio signal decoder is described in US 2014/0350944 A1 .
  • Known spatial audio signal format decoding techniques include passive decoding and active decoding.
  • a passive spatial decoder carries out decoding operations that are based upon the input spatial audio format and the output spatial audio format and perhaps external parameters such as frequency, for example, but do not depend upon spatial characteristics of the audio input signal, such as the direction of arrival of audio sources in the audio input signal, for example.
  • a passive spatial decoder performs one or more operations that can be based in part upon spatial audio signal format but that are independent of the spatial characteristics of the input signal.
  • An active spatial decoder carries out decoding operations that are based upon the input spatial audio format, the output spatial audio format and perhaps external parameters such as frequency, for example, as well as spatial characteristics of the audio input signal.
  • An active spatial decoder often performs one or more operations that are adapted to the spatial characteristics of the audio input signal.
  • Active and passive spatial decoders often lack universality. Passive spatial decoders often blur directional audio sources. For example, passive spatial decoders sometimes render a discrete point source in an input audio signal format to all of the channels of an output spatial audio format (corresponding to an audio playback system) instead of to a subset localized to the point-source direction. Active spatial decoders, on the other hand, often focus diffuse sources by modeling such sources as directional, for example, as a small number of acoustic plane waves. As a result, an active spatial decoder sometimes imparts directionality to nondirectional audio signals. For example, an active spatial decoder sometimes renders nondirectional reverberations from a particular direction in an output spatial audio format (corresponding to an audio playback system) such that the spatial characteristics of the reverberation are not preserved by the decoder.
  • a spatial audio signal decoder includes a processor and storage media operably coupled thereto, the storage media comprising a plurality of instructions that when executed, cause the processor to perform a process that includes receiving an input spatial audio signal including a first set of channels having an input spatial format.
  • the first set of channels of the input spatial audio signal is partitioned into at least a first channel subset of the input spatial audio signal and a second channel subset of the input spatial audio signal.
  • Estimates of a number and directions of arrival are determined for audio sources represented in at least a portion of the channels of the input spatial audio signal.
  • One of the active and passive components of the first channel subset signals is determined based at least in part on the estimated number and directions of arrival of directional audio sources.
  • the other of the active and passive components of the first channel subset signals is determined based at least in part on the determined one of the active and passive components of the first channel subset signals.
  • An active component of the second channel subset signals is determined, based at least in part on the estimated directions of arrival of directional audio sources.
  • a passive component of the second channel subset signals is determined, based upon the second channel subset signals and the active component of the second channel subset signals.
  • the active component of the first channel subset signals is decoded to a first output signal having a first output format.
  • the passive components of the first and second channel subset signals are decoded to a second output signal having a second output format.
  • A high-quality first-order ambisonic (FOA) spatial audio decoder that uses a combination of active and passive decoding was developed previously; see U.S. patent application serial no. 16/543,083, filed August 16, 2019, entitled Spatial Audio Signal Decoder, which is suitable for decoding FOA signals.
  • the previous high-quality FOA spatial audio decoder determines a number and direction of arrival of directional audio sources represented in input spatial audio signals having an input spatial format.
  • the previous decoder determines active spatial audio signal components and passive spatial audio signal components of the input audio signal, based upon the determined number and direction of arrival of directional audio sources.
  • the previous decoder decodes the active input spatial audio signal components to a first output signal and decodes the passive input signal components to a second output signal.
  • the first output signal and second output signal are combined to form a final output signal.
  • A question arises as to how best to decode higher order ambisonic (HOA) signals, which include additional channels beyond the four FOA channels.
  • One possible approach is to extend the active and passive decoding approach used in the previously filed patent application to the HOA channels.
  • active decoding is computationally expensive and active decoding of HOA channels could significantly increase decoder computational expense and complexity.
  • the inventors recognized, though, that active decoding of HOA channels may not be perceptually necessary since an FOA decoder previously developed by the inventors is spatially robust, which largely obviates the need for active decoding of HOA channels to capture spatial information.
  • the inventors have developed a decoder which determines a number and directions of arrival of directional audio sources represented in input spatial audio signals having an input spatial format.
  • the decoder determines active spatial audio signal components and passive spatial audio signal components of the input audio signal, based upon the determined number and directions of arrival of directional audio sources.
  • the decoder decodes the active input spatial audio signal components using an FOA decoder and decodes passive input signal components using an HOA decoder, which also decodes passive FOA components.
  • the inventors have developed a hybrid higher order ambisonic decoder system.
  • the decoder system partitions an input signal that includes a set of channels into a first channel subset and a second channel subset.
  • An ambisonic input audio signal comprises a plurality of audio channels, each channel corresponding to a different three-dimensional directivity pattern. Audio sources are encoded in the ambisonic channels in accordance with the directivity patterns; for instance, a point source in a certain direction is encoded in the ambisonic format channels with the respective gains of the channel directivity patterns for that direction.
  • the decoder system uses spatial analysis of the signals in the first channel subset to identify directions that correspond to directions of arrival of active sound sources represented in the input signal.
  • the decoder system uses an active FOA decoder to decode signal components within the first channel subset that are associated with directions that correspond to directions of arrival of active sound sources represented in the input signal.
  • the decoder uses a passive HOA decoder to decode signal components within the first channel subset that do not correspond to the directions of arrival of active sound sources and to decode signal components within the second channel subset that do not correspond to the directions of arrival of active sound sources.
  • Figure 1A is an illustrative generalized block diagram representing operation of an example first order ambisonic (FOA) spatial audio decoder 106 to decode an input spatial audio signal 102 in an input spatial audio format 104 to an output spatial audio signal 108 in an output spatial audio format suitable for a multichannel audio reproduction system 110.
  • the example spatial audio decoder 106 transforms a multichannel input signal in an FOA B-format to an output signal in a multichannel audio format suitable for playback in the multichannel audio reproduction system.
  • the example input channels are the W, X, Y, Z channels of the B-format.
  • each of channels W, X, Y, Z corresponds to a spherical harmonic function having a different three-dimensional spatial directivity pattern.
  • a spatial audio decoder 106 implemented as a passive decoder performs the transformation of the input signal 102 from the input spatial format to the output spatial format independent of spatial characteristics of the audio input signal, such as direction of arrival of sound sources in the audio input signal, as explained below.
  • a spatial audio decoder 106 implemented as an active decoder performs the transformation of the input spatial audio signal 102 from the input spatial format to the output spatial format based at least in part upon spatial characteristics of the audio input signal.
  • Figure 1B is an illustrative drawing representing an example configuration of the generalized FOA spatial audio decoder of Figure 1A .
  • the decoder is configured to map an input spatial audio signal in an input spatial format to an output spatial audio signal in an output spatial format.
  • one example decoder is configured as an active FOA spatial decoder 308, and another example decoder is configured as a passive HOA spatial decoder 310.
  • each input spatial audio signal includes multiple audio signal channels and that each output spatial audio signal includes multiple audio signal channels.
  • the example decoder includes one or more mapping operations to map M input spatial audio signal channels to N output spatial audio signal channels.
  • an example mapping operation includes an M-by-N spatial decoder matrix to map M input spatial audio signal channels in an FOA B-format to N spatial audio output playback signal channels in an output spatial format.
  • a decoder configured as a passive decoder uses mapping operations that depend on the input spatial format and output spatial format but do not depend on the specific characteristics of the input signal, for instance the directions of arrival of sound sources in the signal.
  • a decoder configured as an active decoder uses mapping operations that depend on the input format, the output format, and characteristics of the input signal.
  • decoding matrix or matrices used in a passive decoder are static (fixed) whereas the decoding matrix or matrices used in an active decoder may vary in time based on the input signal behavior.
  • the value of M is four since the input spatial format is the FOA B-format, which has four channels, and the value of N depends, at least in part, upon the number of speakers in the multichannel audio reproduction system.
  • An FOA spatial format representation or encoding of an individual point-source sound associated with a direction (θ, φ) includes audio input signal components W, X, Y, Z that correspond to respective three-dimensional channel directivity patterns.
  • the angular pair (θ, φ) represents a direction with respect to a reference point.
  • the term 'soundfield' refers to a region in a material medium, such as air for example, in which sound waves propagate.
  • a spatial audio scene or soundfield is encoded in a plurality of channels (referred to as W, X, Y, and Z) in accordance with the directivity patterns defined by the direction vector $d_{\theta,\phi}$:
  • $\begin{bmatrix} W \\ X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} \\ \cos\theta\cos\phi \\ \sin\theta\cos\phi \\ \sin\phi \end{bmatrix} S$
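  • As an illustrative sketch (not part of the patent text), this encoding can be written in Python as follows:

```python
import numpy as np

# Encode a mono signal s as a point source at azimuth theta, elevation phi
# (radians) into traditional B-format channels W, X, Y, Z per the gains above.
def encode_b_format(s, theta, phi):
    gains = np.array([
        1.0 / np.sqrt(2.0),            # W: omnidirectional (with the 1/sqrt(2) convention)
        np.cos(theta) * np.cos(phi),   # X: front/back
        np.sin(theta) * np.cos(phi),   # Y: left/right
        np.sin(phi),                   # Z: up/down
    ])
    return gains[:, None] * np.asarray(s)[None, :]   # shape (4, len(s))
```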
  • Ambisonics is a technique to represent a soundfield by measuring sound at a single point in the soundfield with a plurality of different directivity patterns; sound from a certain direction is encoded into the ambisonic channels in accordance with the gain of the respective directivity patterns at that direction.
  • a soundfield can be represented by multiple input spatial audio signals, each characterized by a different directivity pattern relative to the reference point.
  • sound can be synthetically encoded into ambisonic channels using a sound composition tool, for example.
  • the directivity patterns associated with the ambisonics channels are designed such that a plurality of ambisonic-encoded signals carry directional information for all of the sounds in an entire soundfield.
  • An ambisonic encoder (not shown) encodes a soundfield in an ambisonic format.
  • An ambisonic format is independent from the specific loudspeaker layout which may be used to reconstruct the encoded soundfield.
  • An ambisonic decoder decodes ambisonic format signals for a specific loudspeaker layout.
  • Eric Benjamin, Richard Lee, and Aaron Heller, Is My Decoder Ambisonic?, 125th AES Convention, San Francisco 2008 provides a general explanation of ambisonics.
  • FIG. 2 is an illustrative generalized block diagram representing operation of an example spatial audio decoder 206 to decode an input spatial audio signal 202 in an HOA spatial audio format 204 to an output spatial audio signal 208 in an output spatial audio format suitable for a multichannel audio reproduction system 210.
  • the example spatial audio decoder 206 transforms a multichannel input signal 202 that includes FOA and HOA channels to an output signal 208 in a multichannel audio format suitable for playback in the multichannel audio reproduction system 210.
  • the example input channels correspond to four spherical harmonic function directivity patterns for the FOA channels W, X, Y, Z, five spherical harmonic function directivity patterns for the second-order ambisonic channels R, S, T, U, V, and seven spherical harmonic function directivity patterns for the third-order ambisonic channels K, L, M, N, O, P, Q.
  • each of the channels K through Z corresponds to a different three-dimensional spatial directivity pattern.
  • The second-order ambisonic channels constitute one set of higher order channels, and the third-order ambisonic channels constitute another set of higher order channels.
  • a higher order input spatial audio signal that represents a sound source can consist of audio input signal channels R through Z (first and second order) or K through Z (first, second, and third order) that are characterized by respective directivity patterns that are functions of an azimuth and elevation angle pair that corresponds to a location of the sound source within a soundfield.
  • Florian Hollerweger, An Introduction to Higher Order Ambisonics, corrected version, October 2008; Daniel Arteaga, Introduction to Ambisonics, June 2015; and Dave Malham, Higher order Ambisonic systems, April 2003 provide general explanations of higher order ambisonics.
  • Figure 3 is an illustrative schematic block diagram of an example first higher order ambisonic decoder system 300.
  • the first higher order ambisonic decoder system 300 includes a computer system that includes one or more processor devices configured to be operatively coupled to one or more non-transitory storage devices (i.e. hardware memory) that store instructions to configure the processing devices to provide the processing operation blocks described with reference to Figures 3-4 .
  • the first decoder system 300 of Figure 3 is an ambisonic version of the second spatial decoder system 900 of Figure 9 .
  • the first higher order ambisonic decoder system 300 includes a channel partitioning operation block 304, an active/passive decomposition and projection operation block 306, a subset recombination block 307, a FOA active decoder operation block 308, an HOA passive decoder operation block 310, and a combiner operation block 314.
  • a channel partitioning operation block 304 receives an input audio signal I on line 302 in an ambisonic format.
  • block 304 transforms the input audio signals into a time-frequency representation, for example using a short-time Fourier transform, which often includes a windowing process.
  • An example input signal can include both FOA signal channels and HOA signal channels.
  • the channel partitioning operation block 304 partitions the input signal I into an FOA channel subset and an HOA channel subset and provides the FOA channel subset on line 316 and provides the HOA channel subset on line 318.
  • the FOA channel subset includes the first order signal channels WXYZ.
  • the HOA channel subset includes higher order channels, such as the second order signal components RSTUV and the third order signal components KLMNOPQ, for example. Whereas HOA is typically understood to include the FOA channels, here the HOA channel subset does not include the FOA channels WXYZ.
  • the HOA channel subset on line 318 is designated as HOA (high orders) to underscore this distinction.
  • the active/passive input signal decomposition operation block 306 decomposes the FOA channel subset to identify active FOA components provided on line 320, to identify passive FOA components provided on line 322, and also to identify active components in the HOA channel subset provided on line 324.
  • An example active/passive decomposition block 306 identifies the active FOA components of the FOA channel subset of the input signal 302 that correspond to sound point sources based upon estimated directions of arrival (DOAs) of the sound sources and determines the passive FOA components of the FOA channel subset based upon a difference between the FOA channel subset received on line 316 and the identified active-component FOA channel subset provided on line 320.
  • an example active/passive decomposition block 306 determines the active FOA components based upon the estimated DOAs.
  • An example active/passive decomposition block 306 also identifies the active high-order HOA components based upon the estimated DOAs.
  • the example active/passive decomposition block 306 provides the active component of the FOA channel subset on line 320 to the FOA active decoder block 308, and provides the passive component of the FOA channel subset on line 322 and the active component of the high-order HOA channel subset on line 324 to the subset recombination operation block 307.
  • the subset recombination operation block 307 removes the received active HOA components from the HOA channel subset received on line 318. It will be appreciated that the HOA channel subset provided on line 318 includes both active HOA components and passive HOA components.
  • An example subset recombination operation block 307 determines a passive-component HOA channel subset based upon a difference between the HOA channel subset received on line 318 and the active-component HOA channel subset received on line 324. More specifically, an example subset recombination operation block 307 subtracts the active-component HOA channel subset on line 324 from the HOA channel subset received on line 318.
  • the example subset recombination block 307 combines the passive-component FOA channel subset received on line 322 with the identified passive-component HOA channel subset and provides the result on line 325. It will be appreciated that the recombined passive-component signal on line 325 is a full HOA signal, meaning it has the low-order (FOA) passive components and the high-order HOA passive components, thus constituting a complete higher order ambisonics signal.
  • the active spatial decoder block 308 transforms the active-component FOA channel subset on line 320 to provide on line 326 actively decoded output signals having an output spatial format.
  • the passive spatial decoder block 310 transforms the passive-component FOA channel subset and the identified passive-component HOA channel subset, provided on line 325, to provide on line 328 passively decoded output signals having the output spatial format. It will be appreciated that the output format in which the active decoder block 308 and the passive decoder block 310 provide the respective decoded active signals and the decoded passive signals is determined based upon configuration of the respective active and passive decoder blocks 308, 310.
  • a feature of ambisonics is to be agnostic to the output format, meaning an input ambisonic signal can be decoded to whatever output format a decoder is configured to provide.
  • the combiner block 314 combines the decoded active signals on line 326 and the decoded passive signals on line 328 to produce a combined output signal on line 330.
  • block 314 transforms the combined output signal from a time-frequency representation to time-domain signals, for instance using an inverse short-time Fourier transform, which may include windowing and overlap-add processing.
  • An example combiner block 314 performs additional processing such as allpass filtering of the decoded passive signals. Different allpass filters may be applied to one or more channels of the decoded passive signals to decorrelate the channels prior to the combination with the decoded active signals. Decorrelation of the channels leads to a more diffuse and less directional rendering, which is generally preferable for the decoded passive signals; a minimal sketch of one such decorrelator follows.
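  • A minimal sketch of frequency-domain decorrelation, using per-bin random phases as a simple stand-in for the carefully designed allpass filters a real system would use:

```python
import numpy as np

# Apply a different unit-magnitude (hence allpass) random-phase spectral
# response to each passive channel so the channels decorrelate before
# combination with the decoded active signals.
def decorrelate(Y_passive, seed=0):
    rng = np.random.default_rng(seed)
    phases = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=Y_passive.shape))
    return Y_passive * phases
```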
  • additional processing of the decoded signals can be carried out before combining the decoded signals; for instance, different filters can be applied to the decoded active and passive signals.
  • additional processing of the decoded signals is carried out after combining the decoded signals; for instance, a filter can be applied for equalization.
  • the active and passive decoders can be configured to have different output formats.
  • the active decoder block 308 and the passive spatial decoder block 310 are configured to decode to different spatial audio formats.
  • the active decoder block 308 can be configured to decode to a binaural format for headphone playback while the passive spatial decoder block 310 can be configured to decode to a multichannel loudspeaker layout, or vice versa.
  • the active spatial decoder block 308 and the passive spatial decoder block 310 can be configured to decode to different multichannel loudspeaker layouts, each of which is a subset or the entirety of an available multichannel loudspeaker layout.
  • the final signal format at the output of the decoder system 300 is a union or other combination of the output formats of the active and passive spatial decoder logic blocks 308, 310.
  • FIG 4 is an illustrative block diagram of the example active/passive decomposition block 306 of Figure 3 .
  • the FOA channel subset received on line 316 at the decomposition block 306 is routed to a direction estimation operation block 404, to a subspace determination operation block 406 and to a residual determination operation block 408.
  • the direction block 404 provides on line 405 an estimate of the number of directional audio sources in the FOA channel subset and the direction of arrival (DOA) of each of the enumerated directional audio sources.
  • the subspace determination block 406 determines the active component of the FOA channel subset provided on line 316 based upon the estimates on line 405 of the number and DOAs of directional sound sources and the FOA channel subset input signal received on line 316.
  • the residual passive component determination block 408 determines the passive component of the FOA channel subset on line 322 based upon a difference between the received input signal on line 316 and the active component of the FOA channel subset on line 320, determined by the subspace determination block 406.
  • the passive component of the FOA channel subset is determined first and the active component of the FOA channel subset is determined thereafter based upon a difference between the received input signal on line 316 and the determined passive component of the FOA channel subset.
  • An active HOA component determination block 420 determines an active component of the HOA channel subset based upon the number of directional audio sources and their directions of arrival (line 405) identified in direction estimation block 404 for the FOA channel subset input signal provided on line 316.
  • the determined active component of the HOA channel subset thus corresponds to the directional audio sources identified in the FOA channel subset input signal. More particularly, an example active HOA component determination block 420 produces active HOA components on line 324 that are consistent with the directional sources in the active component of the FOA channel subset on line 320.
  • FIG. 5 is an illustrative flow diagram representing an example hybrid higher order decoding process 500.
  • a computer system includes one or more processor devices configured to be operatively coupled to one or more non-transitory storage devices that store instructions to configure the processor devices to control the blocks of the examples described with reference to Figures 1-4 to perform the example spatial audio format decoding process 500.
  • the modules of Figure 5 correspond to control logic of the one or more processor devices configured according to the instructions.
  • In module 502, an ambisonic audio input signal is received that includes a set of audio signal channels.
  • module 502 further comprises transforming the input audio signals into a time-frequency representation, for example using a short-time Fourier transform, which often includes a windowing process.
  • the input signal channel set is partitioned into a first channel subset and a second channel subset.
  • an active component of the first channel subset, a passive component of the first channel subset, and an active component of the second channel subset are determined based at least in part upon directions of arrival of audio sources represented in the first channel subset.
  • the determined active component of the second channel subset is removed from the second channel subset to determine the passive component of the second channel subset.
  • the passive component of the first channel subset is combined with the determined passive component of the second channel subset to form a passive component for the complete channel set.
  • In module 512, the active component of the first channel subset is decoded using an active decoder to provide first output signals in a specified output format.
  • In module 514, the recombined passive component for the complete channel set is decoded using a passive decoder to provide second output signals in a specified output format.
  • In module 516, the decoded first output signals and the decoded second output signals are combined to provide combined decoded output signals.
  • module 516 further comprises transforming the output audio signals from a time-frequency representation to time-domain signals, for instance using an inverse short-time Fourier transform, which may include windowing and overlap-add processing.
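  • A hypothetical end-to-end skeleton of process 500 is sketched below. The helper callables (stft, partition, estimate_doas, decompose, and the decoders) are placeholders rather than APIs defined by the patent, and the active and passive decoders are assumed here to share one output format:

```python
import numpy as np

def hybrid_decode(x, stft, istft, partition, estimate_doas, decompose,
                  active_decoder, passive_decoder):
    X = stft(x)                                # module 502: to a time-frequency representation
    X0, X1 = partition(X)                      # first / second channel subsets
    doas = estimate_doas(X0)                   # directions of arrival from the first subset
    X0a, X0p, X1a = decompose(X0, X1, doas)    # active/passive components
    X1p = X1 - X1a                             # remove active component of second subset
    Xp = np.concatenate([X0p, X1p], axis=0)    # passive component for the complete set
    Ya = active_decoder(X0a, doas)             # module 512: active decode of first subset
    Yp = passive_decoder(Xp)                   # module 514: passive decode of complete set
    return istft(Ya + Yp)                      # module 516: combine and return to time domain
```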
  • the frequency bins of a short-time Fourier transform (STFT) are grouped into frequency bands.
  • a spatial analysis is carried out for each band rather than for each bin. This reduces the computational complexity of the spatial decoder system and also facilitates smoothing for the direction estimation process.
  • The frequency range is partitioned into bands; there are different approaches to partitioning the frequency range into bands.
  • One example approach involves the following parameters: a low frequency cutoff f0, a high frequency cutoff f1, and a total number of frequency bands.
  • an example band partition is determined as follows. All bins below the low frequency cutoff are grouped into a single band. All bins above the high frequency cutoff are grouped into a single band. Between the low and high frequency cutoff, the band edges are distributed logarithmically so as to form a requisite total number of bands (where the low and high bands already formed by the cutoff frequencies are included in the count). Logarithmic spacing is chosen since this is a good mathematical approximation of psychoacoustic models of the frequency resolution of the human auditory system.
  • A scale factor between successive band edges, equal to (f1/f0)^(1/B) for B logarithmically spaced bands between frequencies f0 and f1, is used in the pseudo-code to construct the partition band by band.
  • additional frequency bands may be appended to the frequency partition outside of this frequency range, for instance a low frequency band below frequency f 0 and a high frequency band above frequency f 1 as in the pseudocode in Table 1.
  • the corresponding bins for each frequency band can be derived in a straightforward manner based on the discrete Fourier transform (DFT) size used for the STFT.
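  • A minimal sketch of this partition (the patent's Table 1 pseudo-code is not reproduced in this excerpt; the parameter values in the usage lines are arbitrary examples):

```python
import numpy as np

# Build band-edge bins: a single band below f0, B log-spaced bands between f0
# and f1 (edge ratio alpha = (f1/f0)**(1/B)), and a single band above f1.
def make_band_edges(f0, f1, B, fs, n_fft):
    alpha = (f1 / f0) ** (1.0 / B)
    edges_hz = [0.0] + [f0 * alpha**b for b in range(B + 1)] + [fs / 2.0]
    bins = np.round(np.array(edges_hz) / (fs / n_fft)).astype(int)
    return np.unique(bins)   # dedupe edges that collide at low frequency resolution

# Bands are half-open [lo, hi): a band's upper-edge bin becomes the lower edge
# of the next band, as in Figure 6B.
edges = make_band_edges(f0=200.0, f1=12000.0, B=20, fs=48000.0, n_fft=1024)
bands = [np.arange(edges[i], edges[i + 1]) for i in range(len(edges) - 1)]
```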
  • Figure 6A is an illustrative chart showing the bandwidths of an example frequency band partition as a function of the band center frequencies on a log-log scale.
  • Figure 6B is an illustrative drawing representing an example use of frequency band edges to group frequency bins into frequency bands.
  • each of the tick marks on the horizontal line corresponds to a frequency bin.
  • Each of the longer dashed lines corresponds to a frequency bin identified as a frequency band edge for the partition.
  • the frequency bin corresponding to the lower frequency band edge is included in the frequency band whereas the frequency bin corresponding to the higher frequency band edge is excluded from the frequency band; this latter bin will be included as the lower band edge for the adjacent higher-frequency band.
  • This grouping of frequency bins into frequency bands is depicted by the bracket in Figure 6B .
  • the direction block 404 estimates the number and directions of sources in the FOA signal partition portion (the FOA channel subset) of the input spatial audio signal 316.
  • the source directions which are typically referred to as directions of arrival (DOAs) may correspond to the angular locations of the point sources in a soundfield.
  • the example direction block 404 estimates direction vectors corresponding to the DOAs of audio sources by selecting from a codebook of candidate directions based on the eigenvectors of a spatial correlation matrix in accordance with a multiple signal classification (MUSIC) algorithm for DOA estimation.
  • MUSIC multiple signal classification
  • the eigenvalues of the spatial correlation matrix are used for source counting. See Schmidt, R.O., "Multiple Emitter Location and Signal Parameter Estimation," IEEE Transactions on Antennas and Propagation, Vol. AP-34, No. 3, pp. 276-280, March 1986.
  • the MUSIC algorithm is used to estimate the spatial directions of prominent sources in an input spatial audio signal in the ambisonic format.
  • An example system is configured to receive a first-order ambisonic signal (the B-format).
  • the MUSIC algorithm framework is also applicable to higher order ambisonic as well as other spatial audio formats.
  • the MUSIC algorithm codebook includes direction vectors corresponding to defined locations on a virtual sphere.
  • the direction block 404 estimates a number and directions of audio sources for each of a number of frequency bands within the input signal, based upon eigenvalues and eigenvectors of a spatial correlation matrix and codebook directions associated with the virtual sphere in accordance with the MUSIC algorithm.
  • An example direction block 404 is configured to perform the MUSIC algorithm as follows.
  • a set of candidate spatial directions is determined. Each spatial direction is specified as an (azimuth, elevation) angle pair corresponding to a point on a virtual sphere.
  • the set of candidates includes a list of such angle pairs. This list of angle pairs may be denoted as Θ; the i-th element of this list may be denoted as (θ i , φ i ).
  • the set of candidate directions may be constructed to have equal resolution in azimuth and elevation.
  • the set of candidate directions may be constructed to have variable azimuth resolution based on the elevation angle.
  • the set of candidate directions may be constructed based on the density of the distribution of directions on a unit sphere.
  • a codebook of direction vectors corresponding to the set of spatial directions Θ is established.
  • the codebook entries may be alternatively referred to as steering vectors.
  • the codebook consists of vectors constructed from the angle pairs in accordance with the directional patterns of the B-format channels.
  • the codebook can be expressed as a matrix D 0 where each column is a direction vector d i (which may be referred to as a steering vector) that includes an element for each of multiple channels, each channel characterized by the three-dimensional directivity pattern of the corresponding B-format channel.
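  • A sketch of such a codebook construction for the B-format (the 10-degree grid resolution is an arbitrary choice, not a value from the patent):

```python
import numpy as np

# Columns of D0 are steering vectors [1/sqrt(2), cos(az)cos(el), sin(az)cos(el),
# sin(el)]^T evaluated at candidate (azimuth, elevation) pairs.
def make_codebook(az_step_deg=10.0, el_step_deg=10.0):
    az = np.deg2rad(np.arange(-180.0, 180.0, az_step_deg))
    el = np.deg2rad(np.arange(-90.0, 90.0 + el_step_deg, el_step_deg))
    AZ, EL = np.meshgrid(az, el, indexing="ij")
    az_f, el_f = AZ.ravel(), EL.ravel()
    D0 = np.stack([
        np.full_like(az_f, 1.0 / np.sqrt(2.0)),   # W gain
        np.cos(az_f) * np.cos(el_f),              # X gain
        np.sin(az_f) * np.cos(el_f),              # Y gain
        np.sin(el_f),                             # Z gain
    ])
    angles = np.stack([az_f, el_f], axis=1)       # the candidate list Theta
    return D0, angles                             # D0 is 4 x (number of candidates)
```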
  • the spatial correlation matrix of the FOA channel subset of the input signal 316 is estimated.
  • the estimate is aggregated over one or more frequency bins and one or more time frames.
  • a frequency-domain processing framework is used to estimate the spatial correlation matrix for each of multiple bin frequencies and time frames.
  • the estimate is computed for each one of multiple frequency bands by aggregating data for frequency bins within each respective frequency band and further aggregating across time frames.
  • $R_{xx}[b,t] = \lambda_b\, R_{xx}[b,t-1] + (1 - \lambda_b)\, \frac{1}{N_b} \sum_{k \in \mathrm{band}\ b} \vec{x}_k \vec{x}_k^H$, wherein
  • λ b is a smoothing constant for band b
  • N b is the number of frequency bins in band b
  • t is a time frame index
  • x k is a vector of input format signal values for frequency bin k at time t .
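  • A sketch of this update (X holds the current frame's channel-by-bin values, bands is a list of bin-index arrays, and lam holds the per-band smoothing constants λ b):

```python
import numpy as np

# Recursive, banded estimate of the spatial correlation matrix from the
# equation above: R[b] <- lam[b]*R[b] + (1-lam[b]) * mean over band b of x_k x_k^H.
def update_correlations(R, X, bands, lam):
    for b, bins in enumerate(bands):
        Xb = X[:, bins]                               # M x N_b bins in band b
        inst = (Xb @ Xb.conj().T) / Xb.shape[1]       # instantaneous band average
        R[b] = lam[b] * R[b] + (1.0 - lam[b]) * inst  # smooth across time frames
    return R
```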
  • An eigendecomposition of the spatial correlation matrix is carried out.
  • the eigenvectors and eigenvalues are partitioned into signal and noise components (often referred to as subspaces).
  • the partitioning is done based upon applying a threshold to the eigenvalues, with the larger eigenvalues interpreted as signal components and the smaller eigenvalues interpreted as noise components.
  • Alternatively, the partitioning is done based upon applying a threshold to a logarithm of the eigenvalues, with the larger logarithmic values interpreted as signal components and the smaller logarithmic values interpreted as noise components.
  • An optimality metric is computed for each element of the codebook.
  • An example optimality metric quantifies how orthogonal the codebook element is to the noise eigenvectors.
  • Q H d i comprises correlations between the direction vector d i and one or more eigenvectors of the noise subspace. If M is the number of eigenvalue signal components plus the number of eigenvalue noise components in the input format and P is the estimated number of sources, then Q may comprise at most M - P such noise subspace eigenvectors.
  • M is the number of elements in each direction vector in the codebook since M is the number of channels in the subset being used for direction estimation. M is furthermore equivalent to the sum of the number of signal eigenvalues and noise eigenvalues.
  • the extrema in the optimality metric are identified by a search algorithm in accordance with the formulation of the optimality metric.
  • the extrema identified by the search algorithm may be maxima.
  • the extrema identified in the search algorithm may be minima.
  • the extrema indicate which codebook elements are most orthogonal to the noise eigenvectors; these correspond to the estimates of the directions of prominent audio sources.
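  • A sketch of these MUSIC steps, assuming a simple eigenvalue threshold relative to the largest eigenvalue (the passage above leaves the exact thresholding rule and metric formulation open):

```python
import numpy as np

# Eigendecompose R_xx, split signal/noise subspaces by an eigenvalue threshold,
# score each codebook direction by its orthogonality to the noise subspace, and
# pick the metric's maxima as the DOA estimates.
def music_directions(Rxx, D0, angles, threshold_db=-20.0, max_sources=2):
    evals, evecs = np.linalg.eigh(Rxx)
    evals, evecs = evals[::-1], evecs[:, ::-1]            # descending eigenvalues
    evals = np.maximum(evals, 0.0)                        # guard tiny negatives
    ratios_db = 10.0 * np.log10(evals / (evals[0] + 1e-20) + 1e-20)
    P = int(np.sum(ratios_db > threshold_db))             # signal eigenvalue count
    P = max(1, min(P, max_sources))
    Q = evecs[:, P:]                                      # M - P noise eigenvectors
    # c[i] is large when codebook column d_i is nearly orthogonal to the noise
    # subspace (the MUSIC pseudospectrum over the codebook).
    c = 1.0 / (np.sum(np.abs(Q.conj().T @ D0) ** 2, axis=0) + 1e-12)
    best = np.argsort(c)[::-1][:P]                        # maxima of the metric
    return angles[best], D0[:, best]                      # DOA estimates and matrix G
```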
  • One of the computational costs of a MUSIC-based ambisonic active decoding algorithm is the computation of the optimality metric c [i] for a current input's noise subspace across the entire codebook of possible input source directions for each of multiple frequency bands.
  • the extrema in this metric reveal the best fit of codes to the input signal, namely, the best direction estimates.
  • the elements in the codebook must sufficiently represent all possible directions in azimuth and elevation, both above and below the ear level.
  • the codebook may be constructed to have a specified azimuth angle resolution for each of a set of elevation angles.
  • the codebook may be constructed to have a specified size in accordance with computational constraints.
  • the elements in the codebook may be configured with certain symmetries to allow for computational simplifications.
  • the elements in the codebook may be configured to have angular resolutions in accordance with psychoacoustic considerations.
  • methods other than the MUSIC-based algorithm can be used for estimating the number and direction of sources in the FOA portion of an input spatial audio signal. For instance, an optimality metric can be computed based on the correlation between the input signal vector and the elements of the direction codebook, and the elements with the highest correlation can be selected as the estimated source directions. Such alternative methods are within the scope of the present disclosure.
  • a full ambisonic codebook contains the omnidirectional W-channel normalization gain and each of the steering channel gains X (front/back), Y (left/right) and Z (up/down).
  • Figure 7 is an illustrative drawing representing the B-format ambisonic spatial format.
  • the encoding equations correspond to the directivity patterns of the B-format components.
  • the codebook of direction vectors is constructed in accordance with the B-format encoding equations.
  • Each vector in the direction codebook corresponds to a candidate angle pair (θ, φ).
  • the elements of a vector in the codebook correspond to the directional gains of the component directivity patterns at the candidate angle pair.
  • each column vector of the matrix G may correspond to a direction vector $d_{\theta,\phi}$ at a particular angle pair associated with an estimated direction of a source.
  • the matrix G is a matrix of estimated source direction vectors.
  • direction estimation and various matrices can be derived per frequency band. They can be applied to the FOA channel subset of an input signal independently for each bin in a respective band.
  • the subspace determination block 406 provides the active component of the FOA channel subset of the input signal resulting from the active subspace projection in accordance with Eq. (17), as input to the active decoder block 308, and also to the residual passive FOA component determination block 408, and also to the active HOA component determination block 420.
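  • A sketch of this split, assuming the projection referenced as Eq. (17) is the usual least-squares projection onto the column space of G, the matrix of estimated source direction vectors (the patent's exact equation is not reproduced in this excerpt):

```python
import numpy as np

# Project the FOA channel subset onto the active subspace spanned by the
# estimated direction vectors; the passive component is the residual (block 408).
def active_passive_split(x, G):
    P_A = G @ np.linalg.pinv(G)   # projection onto the column space of G
    x_active = P_A @ x
    x_passive = x - x_active
    return x_active, x_passive
```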
  • the passive component of the FOA channel subset of an input signal 316 is determined first, and the active component of the FOA channel subset of the input signal is determined thereafter.
  • the residual passive component block 408 provides the determined passive FOA component as input to the order combination block 307.
  • the active HOA component determination block 420 determines an active component for the HOA channel subset which corresponds to the active component determined for the FOA channel subset in block 406.
  • the active component determined for the FOA channel subset in block 406 corresponds to a determined number of directional audio sources identified in the FOA channel subset. In a full HOA channel set representing a soundfield containing those directional sources, the directional sources are present both in the FOA channel subset and in the higher order HOA channel subset.
  • the determined active component of the FOA channel subset is to be rendered with an FOA active decoder such that the directional sources are reproduced with spatial fidelity, in other words with the directional sources well localized in the reproduction.
  • the HOA channel subset is to be rendered by a passive decoder.
  • the active signal content in the higher order HOA channel subset corresponding to the determined FOA directional sources should be removed from the HOA channel subset signals prior to the subsequent decoding by the HOA passive decoder, so as to avoid decoding the directional sources with both the FOA active decoder and the HOA passive decoder.
  • the active HOA component determination block 420 provides the determined active component of the HOA channel subset as input to the subset recombination block 307.
  • FIG 8 is an illustrative block diagram of the example subset recombination block 307 of Figure 4 .
  • the subset recombination block 307 includes a residual passive HOA channel subset component determination block 802 and a passive ambisonic component recombination block 804.
  • the residual passive HOA component determination block 802 receives the HOA channel subset of the input signal provided on line 318 and receives the determined active component of the HOA channel subset on line 324.
  • the residual passive HOA channel subset component determination block 802 determines the passive HOA channel subset component of the input signal 318 to provide on line 805, based upon a difference between the HOA channel subset input signals provided on line 318 and the determined active component of the HOA channel subset provided on line 324.
  • the passive ambisonic subset recombination block 804 combines the determined passive HOA channel subset component on line 805 with the determined passive FOA channel subset component provided on line 322, through concatenation or interleaving of the channel subsets of the passive FOA and HOA components, for example, to produce on line 325 a corresponding recombined passive ambisonic component for the complete channel set.
  • the subset recombination block 307 provides the recombined passive ambisonic component to the passive decoder block 310.
  • the subset recombination block 307 performs the above operations for each of one or more frequency bands.
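  • The recombination can be sketched in a few lines (array shapes assumed to be channels by time-frequency bins):

```python
import numpy as np

# Sketch of Figure 8: block 802 subtracts the determined active HOA component
# to leave the passive HOA residual (line 805); block 804 stacks the passive
# FOA and passive high-order channels into a full passive ambisonic set (line 325).
def recombine_passive(X_hoa, X_hoa_active, X_foa_passive):
    X_hoa_passive = X_hoa - X_hoa_active
    return np.concatenate([X_foa_passive, X_hoa_passive], axis=0)
```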
  • the active spatial decoder 308 is configured, for each of one or more frequency bands, based upon directions determined by the direction estimation block 404 and based upon an active subspace projection matrix determined using the subspace determination block 406.
  • Each column of the matrix Ĝ is a direction vector ĝ i for the output format corresponding to a source direction (θ i , φ i ) determined in direction estimation block 404.
  • N is the number of signal channels in the output format.
  • the matrix H A is independent of the order of the P columns in the matrices G and ⁇ if the ordering is consistent between those two matrices.
  • the decoder matrix H A may be smoothed across time to reduce artifacts.
  • the decoder matrix H A may be smoothed across frequency to reduce artifacts.
  • the decoder matrix may be smoothed across time and frequency to reduce artifacts.
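  • One decoder-matrix form consistent with the description above (an assumed sketch, not the patent's verbatim equation) inverts the input-format direction matrix G and re-encodes with the output-format direction matrix Ĝ; this form is invariant to any consistent reordering of the P columns of G and Ĝ, and can be smoothed across time as mentioned above:

```python
import numpy as np

# Map the active FOA component to the output format by inverting G (input-format
# direction vectors) and re-encoding with G_hat (output-format direction vectors).
def active_decoder_matrix(G, G_hat):
    return G_hat @ np.linalg.pinv(G)   # N x M; invariant to consistent column reordering

def smooth_matrix(H_prev, H_new, beta=0.8):
    # One-pole smoothing across time frames to reduce artifacts; beta is arbitrary.
    return beta * H_prev + (1.0 - beta) * H_new
```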
  • the passive spatial decoder 310 performs a passive signal spatial transformation that is determined independent of spatial characteristics of the input signal 316. More particularly, an example passive spatial decoder 310 is configured according to a passive spatial decoder matrix H P . Each row of the decoder matrix corresponds to an output channel. For example, the n -th output channel is formed as a linear combination of the elements of the passive component vector x P using the elements of the n -th row of the passive spatial decoder matrix as the weights in the combination.
  • an example passive spatial decoder 310 may apply a different decoding matrix to different frequency regions of the signal. For instance, an example passive spatial decoder 310 may apply one decoding matrix for frequencies below a certain frequency cutoff and a different decoding matrix for frequencies above the frequency cutoff.
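  • A minimal sketch of a passive decode with a frequency-dependent matrix, using the two-matrix crossover configuration mentioned above (matrix values and the crossover bin are placeholders):

```python
import numpy as np

# Apply one fixed decoding matrix below a crossover bin and another above it.
# Each output channel is a fixed linear combination of the passive input channels.
def passive_decode(Xp, H_low, H_high, crossover_bin):
    Y = np.empty((H_low.shape[0], Xp.shape[1]), dtype=Xp.dtype)
    Y[:, :crossover_bin] = H_low @ Xp[:, :crossover_bin]
    Y[:, crossover_bin:] = H_high @ Xp[:, crossover_bin:]
    return Y
```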
  • the term 'passive signal' refers to a signal that is received at the passive decoder.
  • the term 'passive decoder' refers to a decoder that decodes the passive signal without further spatial analysis of the passive signal.
  • Figure 1B depicts a decoding matrix. Such a decoding matrix is an example of a passive decoder if the coefficients of the matrix are fixed to constant values (as described under "Passive Spatial Decoder Configuration" above).
  • FIG. 9 is a schematic block diagram of an example second spatial decoder system 900.
  • the second higher order ambisonic decoder system 900 includes a computer system that includes one or more processor devices that are configured to be operatively coupled to one or more non-transitory storage devices that store instructions to configure the processing devices to provide the processing operation blocks described with reference to Figure 9 .
  • the second decoder system 900 includes a channel partitioning operation block 904, an active/passive decomposition and projection operation block 906, a subset recombination block 907, an active decoder operation block 908, a passive decoder operation block 910, and a combiner operation block 914.
  • To avoid complicating the description, explanations of constituent blocks of the second decoder system 900 that are identical to or substantially identical to corresponding blocks of the example first decoder system 300 are not repeated in the following description.
  • the second decoder system 900 is configured for generalized partitioning of an input signal 902 based upon individual channel subsets.
  • the second decoder system 900 is not limited to partitioning based upon FOA channels and HOA channels.
  • the subsets C 0 and C 1 are disjoint, meaning that no input channel is in both subsets.
  • An example input signal can consist of a channel set C that includes FOA channels WXYZ and second-order channels RSTUV.
  • an example input signal can include a channel set C that includes FOA channels WXYZ, and second order channels RSTUV, and third order channels KLMNOPQ.
  • the C 0 subset includes one or more ambisonic channels contained within C
  • the C 1 subset includes the ambisonic channels contained within C that are not included in C 0 .
  • the C 0 channel subset is not limited to FOA channels and the C 1 channel subset is not limited to HOA channels. It will be appreciated that an input signal can have an input format other than ambisonic, such as two-channel stereo or a standard multichannel format such as 5.1.
  • the channel partitioning block 904 receives a channel select control signal on line 901 to control partitioning of channels.
  • the channel control signal can indicate a partitioning of a channel set C into C 0 and C 1 based upon practical considerations such as cost, for example.
  • An active decoder is more costly than the passive decoder on a per-channel basis, so it might be useful to reduce the number of actively decoded channels to meet a certain computation budget.
  • Output layout is another example practical consideration. For instance, if the output layout is purely horizontal (no elevation speakers), it may not be perceptually worthwhile to actively decode elevation channels.
  • the channel control signal on line 901 can indicate a partitioning of a channel set C into C 0 and C 1 based upon output formats such as a standard 5.1 horizontal-only multichannel loudspeaker layout, for example.
  • Other possible output formats include but are not limited to 7.1 horizontal-only multichannel loudspeaker formats, 7.1.4 multichannel loudspeaker formats with elevation loudspeakers, and 11.2.4 multichannel loudspeaker formats with elevation loudspeakers.
  • the channel control signal on line 901 can indicate a partitioning of a channel set C into C 0 and C 1 based upon input metadata such as the spatial location of active sources, for example.
  • the channel control signal can include input metadata that indicates partitioning of the channel set C in which, for example, if active sources are present in frontal directions in the input audio signal, the frontal channels are partitioned into the subset C 0 and the remaining channels are partitioned into the subset C 1 .
  • the channels containing prevalent active sources are partitioned into the subset C 0 and the remaining channels are partitioned into the subset C 1 .
  • the channel control signal metadata may indicate the partitioning by including a channel subset designation for each input channel.
  • An example decoder includes a prioritization protocol for prioritizing which channels are to be routed to the active decoder based upon computation resource availability or other factors.
  • the active/passive decomposition and projection block 906 determines estimated DOAs of sound point sources within a soundfield and uses an active signal subspace projection to determine an active component of the C 0 channel subset signals corresponding to the estimated DOAs.
  • the active/passive decomposition and projection block 906 determines a passive component of the C 0 channel subset signals.
  • the active/passive decomposition and projection block 906 determines an active component of the C 1 channel subset.
  • the subset recombination block 907 uses the determined active component of the C 1 channel subset as a basis to determine a passive component of the C 1 channel subset signals and recombines the determined passive C 0 channel subset signals with the corresponding determined passive component of the C 1 channel subset signals to produce a passive component of the C channel subset signals.
  • the active decoder block 908 decodes the active component of the C 0 channel subset signals to produce on line 926 actively decoded output signals decoded to an output spatial format such as a 7.1.4 multichannel loudspeaker layout.
  • the passive decoder block 910 decodes the passive component of the C channel set signals to produce on line 928 passively decoded output signals decoded to an output spatial format.
  • the combiner block 914 combines the actively decoded signals and the passively decoded signals to produce decoded output signals on line 930.
  • FIG 10 is a schematic block diagram of an example third spatial decoder system 1000.
  • the third spatial decoder system 1000 includes a computer system that includes one or more processor devices configured to be operatively coupled to one or more non-transitory storage devices that store instructions to configure the processing devices to provide the processing operation blocks described with reference to Figure 10.
  • the third decoder system 1000 includes a channel partitioning operation block 1004, a spatial analysis block 1040, an active/passive decomposition block 1006, a subset recombination block 1007, an active decoder operation block 1008, a passive decoder operation block 1010, and a combiner operation block 1014.
  • To avoid complicating the description, explanations of constituent blocks of the third decoder system 1000 that are identical to or substantially identical to corresponding blocks of the example first and second decoder systems 300, 900 are not repeated in the following description.
  • the channel partitioning block 1004 partitions an input spatial audio channel set C into subsets C 0 , C 1 , and C 2 .
  • a channel control signal on line 1001 can be used to selectively control channel partitioning.
  • Set C represents a complete set of input spatial audio signal channels.
  • Subset C 0 represents a channel subset to be processed by the active decoder and the passive decoder.
  • Subset C 1 represents a channel subset to be processed only by the passive decoder.
  • Subset C 2 represents a channel subset to be used for spatial analysis.
  • subset C 3 further represents a channel subset to be used for constructing a plane-wave signal model;
  • C 0 is a subset of C 3 .
  • C 3 may be a subset of C or may be equivalent to C .
  • the spatial analysis block 1040 determines estimated DOAs based upon the C 2 channel subset, which can match or be different from the C 0 channel subset.
  • An example spatial analysis block 1040 is configured to use the MUSIC algorithm with a codebook of direction vectors D 2 corresponding to the C 2 channel subset that represent DOAs in terms of source directions Θ (for each time and frequency band) of sound point sources of a soundfield represented by the channel signal subset C 2 .
  • the C 2 subset can be a subset of C 0 , equal to C 0 , or a superset of C 0 .
  • In some examples, C 2 = C 0 such that only the channels that are to be passed to the active decoder are used for spatial analysis.
  • In some examples, some channels that are not to be passed to the active decoder are also used for spatial analysis in order to increase the accuracy of the spatial analysis by providing more data representative of the soundfield.
  • only a subset of the channels that are to be passed to the active decoder are used for spatial analysis in order to reduce the computational cost of the spatial analysis, for instance if computational resources available to the spatial decoder are limited.
  • only a subset of the channels that are to be passed to the active decoder are used for spatial analysis in order to configure the analysis in accordance with the output spatial format; for instance, if the output spatial format corresponds to a horizontal-only loudspeaker layout, it may be desirable to limit spatial analysis to channels in which spatial information about azimuth angles is encoded.
  • An example active/passive decomposition block 1006 can be configured to implement alternative implicit directional source models used to determine decoder matrices.
  • the matrix G 0 is determined or constructed based upon directions estimated by the spatial analysis using the C 2 channel subset.
  • An example active/passive decomposition block 1006 can be configured to implement a second example directional source model that uses direction vectors corresponding to the C 3 channel subset and models only the C 3 channels of the input signal.
  • x3 ≈ G3 ψ3, wherein:
  • x3 is a vector that represents the C3 subset channel signals;
  • G3 is a direction vector matrix of source directions for the C3 subset; and
  • ψ3 represents implicit coefficients for sources in the C3 channel subset of signals.
  • the matrix G 3 is determined or constructed based upon directions estimated by the spatial analysis using the C 2 channel subset.
  • C 3 may be equivalent to C 2 .
  • C 3 may be a subset of C 2 , for instance to reduce the computational cost.
  • C 3 may be a superset of C 2 , for instance to increase the accuracy of the active component estimates.
  • C 3 may be equal to C 0 , a superset of C 0 , or a subset of C 0 .
  • a model for the C 0 channels can be determined from the second example directional source model.
  • a rationale for using the second directional source model is to use the additional channels in C3 to provide a better fit for the full ensemble of input channels, which includes C0 and C1, than if only the C0 subset of channels were used.
  • In effect, we find the best active component for the full input channel set, given the directions determined from the C2 subset; this better fit can result in a better subtraction of the active components from the C1 subset in a later step.
  • the example active/passive decomposition block 1006 is configured to determine or construct a direction vector matrix G 1 for C 1 based upon the directions estimated during spatial analysis. It will be appreciated that the directions are estimated using the channel subset C 2 . Those directions are used to find corresponding direction vectors for the C 0 channel subset and the C 1 channel subset.
  • the example active/passive decomposition operation block 1006 is configured to compute a source-model decoder matrix Ψ used to map input signals to source-model coefficients; the Ψ matrix is also sometimes referred to as a plane-wave decoder matrix, since it is used to compute a plane-wave source model.
  • the source-model decoder matrix is computed based upon the implicit source model and a least-squares fit using a pseudo-inverse, as in the sketch below.
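A minimal sketch of that computation, assuming the least-squares fit is an ordinary Moore-Penrose pseudo-inverse; G stands for G0 under the first source model or G3 under the second, D is the corresponding direction-vector codebook, and all names are illustrative.

```python
import numpy as np

def source_model_decoder(D, doa_idx):
    """Assemble G from the codebook columns at the estimated DOAs and return
    the source-model (plane-wave) decoder matrix Psi = pinv(G), so that
    psi = Psi @ x is the least-squares coefficient fit with x ~= G @ psi."""
    G = D[:, doa_idx]          # direction vectors for the estimated sources
    Psi = np.linalg.pinv(G)    # least-squares fit via the pseudo-inverse
    return G, Psi
```

Under the second source model, for example, one would take G3, Psi3 = source_model_decoder(D3, doas) and then compute ψ3 = Psi3 @ x3.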
  • the example active/passive decomposition operation block 1006 also is configured to determine the passive component of the C 0 channel subset.
  • the residual can be expressed differently for the first and second source models.
  • x0P = x0 − x̂0 (first source model), wherein x̂0 = G0 ψ0 is the model-based estimate of the active component of the C0 channels.
  • x0P = x0 − G0 ψ3, with ψ3 = Ψ3 x3 (second source model), wherein x0P represents the determined passive component of the C0 channel subset.
  • the example active/passive decomposition operation block 1006 is configured to determine the active component of the C 1 channel subset.
  • An example active/passive decomposition operation block 1006 is selectively configurable to use different versions of the coefficients (for example, coefficients fitted with the first source model or with the second) when determining the active component of the C1 channel subset.
  • G1 consists of direction vectors for the C1 channel subset; the vectors are based on the directions identified in the spatial analysis, the same directions used to construct G0 (see the decomposition sketch below).
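Putting these pieces together, here is a sketch of the decomposition under the second source model: the coefficients ψ3 fitted on the C3 channels drive both the C0 residual and the C1 active estimate. G0 and G1 are direction-vector matrices for the C0 and C1 channels at the same estimated directions; this is one plausible reading, not the patent's reference code.

```python
import numpy as np

def active_passive_split(x0, x1, x3, G0, G1, Psi3):
    """Second-source-model active/passive decomposition for one band."""
    psi3 = Psi3 @ x3       # implicit source coefficients fitted on C3
    x0_act = G0 @ psi3     # estimate of the active component in C0
    x0_pas = x0 - x0_act   # passive residual of C0 (x0P)
    x1_act = G1 @ psi3     # active component of C1 (x1A)
    x1_pas = x1 - x1_act   # passive residual of C1 (x1P)
    return x0_act, x0_pas, x1_act, x1_pas
```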
  • the active decoder operation block 1008 is configured to determine or construct a direction vector matrix ⁇ based upon the directions estimated by the spatial analysis block 1040.
  • the active decoder constructs an active decoder matrix relying upon an implicit source model.
  • the coefficients are not explicitly computed. Rather, we use formulations of the coefficients to achieve a model-based active decoder that can be expressed as a matrix applied to the input.
  • the active decoder matrix may be smoothed prior to this step; in some examples it is smoothed across time, in some examples across frequency, and in some examples across both time and frequency (a smoothing sketch follows below).
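Because the coefficients remain implicit, the active decode can be folded into a single matrix per band (for instance H_active = G_out Ψ, mapping input channels directly to the output format), and that matrix can then be smoothed. Below is a sketch of one-pole smoothing across time for one frequency band; the recursion coefficient alpha is an assumed tuning parameter, and smoothing across frequency could be added analogously by averaging over neighboring bands.

```python
import numpy as np

class DecoderSmoother:
    """One-pole (leaky integrator) smoothing of an active decoder matrix
    across time frames; alpha near 1.0 smooths heavily."""
    def __init__(self, alpha=0.8):
        self.alpha = alpha
        self.H_prev = None

    def smooth(self, H):
        if self.H_prev is None:
            self.H_prev = H   # initialize on the first frame
        self.H_prev = self.alpha * self.H_prev + (1.0 - self.alpha) * H
        return self.H_prev

# Per band and frame: y_active = smoother.smooth(H_active) @ x_in
```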
  • the subset recombination block 1007 is configured to recombine the respective passive components of the C0 and C1 channel subset signals.
  • the passive components of the channel subsets are concatenated to form the full passive vector x P to be decoded by a passive decoder matrix H P .
  • xP = [x0P; x1P] (the passive components of the two subsets, stacked).
  • a passive decoder matrix H P is determined based upon the input spatial format (which corresponds to the channel subsets C 0 and C 1 to be processed by the passive decoder) and the output spatial format. This can be done offline before processing.
  • An example passive decoder block 1010 applies the passive decoder matrix HP to the passive component xP of the combined channel subsets C0 and C1 (determined above in the subset recombination block) to produce a passively decoded output signal component on line 1028, as in the sketch below.
  • yP = HP xP
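A closing sketch of the recombination and decode path: the passive residuals are stacked to form xP, decoded with the passive matrix HP, and, in the combiner block 1014, summed with the actively decoded signal. HP is assumed here to have been designed offline for the given input and output formats, as noted above.

```python
import numpy as np

def passive_decode(x0_pas, x1_pas, H_P):
    """Concatenate the subset passive components and apply the passive decoder."""
    x_P = np.vstack([x0_pas, x1_pas])   # xP = [x0P; x1P]
    return H_P @ x_P                    # yP = HP xP

def combine(y_active, y_passive):
    """Combiner block 1014: sum the active and passive decoded components."""
    return y_active + y_passive
```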
  • Figure 11 is a block diagram illustrating components of a machine 1100, according to some example embodiments, able to read instructions 1116 from a machine-readable medium (e.g., a machine-readable storage medium) and to perform any one or more of the methodologies discussed herein.
  • Figure 11 shows a diagrammatic representation of the machine 1100 in the example form of a computer system, within which the instructions 1116 (e.g., software, a program, an application, an applet, an app, or other executable code) for causing the machine 1100 to perform any one or more of the methodologies discussed herein may be executed.
  • the instructions 1116 can configure one or more processor devices 1110 to implement the spatial decoder 106 of Figure 1A , the spatial decoder 206 of Figure 2 , the example decoder system 300 of Figures 3-5 , the decoder systems 900, 1000 of Figures 9-10 , for example.
  • the instructions 1116 can transform the general, non-programmed machine 1100 into a particular machine programmed to carry out the described and illustrated functions in the manner described (e.g., as an audio processor circuit).
  • the machine 1100 operates as a standalone device or can be coupled (e.g., networked) to other machines. In a networked deployment, the machine 1100 can operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
  • the machine 1100 can comprise, but is not limited to, a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a set-top box (STB), a personal digital assistant (PDA), an entertainment media system or system component, a cellular telephone, a smart phone, a mobile device, a wearable device (e.g., a smart watch), a smart home device (e.g., a smart appliance), other smart devices, a web appliance, a network router, a network switch, a network bridge, a headphone driver, or any machine capable of executing the instructions 1116, sequentially or otherwise, that specify actions to be taken by the machine 1100.
  • the term "machine” shall also be taken to include a collection of machines 1100 that individually or jointly execute the instructions 1116 to perform any one or more of the methodologies discussed herein.
  • the machine 1100 can include or use processors 1110, such as including an audio processor circuit, non-transitory memory/storage 1130, and I/O components 1150, which can be configured to communicate with each other such as via a bus 1102.
  • the processors 1110 (e.g., a central processing unit (CPU), a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a graphics processing unit (GPU), a digital signal processor (DSP), an ASIC, a radio-frequency integrated circuit (RFIC), another processor, or any suitable combination thereof) can include, for example, a circuit such as a processor 1112 and a processor 1114 that may execute the instructions 1116.
  • the term "processor" is intended to include a multi-core processor 1112, 1114 that can comprise two or more independent processors 1112, 1114 (sometimes referred to as "cores") that may execute the instructions 1116 contemporaneously.
  • although Figure 11 shows multiple processors 1110, the machine 1100 may include a single processor 1112, 1114 with a single core, a single processor 1112, 1114 with multiple cores (e.g., a multi-core processor 1112, 1114), multiple processors 1112, 1114 with a single core, multiple processors 1112, 1114 with multiple cores, or any combination thereof, wherein any one or more of the processors can include a circuit configured to apply a height filter to an audio signal to render a processed or virtualized audio signal.
  • the memory/storage 1130 can include a memory 1132, such as a main memory circuit, or other memory storage circuit, and a storage unit 1136, both accessible to the processors 1110 such as via the bus 1102.
  • the storage unit 1136 and memory 1132 store the instructions 1116 embodying any one or more of the methodologies or functions described herein.
  • the instructions 1116 may also reside, completely or partially, within the memory 1132, within the storage unit 1136, within at least one of the processors 1110 (e.g., within the cache memory of processor 1112, 1114), or any suitable combination thereof, during execution thereof by the machine 1100. Accordingly, the memory 1132, the storage unit 1136, and the memory of the processors 1110 are examples of machine-readable media.
  • the term "machine-readable medium" means a device able to store the instructions 1116 and data temporarily or permanently, and may include, but is not limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, optical media, magnetic media, cache memory, other types of storage (e.g., electrically erasable programmable read-only memory (EEPROM)), and/or any suitable combination thereof.
  • the term "machine-readable medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store the instructions 1116.
  • the term "machine-readable medium" shall also be taken to include any medium, or combination of multiple media, that is capable of storing instructions (e.g., instructions 1116) for execution by a machine (e.g., machine 1100), such that the instructions 1116, when executed by one or more processors of the machine 1100 (e.g., processors 1110), cause the machine 1100 to perform any one or more of the methodologies described herein.
  • a “machine-readable medium” refers to a single storage apparatus or device, as well as “cloud-based” storage systems or storage networks that include multiple storage apparatus or devices.
  • the term “machine-readable medium” excludes signals per se.
  • the I/O components 1150 may include a variety of components to receive input, provide output, produce output, transmit information, exchange information, capture measurements, and so on.
  • the specific I/O components 1150 that are included in a particular machine 1100 will depend on the type of machine 1100. For example, portable machines such as mobile phones will likely include a touch input device or other such input mechanisms, while a headless server machine will likely not include such a touch input device. It will be appreciated that the I/O components 1150 may include many other components.
  • the I/O components 1150 are grouped by functionality merely for simplifying the following discussion, and the grouping is in no way limiting. In various example embodiments, the I/O components 1150 may include output components 1152 and input components 1154.
  • the output components 1152 can include visual components (e.g., a display such as a plasma display panel (PDP), a light emitting diode (LED) display, a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)), acoustic components (e.g., loudspeakers), haptic components (e.g., a vibratory motor, resistance mechanisms), other signal generators, and so forth.
  • the input components 1154 can include alphanumeric input components (e.g., a keyboard, a touch screen configured to receive alphanumeric input, a photo-optical keyboard, or other alphanumeric input components), point based input components (e.g., a mouse, a touchpad, a trackball, a joystick, a motion sensor, or other pointing instruments), tactile input components (e.g., a physical button, a touch screen that provides location and/or force of touches or touch gestures, or other tactile input components), audio input components (e.g., a microphone), and the like.
  • the I/O components 1150 can include biometric components 1156, motion components 1158, environmental components 1160, or position components 1162, among a wide array of other components.
  • the biometric components 1156 can include components to detect expressions (e.g., hand expressions, facial expressions, vocal expressions, body gestures, or eye tracking), measure biosignals (e.g., blood pressure, heart rate, body temperature, perspiration, or brain waves), identify a person (e.g., voice identification, retinal identification, facial identification, fingerprint identification, or electroencephalogram based identification), and the like, such as can influence the inclusion, use, or selection of a listener-specific or environment-specific impulse response or HRTF, for example.
  • the biometric components 1156 can include one or more sensors configured to sense or provide information about a detected location of the listener in an environment.
  • the motion components 1158 can include acceleration sensor components (e.g., accelerometer), gravitation sensor components, rotation sensor components (e.g., gyroscope), and so forth, such as can be used to track changes in the location of the listener.
  • the environmental components 1160 can include, for example, illumination sensor components (e.g., photometer), temperature sensor components (e.g., one or more thermometers that detect ambient temperature), humidity sensor components, pressure sensor components (e.g., barometer), acoustic sensor components (e.g., one or more microphones that detect reverberation decay times, such as for one or more frequencies or frequency bands), proximity sensor or room volume sensing components (e.g., infrared sensors that detect nearby objects), gas sensors (e.g., gas detection sensors to detect concentrations of hazardous gases for safety or to measure pollutants in the atmosphere), or other components that may provide indications, measurements, or signals corresponding to a surrounding physical environment.
  • the position components 1162 can include location sensor components (e.g., a Global Positioning System (GPS) receiver component), altitude sensor components (e.g., altimeters or barometers that detect air pressure from which altitude may be derived), orientation sensor components (e.g., magnetometers), and the like.
  • the I/O components 1150 can include communication components 1164 operable to couple the machine 1100 to a network 1180 or devices 1170 via a coupling 1182 and a coupling 1172 respectively.
  • the communication components 1164 can include a network interface component or other suitable device to interface with the network 1180.
  • the communication components 1164 can include wired communication components, wireless communication components, cellular communication components, near field communication (NFC) components, Bluetooth ® components (e.g., Bluetooth ® Low Energy), Wi-Fi ® components, and other communication components to provide communication via other modalities.
  • the devices 1170 can be another machine or any of a wide variety of peripheral devices (e.g., a peripheral device coupled via a USB).
  • the communication components 1164 can detect identifiers or include components operable to detect identifiers.
  • the communication components 1164 can include radio frequency identification (RFID) tag reader components, NFC smart tag detection components, optical reader components (e.g., an optical sensor to detect one-dimensional bar codes such as Universal Product Code (UPC) bar codes, multi-dimensional bar codes such as Quick Response (QR) codes, Aztec codes, Data Matrix, Dataglyph, MaxiCode, PDF417, Ultra Code, UCC RSS-2D bar codes, and other optical codes), or acoustic detection components (e.g., microphones to identify tagged audio signals).
  • In addition, a variety of information can be derived via the communication components 1164, such as location via Internet Protocol (IP) geolocation, location via Wi-Fi® signal triangulation, or location via detecting an NFC beacon signal that may indicate a particular location.
  • Such identifiers can be used to determine information about one or more of a reference or local impulse response, a reference or local environment characteristic, or a listener-specific characteristic.
  • one or more portions of the network 1180 can be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless WAN (WWAN), a metropolitan area network (MAN), the Internet, a portion of the Internet, a portion of the public switched telephone network (PSTN), a plain old telephone service (POTS) network, a cellular telephone network, a wireless network, a Wi-Fi ® network, another type of network, or a combination of two or more such networks.
  • the network 1180 or a portion of the network 1180 can include a wireless or cellular network and the coupling 1182 may be a Code Division Multiple Access (CDMA) connection, a Global System for Mobile communications (GSM) connection, or another type of cellular or wireless coupling.
  • the coupling 1182 can implement any of a variety of types of data transfer technology, such as Single Carrier Radio Transmission Technology (1xRTT), Evolution-Data Optimized (EVDO) technology, General Packet Radio Service (GPRS) technology, Enhanced Data rates for GSM Evolution (EDGE) technology, third Generation Partnership Project (3GPP) technology including 3G, fourth generation wireless (4G) networks, Universal Mobile Telecommunications System (UMTS), High Speed Packet Access (HSPA), Worldwide Interoperability for Microwave Access (WiMAX), the Long Term Evolution (LTE) standard, others defined by various standard-setting organizations, other long-range protocols, or other data transfer technology.
  • a wireless communication protocol or network can be configured to transmit headphone audio signals from a centralized processor or machine to a headphone device in use by a listener.
  • the instructions 1116 can be transmitted or received over the network 1180 using a transmission medium via a network interface device (e.g., a network interface component included in the communication components 1164) and using any one of a number of well-known transfer protocols (e.g., hypertext transfer protocol (HTTP)). Similarly, the instructions 1116 can be transmitted or received using a transmission medium via the coupling 1172 (e.g., a peer-to-peer coupling) to the devices 1170.
  • the term "transmission medium” shall be taken to include any intangible medium that is capable of storing, encoding, or carrying the instructions 1116 for execution by the machine 1100, and includes digital or analog communications signals or other intangible media to facilitate communication of such software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Stereophonic System (AREA)

Claims (15)

  1. A spatial audio signal decoder, comprising:
    a processor and a non-transitory computer-readable medium operably coupled thereto, the non-transitory computer-readable medium comprising a plurality of instructions stored in association therewith that are accessible to, and executable by, the processor, wherein the plurality of instructions, when executed, cause the processor to:
    receive an input spatial audio signal having a first set of channels (C) with an input spatial format;
    partition the first set of channels (C) of the input spatial audio signal into at least a first channel subset (C0) of the input spatial audio signal and a second channel subset (C1) of the input spatial audio signal;
    determine estimates of a number and directions of arrival of directional audio sources represented in at least a portion of the first set of channels of the input spatial audio signal;
    determine one of an active component of the first channel subset signals (x0A) and a passive component of the first channel subset signals (x0P), based at least in part upon the estimated number and directions of arrival of directional audio sources represented in the at least a portion of the channels of the input spatial audio signal, wherein the active component corresponds to a direction of arrival of an active sound source and the passive component does not correspond to a direction of arrival of an active sound source;
    determine the other of the active component of the first channel subset signals (x0A) and the passive component of the first channel subset signals (x0P), based at least in part upon the determined one of the active component of the first channel subset signals (x0A) and the passive component of the first channel subset signals (x0P);
    determine an active component of the second channel subset signals (x1A) based at least in part upon the estimated directions of arrival of directional audio sources represented in the at least a portion of the channels of the input spatial audio signal;
    determine a passive component of the second channel subset signals (x1P) based upon the second channel subset signals (C1) and the determined active component of the second channel subset signals (x1A);
    decode the active component of the first channel subset signals (x0A) to a first output signal having a first output format; and
    decode the passive component of the first channel subset signals (x0P) and the passive component of the second channel subset signals (x1P) to a second output signal having a second output format.
  2. The decoder of claim 1,
    wherein the instructions that, when executed, determine an estimate of a number and directions of arrival of directional audio sources represented in at least a portion of the channels of the input spatial audio signal determine an estimate of a number and directions of arrival of directional audio sources represented in the first channel subset signals (C0).
  3. The decoder of claim 1,
    wherein the instructions that, when executed, partition the first set of channels (C) of the input spatial audio signal partition the first set of channels (C) of the input spatial audio signal into at least a first channel subset (C0) of the input spatial audio signal, a second channel subset (C1) of the input spatial audio signal, and a third channel subset (C2) of the input spatial audio signal; and
    wherein the instructions that, when executed, determine an estimate of a number and directions of arrival of directional audio sources represented in at least a portion of the channels of the input spatial audio signal determine estimates of a number and directions of arrival of directional audio sources represented in the third channel subset signals (C2).
  4. The decoder of claim 1,
    wherein each channel of the first set of channels corresponds to a different three-dimensional spatial directivity pattern.
  5. The decoder of claim 1,
    wherein the input spatial format comprises an ambisonic format.
  6. The decoder of claim 1,
    wherein the first channel subset (C0) of the input spatial audio signal consists of first-order ambisonics and the second channel subset (C1) of the input spatial audio signal comprises higher-order ambisonics.
  7. The decoder of claim 1,
    wherein the input spatial audio signal is within a first frequency band.
  8. The decoder of claim 1, wherein the plurality of instructions, when executed, further cause the processor to:
    combine the first decoded active signal and the first decoded passive signal to provide a decoded output signal.
  9. The decoder of claim 1,
    wherein the first output format matches the second output format.
  10. The decoder of claim 1,
    wherein the first output format differs from the second output format.
  11. The decoder of claim 1,
    wherein determining the passive component of the second channel subset signals (x1P) comprises subtracting the determined active component of the second channel subset signals (x1A) from the second channel subset signals (C1).
  12. The decoder of claim 1, wherein the plurality of instructions, when executed, further cause the processor to:
    combine the determined passive component of the first channel subset signals (x0P) with the determined passive component of the second channel subset signals (x1P) to provide a passive component for a recombined channel set (CP); and
    wherein decoding the passive component of the first channel subset signals (x0P) and the passive component of the second channel subset signals (x1P) having the input spatial format to a second output signal having a second output format comprises decoding the passive component for the recombined channel set (CP).
  13. The decoder of claim 1,
    wherein the instructions that, when executed, determine the number and directions of arrival of directional audio sources determine a subspace corresponding to one or more direction vectors of a codebook to represent a channel subset of the input spatial audio signal.
  14. A method of decoding audio signals, comprising:
    receiving an input spatial audio signal having a first set of channels (C) with an input spatial format, within multiple respective frequency bands;
    partitioning, within the respective frequency bands, the first set of channels (C) of the input spatial audio signal into at least a first channel subset (C0) of the input spatial audio signal and a second channel subset (C1) of the input spatial audio signal;
    in each respective frequency band,
    determining an estimate of a number and directions of arrival of directional audio sources represented in at least a portion of the first set of channels of the input spatial audio signal;
    determining one of an active component of the first channel subset signals (x0A) and a passive component of the first channel subset signals (x0P), based at least in part upon the estimated number and directions of arrival of directional audio sources represented in the at least a portion of the channels of the input spatial audio signal, wherein the active component corresponds to a direction of arrival of an active sound source and the passive component does not correspond to a direction of arrival of an active sound source;
    determining the other of the active component of the first channel subset signals (x0A) and the passive component of the first channel subset signals (x0P), based at least in part upon the determined one of the active component of the first channel subset signals (x0A) and the passive component of the first channel subset signals (x0P);
    determining an active component of the second channel subset signals (x1A) based at least in part upon the estimated directions of arrival of directional audio sources represented in the at least a portion of the channels of the input spatial audio signal;
    determining a passive component of the second channel subset signals (x1P) based upon the second channel subset signals (C1) and the determined active component of the second channel subset signals (x1A);
    decoding the active component of the first channel subset signals (x0A) to a first output signal having a first output format; and
    decoding the passive component of the first channel subset signals (x0P) and the passive component of the second channel subset signals (x1P) to a second output signal having a second output format.
  15. An article of manufacture comprising a non-transitory machine-readable storage medium comprising instructions that, when executed by a machine, cause the machine to perform operations comprising:
    receiving an input spatial audio signal having a first set of channels (C) with an input spatial format, within multiple respective frequency bands;
    partitioning, within the respective frequency bands, the first set of channels (C) of the input spatial audio signal into at least a first channel subset (C0) of the input spatial audio signal and a second channel subset (C1) of the input spatial audio signal;
    in each respective frequency band,
    determining an estimate of a number and directions of arrival of directional audio sources represented in at least a portion of the first set of channels of the input spatial audio signal;
    determining one of an active component of the first channel subset signals (x0A) and a passive component of the first channel subset signals (x0P), based at least in part upon the estimated number and directions of arrival of directional audio sources represented in the at least a portion of the channels of the input spatial audio signal, wherein the active component corresponds to a direction of arrival of an active sound source and the passive component does not correspond to a direction of arrival of an active sound source;
    determining the other of the active component of the first channel subset signals (x0A) and the passive component of the first channel subset signals (x0P), based at least in part upon the determined one of the active component of the first channel subset signals (x0A) and the passive component of the first channel subset signals (x0P);
    determining an active component of the second channel subset signals (x1A) based at least in part upon the estimated directions of arrival of directional audio sources represented in the at least a portion of the channels of the input spatial audio signal;
    determining a passive component of the second channel subset signals (x1P) based upon the second channel subset signals (C1) and the determined active component of the second channel subset signals (x1A);
    decoding the active component of the first channel subset signals (x0A) to a first output signal having a first output format; and
    decoding the passive component of the first channel subset signals (x0P) and the passive component of the second channel subset signals (x1P) to a second output signal having a second output format.
EP20711440.6A 2019-06-06 2020-02-14 Decodierer von hybridem räumlichem audio Active EP3980993B1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962858296P 2019-06-06 2019-06-06
PCT/US2020/018447 WO2020247033A1 (en) 2019-06-06 2020-02-14 Hybrid spatial audio decoder

Publications (3)

Publication Number Publication Date
EP3980993A1 EP3980993A1 (de) 2022-04-13
EP3980993C0 EP3980993C0 (de) 2024-07-31
EP3980993B1 true EP3980993B1 (de) 2024-07-31

Family

ID=69811943

Family Applications (1)

Application Number Title Priority Date Filing Date
EP20711440.6A Active EP3980993B1 (de) 2019-06-06 2020-02-14 Decodierer von hybridem räumlichem audio

Country Status (3)

Country Link
EP (1) EP3980993B1 (de)
KR (1) KR20220027938A (de)
WO (1) WO2020247033A1 (de)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11205435B2 (en) 2018-08-17 2021-12-21 Dts, Inc. Spatial audio signal encoder
WO2020037280A1 (en) 2018-08-17 2020-02-20 Dts, Inc. Spatial audio signal decoder
CN118800248A (zh) * 2023-04-13 2024-10-18 Huawei Technologies Co., Ltd. Scene audio decoding method and electronic device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012125855A1 (en) * 2011-03-16 2012-09-20 Dts, Inc. Encoding and reproduction of three dimensional audio soundtracks
MY179136A (en) * 2013-03-05 2020-10-28 Fraunhofer Ges Forschung Apparatus and method for multichannel direct-ambient decomposition for audio signal processing
EP3324406A1 (de) * 2016-11-17 2018-05-23 Fraunhofer Gesellschaft zur Förderung der Angewand Apparatus and method for decomposing an audio signal using a variable threshold value
WO2020037280A1 (en) * 2018-08-17 2020-02-20 Dts, Inc. Spatial audio signal decoder

Also Published As

Publication number Publication date
KR20220027938A (ko) 2022-03-08
EP3980993A1 (de) 2022-04-13
WO2020247033A1 (en) 2020-12-10
EP3980993C0 (de) 2024-07-31

Similar Documents

Publication Publication Date Title
US11355132B2 (en) Spatial audio signal decoder
US11205435B2 (en) Spatial audio signal encoder
CN109076305B (zh) 增强现实耳机环境渲染
EP3980993B1 (de) Decodierer von hybridem räumlichem audio
US11894004B2 (en) Audio coder window and transform implementations
US10979844B2 (en) Distributed audio virtualization systems
KR20190005206A (ko) Immersive audio playback system
KR102557774B1 (ko) Sound zooming
EP3523801B1 (de) Kodierung einer audioschallfelddarstellung
KR102656969B1 (ko) Mismatched audio-visual capture system
Vennerød Binaural reproduction of higher order ambisonics: a real-time implementation and perceptual improvements
EP3977447A1 (de) Omnidirektionale codierung und decodierung für ambisonics

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20220104

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: GRANT OF PATENT IS INTENDED

INTG Intention to grant announced

Effective date: 20240321

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE PATENT HAS BEEN GRANTED

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602020034833

Country of ref document: DE

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

U01 Request for unitary effect filed

Effective date: 20240829

U07 Unitary effect registered

Designated state(s): AT BE BG DE DK EE FI FR IT LT LU LV MT NL PT RO SE SI

Effective date: 20240909