US10262663B2 - Method and apparatus for low bit rate compression of a higher order ambisonics HOA signal representation of a sound field - Google Patents
- Publication number
- US10262663B2 (application US 15/509,596)
- Authority
- US
- United States
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/02—Systems employing more than two channels, e.g. quadraphonic of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/11—Application of ambisonics in stereophonic audio systems
Definitions
- the invention relates to a method and to an apparatus for low bit rate compression of a Higher Order Ambisonics HOA signal representation of a sound field, wherein the HOA signal representation is spatially sparse due to the low bit rate.
- HOA Higher Order Ambisonics
- WFS wave field synthesis
- channel based approaches like 22.2
- the HOA representation offers the advantage of being independent of a specific loudspeaker set-up. But this flexibility is at the expense of a decoding process which is required for the playback of the HOA representation on a particular loudspeaker set-up.
- HOA may also be rendered to set-ups consisting of only a few loudspeakers.
- a further advantage of HOA is that the same representation can also be employed without any modification for binaural rendering to headphones.
- HOA is based on the representation of the spatial density of complex harmonic plane wave amplitudes by a truncated Spherical Harmonics (SH) expansion.
- SH Spherical Harmonics
- Each expansion coefficient is a function of angular frequency, which can be equivalently represented by a time domain function.
- O denotes the number of expansion coefficients.
- the spatial resolution of the HOA representation improves with a growing maximum order N of the expansion.
- the total bit rate for the transmission of an HOA representation is determined by $O \cdot f_S \cdot N_b$, where $f_S$ is the sampling rate and $N_b$ the number of bits per sample.
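- as a rough illustration of this product, the following sketch evaluates $O \cdot f_S \cdot N_b$. It assumes the standard relation $O = (N+1)^2$ for a full three-dimensional expansion of order $N$; the function names and the example values (order 4, 48 kHz, 16 bit) are illustrative assumptions and are not taken from the text.

```python
def hoa_coefficient_count(order: int) -> int:
    """Number of HOA coefficient sequences O, assuming O = (N + 1)^2."""
    return (order + 1) ** 2


def uncompressed_bit_rate(order: int, sample_rate_hz: int, bits_per_sample: int) -> int:
    """Total bit rate O * f_S * N_b in bit/s for an uncompressed HOA representation."""
    return hoa_coefficient_count(order) * sample_rate_hz * bits_per_sample


# Hypothetical example values: order 4, 48 kHz sampling rate, 16 bits per sample.
rate = uncompressed_bit_rate(order=4, sample_rate_hz=48_000, bits_per_sample=16)
print(f"{rate / 1e6:.1f} Mbit/s")
```

Even for a moderate order the uncompressed rate is orders of magnitude above the target rates discussed here, which is the motivation for the compression.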
- a reasonable minimum number of quantised signals is ‘8’ for the approaches in EP 2665208 A1, EP 2743922 A1 and International application PCT/EP2013/059363.
- the data rate with one of these methods is typically not lower than 256 kbit/s assuming a data rate of 32 kbit/s for each individual perceptual coder.
- this total data rate might be too high, which makes HOA compression methods for significantly lower data rates, e.g. 128 kbit/s, desirable.
- the reconstructed HOA representation consists of highly correlated components because all HOA components are reconstructed from only a small number of quantised signals. Due to such small number of quantised signals, the prediction of directional HOA components thereof can be unsatisfactory and can lead to the effect that the reconstructed HOA representation is spatially sparse. This can make the sound dry and quieter than in the original HOA representation. Ambient sound fields, which typically consist of spatially uncorrelated signal components, are not reconstructed properly if the number of quantised signals is very small, e.g. ‘1’ or ‘2’.
- the processing described is called Parametric Ambience Replication (PAR), and it complements a reconstructed, spatially sparse HOA representation by potentially missing ambient components, which are parametrically replicated from itself.
- the replication is performed by first creating from the signals of the sparse HOA representation (which may include directional signals and an ambient component) a number of new signals with modified phase spectra, thus being uncorrelated with the former signals. Second, the newly created signals are mixed with each other in order to provide a replicated ambient HOA component.
- the final enhanced HOA representation is computed by the superposition of the original sparse HOA representation and the replicated ambient HOA component. The mixing is carried out so as to match the spatial acoustic properties of the final enhanced HOA representation with that of the original HOA representation.
- the mixing is performed in the frequency domain, offering the possibility to vary between different frequency bands.
- the side information for PAR to be included into the compressed HOA representation consists only of the mixing parameters, which are essentially complex-valued mixing matrices.
- One particular method for creating the uncorrelated signals from the sparse HOA representation with the goal to reduce the amount of side information for PAR is to first represent the sparse HOA representations by virtual loudspeaker signals (or equivalently by general plane wave functions) from some predefined directions, which should be distributed on the unit sphere as uniformly as possible.
- the rendering for creating the virtual loudspeaker signals from the HOA representation is referred to as a spatial transform in the following.
- Second, for each of these directions one uncorrelated signal is created by modifying the phase spectrum of the corresponding virtual loudspeaker signal of the sparse HOA representation using a de-correlation filter.
- the replicated ambient HOA component is also represented by virtual loudspeaker signals for the same directions, where each virtual loudspeaker signal for a certain direction is mixed only from uncorrelated signals created for predefined directions in the neighbourhood of that particular direction.
- the mixing from only a small number of uncorrelated signals offers the advantage that the number of mixing coefficients to create one uncorrelated signal can be kept low, as well as the amount of side information for PAR.
- Another advantage is that for the mixing of the individual virtual loudspeaker signals of the replicated ambient HOA component only signals from the spatial neighbourhood, and thus with similar amplitude spectrum, are considered. This prevents directional components of the sparse HOA representation from being undesirably distributed over all directions.
- the de-correlation filters are pairwise different, and their number is equal to the number of virtual loudspeaker directions.
- the practical construction of many such de-correlation filters usually causes each individual filter to have only a limited de-correlation effect.
- the assignment of the de-correlation filters to the virtual directions (or equivalently spatial positions) should be reasonably chosen in order to minimise the mutual correlation between the signals to be mixed for creating a single virtual loudspeaker signal of the replicated ambient HOA component.
- the number of virtual loudspeaker directions is allowed to vary for individual frequency bands and can be used for specifying a frequency-dependent order of the replicated ambient HOA component.
- a further extension of the method of creating the uncorrelated signals from the sparse HOA representation is the usage of a time-varying number of uncorrelated signals to be considered for the mixing of a virtual loudspeaker signal of the replicated ambient HOA component.
- the number of uncorrelated signals to be mixed depends on the amount of missing ambience in the sparse HOA representation. This variation usually would lead to changes in the assignment of the de-correlation filters to the virtual loudspeaker positions.
- the assignment of the de-correlation filters to the virtual loudspeaker signals of the sparse HOA representation can be replaced by an equivalent assignment of the virtual loudspeaker signals to the de-correlation filters.
- This assignment can be expressed by a simple permutation matrix.
- the input to each de-correlation filter can be computed by overlap-add between the signals arising from two different assignments.
- the input to and the output of each de-correlation filter are continuous.
- the assignment has to be inverted in order to re-assign the output of each de-correlation filter to each virtual loudspeaker direction.
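- the assignment and its inversion can be sketched as follows. The sketch stores the permutation as an index list rather than as a matrix; the function names and the example permutation are hypothetical and only illustrate that the re-assignment after filtering restores the original direction order.

```python
def permute(signals, perm):
    """Route signals to filters: filter i receives signals[perm[i]]."""
    return [signals[p] for p in perm]


def invert(perm):
    """Inverse permutation, the index-list counterpart of transposing a permutation matrix."""
    inverse = [0] * len(perm)
    for filter_index, signal_index in enumerate(perm):
        inverse[signal_index] = filter_index
    return inverse


# Hypothetical example with four virtual loudspeaker directions.
signals = ["dir0", "dir1", "dir2", "dir3"]
perm = [2, 0, 3, 1]

at_filters = permute(signals, perm)            # order seen by the de-correlation filters
restored = permute(at_filters, invert(perm))   # order re-assigned to the directions
print(at_filters, restored)
```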
- This application describes a processing for the creation of ambience in the context of HOA representations.
- the inventive compression improving method is adapted for improving a low bit rate compressed and decompressed Higher Order Ambisonics HOA signal representation of a sound field, so as to provide a Parametric Ambience Replication parameter set, wherein said decompression provides a spatially sparse decoded HOA representation and a set of indices of coefficient sequences of this representation, said method including:
- the inventive compression improving apparatus is adapted for improving a low bit rate compressed and decompressed Higher Order Ambisonics HOA signal representation of a sound field, so as to provide a Parametric Ambience Replication parameter set, wherein said decompression provides a spatially sparse decoded HOA representation and a set of indices of coefficient sequences of this representation, said apparatus including means adapted to:
- the inventive decompression improving method is adapted for improving a spatially sparse decoded HOA representation, for which a set of indices of coefficient sequences of this representation was provided by said decoding, using a Parametric Ambience Replication parameter set generated according to the above compression improving method, said method including:
- the inventive decompression improving apparatus is adapted for improving a spatially sparse decoded HOA representation, for which a set of indices of coefficient sequences of this representation was provided by said decoding, using a Parametric Ambience Replication parameter set generated according to the above compression improving method, said apparatus including means adapted to:
- FIG. 1 HOA data encoder including a PAR encoder
- FIG. 3 PAR sub-band encoder
- FIG. 4 HOA data decompressor including a PAR decoder
- FIG. 5 PAR decoder in more detail
- FIG. 6 PAR sub-band decoder
- FIG. 7 spherical coordinate system.
- the Parametric Ambience Replication (PAR) processing is used as an additional coding tool that extends the basic HOA compression, as shown in FIG. 1, where frame-based processing with frame index $k$ is assumed.
- the HOA encoder step or stage 11 decomposes the HOA representation $C(k)$ into the transport signal matrix $Z(k-k_{\mathrm{HOA}})$ and a set of HOA side information $\Gamma_{\mathrm{HOA}}(k-k_{\mathrm{HOA}})$, as described in EP 2665208 A1, EP 2743922 A1, International application PCT/EP2013/059363 and European patent application EP 14306077.0.
- the HOA representation matrix $C(k)$ for the frame index $k$ consists of $O$ rows, where each row holds $L$ time domain samples of the corresponding HOA coefficient, and it is also fed to a frame delay step or stage 14.
- the rows of the matrix $Z(k-k_{\mathrm{HOA}})$ hold the $L$ time domain samples of the transport signals into which $C(k)$ has been decomposed.
- the time domain signals from $Z(k-k_{\mathrm{HOA}})$ are perceptually encoded in perceptual audio encoder step or stage 15 to the transport signal parameter set $\Gamma_{\mathrm{Trans}}(k-k_{\mathrm{HOA}}-k_{\mathrm{enc}})$, which is fed to a multiplexer and frame synchronisation step or stage 16.
- the $O \times L$ matrix $D(k-k_{\mathrm{HOA}})$ of the sparse HOA representation is restored from $\Gamma_{\mathrm{HOA}}(k-k_{\mathrm{HOA}})$ and $Z(k-k_{\mathrm{HOA}})$ in a HOA decoder step or stage 12, which also provides a set of active ambience coefficients $\mathcal{I}_{\mathrm{used}}(k-k_{\mathrm{HOA}})$.
- This HOA decoder step/stage 12 is identical to the HOA decoder step or stage 43 used in the HOA data decompressor shown in FIG. 4.
- the term ‘sparse’ or ‘spatially sparse HOA representation’ means that in this representation spatially uncorrelated signal components of the original sound field are missing.
- the term ‘sparse’ may, but does not have to, mean that most coefficient sequences of the respective HOA representation are zero.
- a sound field that is coded/represented by only two plane waves is meant to be spatially sparse. However, usually none of the respective HOA coefficient sequences will be zero.
- the sparse HOA representation $D(k-k_{\mathrm{HOA}})$ is fed into a PAR encoder step or stage 13 together with the delay-compensated HOA representation $C(k-k_{\mathrm{HOA}})$, the set of active ambience coefficients $\mathcal{I}_{\mathrm{used}}(k-k_{\mathrm{HOA}})$, and the PAR encoder parameters $F$, $o_{\mathrm{PAR}}$, $n_{\mathrm{SIG}}(k-k_{\mathrm{HOA}})$ and $v_{\mathrm{COMPLEX}}$, delay compensated in step/stage 14.
- the PAR processing is performed in $N_{\mathrm{SB}}$ sub-band groups, where the rows of the matrix $F$ hold the first and the last sub-band index of the PAR filter bank for each corresponding sub-band group.
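- the role of the matrix $F$ can be illustrated by the following sketch, in which each row holds the first and the last filter-bank sub-band index of one sub-band group. The concrete group boundaries, the 0-based indexing and the function name are assumptions made for illustration only.

```python
def group_of_subband(F, j):
    """Return the index g of the sub-band group whose [first, last] range contains sub-band j."""
    for g, (first, last) in enumerate(F):
        if first <= j <= last:
            return g
    raise ValueError(f"sub-band {j} is not covered by any sub-band group")


# Hypothetical configuration: 64 filter-bank sub-bands split into 4 groups.
F = [(0, 3), (4, 11), (12, 31), (32, 63)]

print([group_of_subband(F, j) for j in (0, 3, 4, 31, 63)])
```

Wider groups at high frequencies, as in this made-up table, keep the number of mixing matrices per frame small, since one mixing matrix is computed per group.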
- the vector $o_{\mathrm{PAR}}$ contains, for all PAR sub-band groups, the HOA order used for the processing.
- the index set $\mathcal{I}_{\mathrm{used}}(k-k_{\mathrm{HOA}})$ holds the indexes of the rows from $D(k-k_{\mathrm{HOA}})$ that are used for the PAR processing.
- the number of spatial domain signals per sub-band group that are used to compute one spatial domain signal of the replicated ambient HOA representation is defined by the vector $n_{\mathrm{SIG}}(k)$ for frame $k$.
- the vector $v_{\mathrm{COMPLEX}}$ indicates for each sub-band group whether the elements of the PAR mixing matrix are complex-valued numbers or real-valued non-negative numbers. From these input signals and parameters the PAR encoder computes the encoded PAR parameter set $\Gamma_{\mathrm{PAR}}(k-k_{\mathrm{HOA}}-1)$ that is also fed to step/stage 16.
- Multiplexer and frame synchronisation step/stage 16 synchronises the frame delays of the parameter sets $\Gamma_{\mathrm{HOA}}(k-k_{\mathrm{HOA}})$, $\Gamma_{\mathrm{PAR}}(k-k_{\mathrm{HOA}}-1)$ and $\Gamma_{\mathrm{Trans}}(k-k_{\mathrm{HOA}}-k_{\mathrm{enc}})$, and combines them into the coded HOA frame $\Gamma(k-k_{\mathrm{max}})$.
- the HOA encoder delay is defined by $k_{\mathrm{HOA}}$, where it is assumed that the HOA decoder does not introduce any additional delay. The same definition holds for the perceptual encoder delay $k_{\mathrm{enc}}$.
- a basic feature of the PAR processing is the creation of de-correlated signals from the sparse HOA representation D(k′), and obtaining mixing matrices in the frequency domain that combine these de-correlated signals to a replicated ambient HOA representation that enhances the sparse and highly correlated HOA representation, in order to match the spatial properties of the original HOA representation C(k′).
- De-correlation means in this context that the phase of the sub-band signals is modified without changing its magnitude. Therefore the PAR encoder shown in FIG.
- the PAR processing is performed in the frequency domain.
- the PAR analysis filter bank transforms the input HOA representation into its complex-valued frequency domain representation, where it is assumed that the number of time domain samples is equal to the number of frequency domain samples.
- for example, Quadrature Mirror Filter (QMF) banks with $N_{\mathrm{FB}}$ sub-bands can be used as filter banks.
- each sub-band then holds $\tilde{L} = L / N_{\mathrm{FB}}$ samples per frame.
- in step or stage 25, which also receives $F$, $o_{\mathrm{PAR}}$, $n_{\mathrm{SIG}}(k')$ and $v_{\mathrm{COMPLEX}}$, these sub-bands are grouped into $N_{\mathrm{SB}}$ sub-band groups.
- the PAR sub-band configuration is defined by the matrix
- the sub-band configuration is encoded in step or stage 21 to the parameter set $\Gamma_{\mathrm{SUBBAND}}$ by the method described in European patent application EP 14306347.7. Because it is fixed for each frame index $k$, it has to be transmitted to the decoder only once for initialisation.
- the parameter $o_{\mathrm{PAR},g}$ indicates the HOA order for which the PAR encoder computes parameters. This order is equal to or less than the HOA order $N$ of the HOA representation $C(k')$. It is used to reduce the data rate for transmitting the encoded PAR parameters $\Gamma_{M_g}(k'-1)$.
- the vector $o_{\mathrm{PAR}} = [o_{\mathrm{PAR},1}, \ldots, o_{\mathrm{PAR},N_{\mathrm{SB}}}]^T$ (2) holds the HOA orders for all sub-band groups.
- the mixing of the de-correlated signals is done by a matrix multiplication, where the encoded matrix is included in the PAR parameter set $\Gamma_{M_g}(k'-1)$.
- the phase information of the decoded transport signals might get lost at decoder side due to parametric coding tools (for example in case the spectral band replication method is applied).
- the PAR processing can only replicate the spatial power distribution of the missing ambience components, which means that the phase information of the PAR mixing matrix is obsolete.
- the parameter $\mathcal{I}_{\mathrm{used}}(k')$ is input to each PAR sub-band encoder step/stage 26, 27.
- This set holds the indexes of the sparse HOA coefficient sequences from $D(k')$ that are used to create de-correlated signals.
- the indexes should address coefficient sequences within the HOA order $o_{\mathrm{PAR},g}$, which should not differ significantly from the sequences of the original HOA representation $C(k')$.
- the sequences are identical at the PAR encoder so that at decoder side the selected sequences differ only by the distortions added by the perceptual coding.
- the encoded sub-band configuration set $\Gamma_{\mathrm{SUBBAND}}$ and the PAR coding parameters $o_{\mathrm{PAR}}$, $n_{\mathrm{SIG}}(k')$ and $v_{\mathrm{COMPLEX}}$ are synchronised by their frame indexes and multiplexed into the PAR bit stream parameter set $\Gamma_{\mathrm{PAR}}(k'-1)$ in a multiplexer and frame synchronisation step or stage 22.
- the PAR sub-band encoder steps/stages 26 and 27 are shown in more detail in FIG. 3 .
- the matrices $\tilde{C}(k',j_g)$ and $\tilde{D}(k',j_g)$ are transformed in steps or stages 311, 312, 313 to their spatial domain representations $\tilde{W}(k',j_g)$ and $\tilde{E}(k',j_g)$ by a spatial transform that is described below in section Spatial transform.
- the matrices of the previous frame are included in order to obtain covariance matrices that are valid for the current and previous frame for enabling a cross-fade between the matrices of two adjacent frames at the PAR decoder.
- the creation of de-correlated signals in steps or stages 331 and 332 transforms a sub-set of coefficient sequences from $\tilde{D}(k',j_g)$, which is selected according to the index set of used coefficients $\mathcal{I}_{\mathrm{used}}(k')$, to the spatial domain and permutes these spatial domain signals with the permutation matrix $P_{o_{\mathrm{PAR},g},\,n_{\mathrm{SIG},g}(k'-1)}$ in order to assign the signals to the corresponding de-correlators that create a matrix $\tilde{B}(k',j_g)$.
- a detailed description of these processing steps is given below in section Creation of de-correlated signals.
- the permutation included in $\tilde{B}(k',j_g)$ has to be inverted by the matrix $P^{H}_{o_{\mathrm{PAR},g},\,n_{\mathrm{SIG},g}(k'-1)}$. Therefore the covariance matrices of the de-correlated signals are obtained from
- $\Sigma_{D,j_g}(k'-1) = P^{H}_{o_{\mathrm{PAR},g},\,n_{\mathrm{SIG},g}(k'-1)}\,\tilde{B}(k',j_g)\,\tilde{B}(k',j_g)^{H}\,P_{o_{\mathrm{PAR},g},\,n_{\mathrm{SIG},g}(k'-1)} + P^{H}_{o_{\mathrm{PAR},g},\,n_{\mathrm{SIG},g}(k'-1)}\,\tilde{B}(k'-1,j_g)\,\tilde{B}(k'-1,j_g)^{H}\,P_{o_{\mathrm{PAR},g},\,n_{\mathrm{SIG},g}(k'-1)}$ (7)
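- the structure of equation (7), i.e. undoing the permutation and summing the outer products of the current and the previous frame, can be sketched with plain nested lists as follows. The helper names and the tiny 2x2 example matrices are hypothetical; a practical implementation would use a linear algebra library.

```python
def matmul(a, b):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def hermitian(a):
    """Conjugate transpose."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def add(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def decorrelated_covariance(P, B_cur, B_prev):
    """Sum over two frames of (P^H B)(P^H B)^H = P^H B B^H P."""
    PH = hermitian(P)

    def term(B):
        unpermuted = matmul(PH, B)          # invert the permutation
        return matmul(unpermuted, hermitian(unpermuted))

    return add(term(B_cur), term(B_prev))


# Hypothetical 2-signal, 2-sample example.
P = [[0, 1], [1, 0]]
B_cur = [[1 + 1j, 2], [0, 1j]]
B_prev = [[1, 0], [0, 1]]
cov = decorrelated_covariance(P, B_cur, B_prev)
print(cov)
```

Because each term has the form $X X^{H}$, the result is Hermitian with real non-negative diagonal entries, which is a convenient sanity check.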
- in step or stage 37, the mixing matrix $M_g(k'-1)$ is quantised and encoded to the parameter set $\Gamma_{M_g}(k'-1)$ as described in section Encoding of the mixing matrix.
- the input HOA representation $C$ is transformed to its spatial domain representation $W$ using the spherical harmonic transform from section Definition of real valued Spherical Harmonics for the given HOA order $o_{\mathrm{PAR},g}$.
- the creation of the de-correlated signals includes the following processing steps:
- the de-correlator removes all inactive HOA coefficient sequences from the input matrix $\tilde{D}(k',j_g)$ by replacing rows whose index is not an element of the index set $\mathcal{I}_{\mathrm{used}}(k')$ by a $1 \times \tilde{L}$ vector of zeros.
- the resulting matrix $\tilde{D}_{\mathrm{ACT}}$ is then transformed to its $Q_{\mathrm{PAR},g} \times \tilde{L}$ spatial domain representation matrix $\tilde{W}_{\mathrm{ACT}}$ using the spatial transform from section Spatial transform.
- $n_{\mathrm{SIG},g}(k')$ spatially adjacent signals from $\tilde{B}(k',j_g)$ are selected. Therefore the matrix $\tilde{W}_{\mathrm{ACT}}$ is permuted for directing the signals from $\tilde{W}_{\mathrm{ACT}}$ to the de-correlators, so that the best de-correlation between the $n_{\mathrm{SIG},g}(k')$ selected signals is guaranteed.
- a fixed $Q_{\mathrm{PAR},g} \times Q_{\mathrm{PAR},g}$ permutation matrix $P_{o_{\mathrm{PAR},g},\,n_{\mathrm{SIG},g}(k')}$ has to be defined for each predefined combination of $n_{\mathrm{SIG},g}(k')$ and $o_{\mathrm{PAR},g}$.
- the computation of these permutation matrices and the corresponding signal selection tables are given in section Computation of permutation and selection matrices.
- the fading from one permutation matrix to the other prevents discontinuities in the input signals of the de-correlators.
- the $Q_{\mathrm{PAR},g}$ signals in the rows of $\tilde{W}_{\mathrm{PERMUTE}}$ are de-correlated by the corresponding de-correlators in order to form the matrix $\tilde{B}(k',j_g)$.
- the de-correlation method used is defined in the MPEG Surround standard ISO/IEC FDIS 23003-1, MPEG Surround, section 6.6.
- each de-correlator delays each frequency band signal by a band-specific number of samples, where the delay of a given band is equal for all $Q_{\mathrm{PAR},g}$ de-correlators. Additionally each of the de-correlators applies an individual all-pass filter to its input signal.
- the different configurations of the de-correlators distort the phase information of the spatial domain signals $\tilde{W}_{\mathrm{PERMUTE}}$ differently, which results in a de-correlation of the spatial domain signals.
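- the principle that an all-pass filter changes only the phase and leaves the magnitude untouched can be illustrated with a generic first-order all-pass section. This is not the filter structure of the MPEG Surround de-correlator; the transfer function $H(z) = (a + z^{-1})/(1 + a z^{-1})$ with real coefficient $a$ and the chosen coefficient values are assumptions used only to show that different coefficients yield different phase responses at unit magnitude.

```python
import cmath


def allpass_response(a: float, omega: float) -> complex:
    """Frequency response of H(z) = (a + z^-1) / (1 + a z^-1) at normalised frequency omega."""
    z_inv = cmath.exp(-1j * omega)
    return (a + z_inv) / (1 + a * z_inv)


# Two hypothetical de-correlators with different all-pass coefficients.
for a in (0.3, -0.5):
    for omega in (0.1, 1.0, 2.5):
        h = allpass_response(a, omega)
        print(f"a={a:+.1f} omega={omega:.1f} |H|={abs(h):.6f} phase={cmath.phase(h):+.3f}")
```

Two signals with the same magnitude spectrum but filtered with different coefficients therefore end up with different phase spectra, which is the effect exploited for de-correlation.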
- the mixing matrix $M_g(k'-1)$ can be computed for real-valued non-negative or complex-valued matrix elements, which is signalled by the variable $v_{\mathrm{COMPLEX},g}$.
- if $v_{\mathrm{COMPLEX},g}$ signals complex-valued elements, the mixing matrix is computed according to section Complex-valued mixing matrices, whereby this computation is only applicable if the perceptual coding of the transport channels does not destroy the phase information of the samples in the sub-band group $g$.
- the diagonal matrix G normalises the energy of ⁇ to the energy of Y where the diagonal elements of G are given by
- Each sub-band j g f g,1 , . . .
- $n_{\mathrm{SIG},g}(k'-1)$ spatially adjacent signals from $\tilde{B}(\{k',k'-1\},j_g)$ can be selected for the computation of each spatial domain signal of the replicated ambient HOA representation.
- each row of the mixing matrix $M_g(k'-1)$ has to be computed individually according to the selection matrix
- At least the elements $m_{o,i}$ of the mixing matrix $M_g(k'-1)$ are assigned to
- the mixing matrix is chosen such that the sum of the powers of all weighted spatial sub-band signals of the de-correlated HOA representation best approximates the power of the residuum of the original and the spatial domain sub-band signals of the sparse HOA representation.
- NMF Nonnegative Matrix Factorisation
- the quantisation of the matrix elements has to reduce the data rate without decreasing the perceived audio quality of the replicated ambient HOA representation. Therefore the fact can be exploited that, due to the computation of the covariance matrices on overlapping frames, there is a high correlation between the mixing matrices of successive frames.
- each sub-matrix element can be represented by its magnitude and its angle, and then the differences of angles and magnitudes between successive frames are coded.
- the inventors have found experimentally that the occurrence probabilities of the individual differences are distributed in a highly non-uniform manner. In particular, small differences in the magnitudes as well as in the angles occur significantly more frequently than big ones. Hence, a coding method (like Huffman coding) that is based on the a-priori probabilities of the individual values to be coded can be exploited in order to reduce significantly the average number of bits per mixing matrix element.
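- the differential representation can be sketched as follows: each complex matrix element is quantised to a magnitude index and an angle index, and only the index differences to the previous frame are passed on to the entropy coder. The quantiser step size, the number of angle steps and the function names are hypothetical, and the entropy coding stage itself is not shown.

```python
import cmath
import math


def polar_indices(m: complex, mag_step: float, n_angle_steps: int):
    """Quantise a complex element to (magnitude index, angle index)."""
    magnitude, angle = cmath.polar(m)
    angle_step = 2 * math.pi / n_angle_steps
    return round(magnitude / mag_step), round(angle / angle_step) % n_angle_steps


def frame_differences(prev, cur, mag_step=0.1, n_angle_steps=32):
    """Per-element (magnitude, angle) index differences between two successive frames."""
    half = n_angle_steps // 2
    diffs = []
    for p, c in zip(prev, cur):
        p_mag, p_ang = polar_indices(p, mag_step, n_angle_steps)
        c_mag, c_ang = polar_indices(c, mag_step, n_angle_steps)
        # Wrap the angle difference so that small rotations give small numbers.
        d_ang = (c_ang - p_ang + half) % n_angle_steps - half
        diffs.append((c_mag - p_mag, d_ang))
    return diffs


prev_elements = [1 + 0j, 0.5j, cmath.rect(1.0, -2 * math.pi / 32)]
cur_elements = [1.1 + 0j, 0.5j, 1 + 0j]
print(frame_differences(prev_elements, cur_elements))
```

When successive mixing matrices are highly correlated, most differences are zero or close to zero, so a code with short codewords for small values reduces the average number of bits.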
- $n_{\mathrm{SIG},g}(k'-1)$ has to be transmitted per frame.
- An index of a predefined table can be signalled for this purpose, which index is defined for each valid PAR HOA order.
- the number of active (i.e. non-zero) elements per row can be reduced.
- the active row elements correspond to $n_{\mathrm{SIG}}$ of the $Q_{\mathrm{PAR}}$ de-correlated signals in the spatial domain that are used for mixing one spatial domain signal of the replicated ambient HOA representation, which is called the target signal in the following.
- the complex-valued sub-band signals of the de-correlated spatial domain signals to be mixed should ideally have the same magnitude spectrum as the target signal, up to a scaling factor, but different phase spectra. This can be achieved by selecting the signals to be mixed from the spatial vicinity of the target signal.
- one way to select the $n_{\mathrm{SIG}}$ signals of a group for a given HOA order $o_{\mathrm{PAR}}$ is to compute the angular distance between all spatial domain positions and the position of the $o$-th target signal, and to select the signal indexes belonging to the $n_{\mathrm{SIG}}$ smallest distances into the $o$-th group.
- the $o$-th row vector of the matrix $S_{n_{\mathrm{SIG}}}(o_{\mathrm{PAR}})$ from equation (34) consists of the indexes of the $o$-th group sorted in ascending order.
- the matrices for each predefined combination of $o_{\mathrm{PAR}}$ and $n_{\mathrm{SIG}}$ are assumed to be known in the PAR encoder and decoder.
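- the nearest-neighbour selection can be sketched as follows. The six axis-aligned unit vectors stand in for the predefined spatial domain positions and are purely hypothetical, as are the function names; ties between equally distant positions are resolved here by the lower index, which is a choice of this sketch.

```python
import math


def angular_distance(p, q):
    """Angle in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))


def selection_table(directions, n_sig):
    """Row o holds the ascendingly sorted indexes of the n_sig positions closest to position o."""
    table = []
    for target in directions:
        by_distance = sorted(range(len(directions)),
                             key=lambda i: angular_distance(directions[i], target))
        table.append(sorted(by_distance[:n_sig]))
    return table


# Hypothetical position set: the six axis directions of the unit sphere.
directions = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
for row in selection_table(directions, n_sig=5):
    print(row)
```

With five of six positions selected, each row omits exactly the position opposite to the target, which shows that the most distant signal is the first to be excluded from the mix.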
- the framework of the HOA decoder/HOA decompressor including the PAR decoder is depicted in FIG. 4 .
- the bit stream parameter set $\Gamma(k)$ is de-multiplexed in a demultiplexer step or stage 41 into the side information parameter sets $\Gamma_{\mathrm{HOA}}(k)$ and $\Gamma_{\mathrm{PAR}}(k)$, and the signal parameter set $\Gamma_{\mathrm{Trans}}(k)$. Because the delay between the side information and the signal parameters has already been aligned in the HOA encoder, the decoder side receives its data already synchronised.
- the signal parameter set $\Gamma_{\mathrm{Trans}}(k)$ is fed to a perceptual audio decoder step or stage 42 that decodes the transport signals $\hat{Z}(k)$ from it. The following HOA decoder step or stage 43 composes the decoded sparse HOA representation $\hat{D}(k)$ from the decoded transport signals $\hat{Z}(k)$ and the side information parameter set $\Gamma_{\mathrm{HOA}}(k)$.
- the index set $\mathcal{I}_{\mathrm{used}}(k)$ is also reconstructed by the HOA decoder step/stage 43.
- the decoded sparse HOA representation $\hat{D}(k)$, the index set $\mathcal{I}_{\mathrm{used}}(k)$ and the PAR side information parameter set $\Gamma_{\mathrm{PAR}}(k)$ are fed to a PAR decoder step or stage 44, which reconstructs therefrom the replicated ambient HOA representation and enhances the decoded sparse HOA representation $\hat{D}(k)$ to the decoded HOA representation $\hat{C}(k)$.
- the PAR decoder framework shown in FIG. 5 enhances the decoded sparse HOA representation $\hat{D}(k)$ by the decoded replicated ambient HOA representation $C_{\mathrm{PAR}}(k)$ in order to reconstruct the decoded HOA representation $\hat{C}(k)$.
- the samples of the decoded HOA representation $\hat{C}(k)$ are delayed according to the analysis and synthesis delays of the applied filter banks.
- the applied filter-bank has to be identical to the one that has been used in the PAR encoder at encoder side.
- the group allocation step or stage 54 directs the parameters from steps/stages 51 and 53 and the frequency-band HOA representations for $(k,j)$ from step/stage 52 to the corresponding PAR sub-band decoder steps or stages 55, 56 for sub-band groups $1 \ldots N_{\mathrm{SB}}$.
- the resulting replicated ambient HOA representation matrices $\tilde{C}_{\mathrm{PAR}}(k,j)$ of each frequency band are transformed to the time domain HOA representation $C_{\mathrm{PAR}}(k)$ in a synthesis filter bank step or stage 58.
- in a combining step or stage 59, $C_{\mathrm{PAR}}(k)$ is added sample-wise to the sparse HOA representation $\hat{D}_{\mathrm{DELAY}}(k)$, which has been delay compensated in filter bank delay compensation 57, so as to create the decoded HOA representation $\hat{C}(k)$.
- the permuted and de-correlated spatial domain signal matrices $\tilde{B}(g,j_g)$ are generated in steps or stages 611, 612 from the coefficient sequences of the sparse HOA representation matrices for $(g,j_g)$ using the parameters $\mathcal{I}_{\mathrm{used}}(k)$, $o_{\mathrm{PAR},g}$ and $n_{\mathrm{SIG},g}(k)$, where the processing is identical to the processing from section Creation of de-correlated signals used in the PAR sub-band encoder.
- the mixing matrix $\hat{M}_g(k)$ is obtained in mixing matrix decoding step or stage 63 from the data set of the encoded mixing matrix $\Gamma_{M_g}(k)$ using the parameters $o_{\mathrm{PAR},g}$, $n_{\mathrm{SIG},g}(k)$ and $v_{\mathrm{COMPLEX},g}$.
- the actual decoding of the mixing matrix elements is described in section Decoding of mixing matrix.
- the spatial domain signals of the replicated ambient HOA representation ⁇ tilde over (W) ⁇ PAR (k,j g ) are generated in ambience replication steps or stages 621 , 622 from the corresponding de-correlated spatial domain signals (k,j g ), using o PAR,g , n SIG,g (k) and ⁇ circumflex over (M) ⁇ g (k), by the ambience replication processing described in section Ambience replication for each frequency band j g of the sub-band group g.
- the spatial domain signals of the replicated ambient HOA representation ⁇ tilde over (W) ⁇ PAR (k,j g ) are transformed back in steps or stages 641 , 642 to their HOA representation using o PAR,g and the inverse spatial transform, where the inverse spherical harmonic transform from section Spherical Harmonic transform is applied.
- the created replicated ambient HOA representation matrix ⁇ tilde over (C) ⁇ PAR (k,j g ) must have the dimensions N ⁇ tilde over (L) ⁇ where only the first Q PAR,g rows of the corresponding PAR HOA order o PAR,g have non-zero elements.
- the indexes of the elements of the encoded mixing matrix are defined by the current selection matrix S n SIG,g (k) (o PAR,g ) , so that Q PAR,g times n SIG,g (k) elements per mixing matrix have to be decoded.
- the ambience replication performs an inverse permutation of the de-correlated spatial domain signals, which is defined by the permutation matrix for the parameters $o_{PAR,g}$ and $n_{SIG,g}(k)$, followed by a multiplication by the mixing matrix $\hat{M}_g(k)$.
- the de-correlated signals from the current frame are processed and cross-faded using the parameters of the current and the previous frame.
- the processing of the ambience replication is therefore defined by
- $\tilde{W}_{PAR}(k,j_g) = \left(\mathrm{diag}(f_{in})\,\hat{M}_g(k)\,P^{H}_{o_{PAR,g},\,n_{SIG,g}(k)} + \mathrm{diag}(f_{out})\,\hat{M}_g(k-1)\,P^{H}_{o_{PAR,g},\,n_{SIG,g}(k-1)}\right)\tilde{B}(k,j_g)$, (42) where the cross-fade functions from equations (14) and (15) are used.
- HOA Higher Order Ambisonics
- j n ( ⁇ ) denote the spherical Bessel functions of the first kind and S n m ( ⁇ , ⁇ ) denote the real valued Spherical Harmonics of order n and degree m, which are defined in section Definition of real valued Spherical Harmonics.
- the expansion coefficients A n m (k) only depend on the angular wave number k. Note that it has been implicitly assumed that the sound pressure is spatially band-limited. Thus the series is truncated with respect to the order index n at an upper limit N, which is called the order of the HOA representation.
- These time domain functions are referred to as continuous-time HOA coefficient sequences here, which can be collected in a single vector c(t) by
- $c(t) = [c_0^0(t)\; c_1^{-1}(t)\; c_1^0(t)\; c_1^1(t)\; c_2^{-2}(t)\; c_2^{-1}(t)\; c_2^0(t)\; c_2^1(t)\; c_2^2(t)\; \dots\; c_N^{N-1}(t)\; c_N^N(t)]^T$ (48)
- the position index of an HOA coefficient sequence c n m (t) within vector c(t) is given by n(n+1)+1+m.
- the elements of $c(lT_S)$ are referred to as discrete-time HOA coefficient sequences, which can be shown to always be real-valued. This property also holds for the continuous-time versions $c_n^m(t)$.
- the described processing can be carried out by a single processor or electronic circuit, or by several processors or electronic circuits operating in parallel and/or operating on different parts of the complete processing.
- the instructions for operating the processor or the processors according to the described processing can be stored in one or more memories.
- the at least one processor is configured to carry out these instructions.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Mathematical Physics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Mathematical Analysis (AREA)
- Theoretical Computer Science (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- General Physics & Mathematics (AREA)
- Algebra (AREA)
- Stereophonic System (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Description
-
- transforming said spatially sparse decoded HOA representation into a number of complex-valued frequency domain sub-band representations and transforming using an analysis filter bank a correspondingly delayed version of said HOA signal representation into a corresponding number of complex-valued frequency domain sub-band representations;
- grouping said sub-bands into a number of sub-band groups, and within each of these sub-band groups:
- creating, using de-correlation filters, for each sub-band in a sub-band group from said complex-valued frequency domain sub-band representation a number of modified phase spectra signals which are uncorrelated with said complex-valued frequency domain sub-band representation;
- computing for each sub-band in a sub-band group from said modified phase spectra signals a decorrelation covariance matrix;
- transforming for each sub-band in a sub-band group said complex-valued frequency domain sub-band representation into its spatial domain representation and computing therefrom a corresponding covariance matrix;
- transforming for each sub-band in a sub-band group a complex-valued frequency domain sub-band representation for said HOA signal representation into its spatial domain representation and computing therefrom a corresponding covariance matrix,
for each sub-band group:
- for all sub-bands of a sub-band group, combining said decorrelation covariance matrices so as to provide a sub-band group decorrelation covariance matrix {tilde over (Σ)}DECO,g(k′−1);
- for all sub-bands of a sub-band group, combining the covariance matrices for said spatial domain representation of said complex-valued frequency domain sub-band representations so as to provide a sub-band group covariance matrix {tilde over (Σ)}SPARS,g (k′−1);
- for all sub-bands of a sub-band group, combining the covariance matrices for said spatial domain representation of said complex-valued frequency domain sub-band representations for said HOA signal representation so as to provide a sub-band group covariance matrix {tilde over (Σ)}ORIG,g(k′−1);
- forming the residual between the combined covariance matrices {tilde over (Σ)}ORIG,g(k′−1) and {tilde over (Σ)}SPARS,g(k′−1), so as to provide a matrix ΔΣg(k′−1);
- computing, using matrix {tilde over (Σ)}DECO,g(k′−1) and matrix ΔΣg(k′−1), a corresponding mixing matrix;
- encoding said mixing matrix so as to provide a parameter set for the sub-band group;
- multiplexing said parameter sets for said sub-band groups
and encoded sub-band configuration data and Parametric Ambience Replication coding parameters so as to provide a Parametric Ambience Replication parameter set.
-
- transform said spatially sparse decoded HOA representation into a number of complex-valued frequency domain sub-band representations and transform using an analysis filter bank a correspondingly delayed version of said HOA signal representation into a corresponding number of complex-valued frequency domain sub-band representations;
- group said sub-bands into a number of sub-band groups, and within each of these sub-band groups:
- create, using de-correlation filters, for each sub-band in a sub-band group from said complex-valued frequency domain sub-band representation a number of modified phase spectra signals which are uncorrelated with said complex-valued frequency domain sub-band representation;
- compute for each sub-band in a sub-band group from said modified phase spectra signals a decorrelation covariance matrix;
- transform for each sub-band in a sub-band group said complex-valued frequency domain sub-band representation into its spatial domain representation and compute therefrom a corresponding covariance matrix;
- transform for each sub-band in a sub-band group a complex-valued frequency domain sub-band representation for said HOA signal representation into its spatial domain representation and compute therefrom a corresponding covariance matrix,
for each sub-band group:
- for all sub-bands of a sub-band group, combine said decorrelation covariance matrices so as to provide a sub-band group decorrelation covariance matrix {tilde over (Σ)}DECO,g(k′−1);
- for all sub-bands of a sub-band group, combine the covariance matrices for said spatial domain representation of said complex-valued frequency domain sub-band representations so as to provide a sub-band group covariance matrix {tilde over (Σ)}SPARS,g (k′−1);
- for all sub-bands of a sub-band group, combine the covariance matrices for said spatial domain representation of said complex-valued frequency domain sub-band representations for said HOA signal representation so as to provide a sub-band group covariance matrix {tilde over (Σ)}ORIG,g(k′−1);
- form the residual between the combined covariance matrices {tilde over (Σ)}ORIG,g(k′−1) and {tilde over (Σ)}SPARS,g(k′−1), so as to provide a matrix ΔΣg(k′−1);
- compute, using matrix {tilde over (Σ)}DECO,g(k′−1) and matrix ΔΣg(k′−1), a corresponding mixing matrix;
- encode said mixing matrix so as to provide a parameter set for the sub-band group;
- multiplex said parameter sets for said sub-band groups
and encoded sub-band configuration data and Parametric Ambience Replication coding parameters so as to provide a Parametric Ambience Replication parameter set.
-
- reconstructing from said spatially sparse decoded HOA representation, said set of indices of coefficient sequences and said Parametric Ambience Replication parameter set an improved HOA representation, said reconstructing including:
- determining from said Parametric Ambience Replication parameter set a sub-band configuration;
- converting said spatially sparse decoded HOA representation into a number of frequency-band HOA representations;
- according to said sub-band configuration, allocating corresponding groups of frequency-band HOA representations together with related parameters to a corresponding number of Parametric Ambience Replication sub-band decoder steps or stages which create de-correlated coefficient sequences of a replicated ambience HOA representation;
- transforming said coefficient sequences of said replicated ambience HOA representation to a replicated time domain HOA representation;
- enhancing with said replicated time domain HOA representation said spatially sparse decoded HOA representation, so as to provide an enhanced decompressed HOA representation.
-
- reconstruct from said spatially sparse decoded HOA representation, said set of indices of coefficient sequences and said Parametric Ambience Replication parameter set an improved HOA representation, wherein that reconstruction includes:
- determine from said Parametric Ambience Replication parameter set a sub-band configuration;
- convert said spatially sparse decoded HOA representation into a number of frequency-band HOA representations;
- according to said sub-band configuration, allocate corresponding groups of frequency-band HOA representations together with related parameters to a corresponding number of Parametric Ambience Replication sub-band decoder steps or stages which create de-correlated coefficient sequences of a replicated ambience HOA representation;
- transform said coefficient sequences of said replicated ambience HOA representation to a replicated time domain HOA representation;
- enhance with said replicated time domain HOA representation said spatially sparse decoded HOA representation, so as to provide an enhanced decompressed HOA representation.
and a
In step or
where the first and second columns hold the index j of the first and last sub-band index of the corresponding sub-band group g. The sub-band configuration is encoded in step or stage 21 to the parameter set ΓSUBBAND by the method described in European patent application EP 14306347.7. Because it is fixed for each frame index k, it has to be transmitted to the decoder only once for initialisation.
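The role of such a sub-band configuration can be illustrated with a minimal Python sketch. It stores, for each sub-band group g, the first and last sub-band index and looks up the group to which a sub-band j belongs. The name `group_of_subband` and the example boundaries are assumptions made for illustration only; they are not taken from EP 14306347.7.

```python
from typing import List, Tuple

# Hypothetical configuration: one (first, last) pair of sub-band indices j
# per sub-band group g, mirroring the two-column matrix described above.
SUBBAND_GROUPS: List[Tuple[int, int]] = [(1, 4), (5, 12), (13, 32)]


def group_of_subband(j: int, groups: List[Tuple[int, int]]) -> int:
    """Return the 1-based group index g whose index range contains sub-band j."""
    for g, (first, last) in enumerate(groups, start=1):
        if first <= j <= last:
            return g
    raise ValueError(f"sub-band {j} is not covered by any sub-band group")
```

Because the configuration is fixed over all frames, such a table would only need to be built once at initialisation.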
o PAR=[o PAR,1 , . . . ,o PAR,N
holds the HOA orders for all sub-band groups.
n SIG(k′)=[n SIG,1(k′), . . . ,n SIG,N
with $0 \le n_{SIG,g}(k') \le (o_{PAR,g}+1)^2$ and $n_{SIG,g}(k') \in \mathbb{N}_0$. It is updated per frame because the number of required signals depends on the HOA representation. HOA representations comprising highly spatially diffuse scenes require more de-correlated signals than HOA representations that are less spatially diffuse. Because the data rate for the encoded PAR parameters increases with the number of de-correlated signals used, the parameter can also be used for reducing the data rate.
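The range constraint on the number of de-correlated signals can be checked with a short Python sketch. The function names are hypothetical and only restate the bound given above; how an encoder actually chooses the value per frame is not shown.

```python
def max_decorrelated_signals(o_par: int) -> int:
    """Upper bound (o_PAR,g + 1)^2 on n_SIG,g for the PAR HOA order o_PAR,g."""
    if o_par < 0:
        raise ValueError("HOA order must be non-negative")
    return (o_par + 1) ** 2


def is_valid_n_sig(n_sig: int, o_par: int) -> bool:
    """Check that n_SIG,g is an integer with 0 <= n_SIG,g <= (o_PAR,g + 1)^2."""
    return isinstance(n_sig, int) and 0 <= n_sig <= max_decorrelated_signals(o_par)
```

For example, a sub-band group with PAR HOA order 2 admits at most 9 de-correlated signals, and choosing fewer lowers the side information rate at the cost of a coarser ambience replication.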
v COMPLEX=[v COMPLEX,1 , . . . ,v COMPLEX,N
comprises a Boolean variable that indicates whether or not the elements of the mixing matrix are real-valued non-negative or complex-valued numbers, where it can be defined that for vCOMPLEX,g=1 a matrix of complex-valued elements is used in sub-band group g. Due to the compression of the transport signals Z(k), the phase information of the decoded transport signals might get lost at decoder side due to parametric coding tools (for example in case the spectral band replication method is applied). In this case the PAR processing can only replicate the spatial power distribution of the missing ambience components, which means that the phase information of the PAR mixing matrix is obsolete. Furthermore the parameter used(k′) is input to each PAR sub-band encoder step/
the encoded sub-band configuration set ΓSUBBAND and the PAR coding parameters oPAR, nSIG(k′) and vCOMPLEX are synchronised by their frame indexes and multiplexed into the PAR bit stream parameter set ΓPAR(k′−1) in a multiplexer and frame synchronisation step or
PAR Sub-Band Encoder
{tilde over (Σ)}S,j
and
{tilde over (Σ)}O,j
are computed where AH denotes the hermitian transposed of a matrix A. The matrices of the previous frame are included in order to obtain covariance matrices that are valid for the current and previous frame for enabling a cross-fade between the matrices of two adjacent frames at the PAR decoder. The creation of de-correlated signals in steps or
{tilde over (Σ)}SPARS,g(k′−1)=Σj
in a combiner step or
{tilde over (Σ)}ORIG,g(k′−1)=Σj
in a combiner step or
{tilde over (Σ)}DECO,g(k′−1)=Σj
in a combiner step or
$\Delta\Sigma_g(k'-1) = \tilde{\Sigma}_{\mathrm{ORIG},g}(k'-1) - \tilde{\Sigma}_{\mathrm{SPARS},g}(k'-1)$ (12)
generated in combiner step or stage 353, and from the matrices {tilde over (W)}(k′,jg) and {tilde over (B)}(k′,jg) the mixing matrix Mg(k′−1) is obtained by a mixing matrix computing step or
-
- Select a sub-set of coefficient sequences defined by the index set of used coefficients used(k′) from the sparse HOA representation {tilde over (D)}(k′,jg);
- Perform the spatial transform of the selected coefficient sequences according to section Spatial transform for the HOA order oPAR,g;
- Permutation of the spatial domain signals for the assignment to the de-correlators by the permutation matrix $P_{o_{PAR,g},\,n_{SIG,g}(k')}$, which is selected for the number of signals $n_{SIG,g}(k')$ used for the ambience replication and the HOA order $o_{PAR,g}$;
- De-correlate the permuted signals using an individual processing that modifies the phase of the sub-band signals while best preserving the magnitude of the sub-band signals.
where diag(f) forms a diagonal matrix from the elements of f.
$f_{in} := [f_{win}(1)\; f_{win}(2)\; \dots\; f_{win}(\tilde{L})]$ (14)
$f_{out} := [f_{win}(\tilde{L}+1)\; f_{win}(\tilde{L}+2)\; \dots\; f_{win}(2\tilde{L})]$ (15)
and whose elements are obtained from
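satisfying $M = \arg\min_{M' \in A} \left\| M'X - GQX \right\|_{FRO}^2$ (17)
with $A = \left\{ M' = \arg\min_{M''} \left\| \Sigma_Y - M''\,\Sigma_X\,M''^{H} \right\|^2 \right\}$ (18)
is given by $M = K_Y V U^H K_X^{-1}$ (19)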
with ΣY =K Y K Y H =YY H ,K Y∈ Q
ΣX =K X K X H =XX H ,K X∈ Q
$U S V^H = K_X^H Q^H G^H K_Y$, (22)
where ∥·∥FRO denotes the Frobenius norm of a matrix, and the signal vector X and the covariance matrix ΣY of Ŷ are known. The prototype mixing matrix Q satisfies Ŷ=QX so that Ŷ is a good approximation of Y. As the energies of the signals from Ŷ and Y might differ, the diagonal matrix G normalises the energy of Ŷ to the energy of Y where the diagonal elements of G are given by
and σY
C out({k′,k′−1},j g)={tilde over (E)}({k′,k′−1},j g)+M g(k′−1){tilde over (B)}({k′,k′−1},j g), (24)
where the notation {k′,k′−1} is used to express that the mixing matrix Mg(k′−1) is valid for the current and the previous frame.
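$\Sigma_{out}(k'-1) = \tilde{\Sigma}_{\mathrm{SPARS},g}(k'-1) + M_g(k'-1)\,\tilde{\Sigma}_{\mathrm{DECO},g}(k'-1)\,M_g(k'-1)^H$. (25)
$\Sigma_{out}(k'-1) \overset{!}{=} \tilde{\Sigma}_{\mathrm{ORIG},g}(k'-1)$. (26)
$\Delta\Sigma_g(k'-1) \overset{!}{=} M_g(k'-1)\,\tilde{\Sigma}_{\mathrm{DECO},g}(k'-1)\,M_g(k'-1)^H$, (27)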
where ΔΣg(k′−1) is defined in equation (12).
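The relation between equations (12) and (25) to (27) can be checked numerically with a small Python sketch: if the mixing matrix reproduces the residual covariance, the output covariance equals the covariance of the original. The 2×2 matrices below are made-up example values, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[complex]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def hermitian(a: Matrix) -> Matrix:
    """Conjugate transpose A^H."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def subtract(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def output_covariance(sigma_spars: Matrix, sigma_deco: Matrix, m: Matrix) -> Matrix:
    """Equation (25): Sigma_SPARS + M Sigma_DECO M^H."""
    return add(sigma_spars, matmul(matmul(m, sigma_deco), hermitian(m)))


# Made-up example covariances and mixing matrix (assumptions for illustration).
SIGMA_SPARS: Matrix = [[2, 0], [0, 1]]
SIGMA_DECO: Matrix = [[1, 0], [0, 4]]
SIGMA_ORIG: Matrix = [[4, 2j], [-2j, 5]]
M: Matrix = [[1, 0.5j], [0, 1]]

delta_sigma = subtract(SIGMA_ORIG, SIGMA_SPARS)           # equation (12)
replicated = matmul(matmul(M, SIGMA_DECO), hermitian(M))  # right side of (27)
```

With these values `delta_sigma` and `replicated` coincide, so `output_covariance(SIGMA_SPARS, SIGMA_DECO, M)` returns `SIGMA_ORIG`, which is the requirement of equation (26).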
ΣY:=ΔΣg(k′−1) (28)
ΣX:={tilde over (Σ)}DECO,g(k′−1) (29)
X:={tilde over (B)}({k′,k′−1},j g) (30)
$Y := \tilde{W}(\{k',k'-1\},j_g) - \tilde{E}(\{k',k'-1\},j_g)$, (31)
where $K_Y$ and $K_X$ can be computed from the singular value decomposition of $\Delta\Sigma_g(k'-1)$ and $\tilde{\Sigma}_{\mathrm{DECO},g}(k'-1)$.
$\tilde{W}(\{k',k'-1\},j_g) - \tilde{E}(\{k',k'-1\},j_g) \approx Q\,\tilde{B}(\{k',k'-1\},j_g)$ for all $j_g = f_{g,1}, \dots, f_{g,2}$. (32)
by using the Moore-Penrose pseudoinverse.
where the elements so,n denote the indexes of the row vectors from {tilde over (B)}({k′,k′−1},jg) that are used to create the o-th spatial domain signal of the replicated ambient HOA representation with n=1 . . . nSIG,g(k′−1). To solve equation (19) individually for each row of the mixing matrix, it has to be transformed to
$P^{-H} K_X^H M^H = K_Y^H$, (35)
with $P = V U^H$. It is defined that $T := P^{-H} K_X^H$ (36)
and ta is one of the a=1 . . . QPAR,g column vectors of T. For the computation of each of the o=1 . . . QPAR,g rows of Mg(k′−1), the sub-matrix
is built and the vector mrow,o is determined by
$m_{row,o} = T_o^{+}\,k_{Y,o}^H$ (38)
where kY,o is the o-th row vector from KY and To + denotes the Moore-Penrose pseudoinverse. In some cases To can be ill-conditioned which might require a regularisation in the computation of the pseudoinverse.
where mrow,o,a are the elements of the vector mrow,o and o=1 . . . QPAR,g.
Real-Valued Non-Negative Mixing Matrices
$\left|\tilde{W}(\{k',k'-1\},j_g) - \tilde{E}(\{k',k'-1\},j_g)\right|^2 - \left|M_g(k'-1)\right|^2 \left|\tilde{B}(\{k',k'-1\},j_g)\right|^2$ (40)
where the operation |·|2 is assumed to be applied element-wise to the matrices. In other words, the mixing matrix is chosen such that the sum of the powers of all weighted spatial sub-band signals of the de-correlated HOA representation best approximates the power of the residuum of the original and the spatial domain sub-band signals of the sparse HOA representation. In this case, Nonnegative Matrix Factorisation (NMF) techniques can be used to solve this optimisation problem. For an introduction to NMF, see e.g. D. D. Lee, H. S. Seung, “Learning the parts of objects by nonnegative matrix factorization”, Nature, vol. 401, pages 788-791, 1999.
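The power-domain fit can be sketched in Python with a multiplicative update of the kind widely used for NMF under a squared-error cost. This is an illustrative assumption: the text does not state which NMF algorithm is applied, and the function names and example powers are hypothetical. Here `power_y` stands for the element-wise powers of the residual signals, `power_b` for the powers of the de-correlated signals, and the result approximates the element-wise squared magnitudes of the mixing matrix.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def squared_error(power_y: Matrix, w: Matrix, power_b: Matrix) -> float:
    """Sum of squared differences between power_y and w @ power_b."""
    wb = matmul(w, power_b)
    return sum((x - y) ** 2 for ry, rw in zip(power_y, wb) for x, y in zip(ry, rw))


def fit_nonnegative_powers(power_y: Matrix, power_b: Matrix,
                           iterations: int = 500, eps: float = 1e-12) -> Matrix:
    """Estimate W >= 0 with power_y ~= W @ power_b (power_b is kept fixed)."""
    rows, inner = len(power_y), len(power_b)
    w = [[1.0] * inner for _ in range(rows)]  # strictly positive start
    b_t = transpose(power_b)
    yb_t = matmul(power_y, b_t)
    bb_t = matmul(power_b, b_t)
    for _ in range(iterations):
        wbb_t = matmul(w, bb_t)
        # Multiplicative update: entries stay non-negative by construction.
        w = [[w[i][a] * yb_t[i][a] / (wbb_t[i][a] + eps)
              for a in range(inner)] for i in range(rows)]
    return w


# Made-up example: powers generated from the squared magnitudes
# [[0.5, 2.0], [1.0, 0.25]], which the fit should recover.
POWER_B: Matrix = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
POWER_Y: Matrix = [[0.5, 2.0, 3.0], [1.0, 0.25, 2.25]]
W_FIT = fit_nonnegative_powers(POWER_Y, POWER_B)
```

The real-valued non-negative mixing matrix elements would then follow as the square roots of the fitted entries.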
Encoding of the Mixing Matrix
-
- Build for each group a covariance sub-matrix by selecting only the elements from matrix Σ that are assigned to the signals of the group;
- Sum the quotient of the maximum and the minimum singular value of each covariance sub-matrix.
$m_{a,b} = m_{ABS,a,b} \cdot e^{\,i\,m_{ANGLE,a,b}}$
where ma,b is the element of {circumflex over (M)}g(k) in the a-th row and in the b-th column, mANGLE,a,b and mABS,a,b are the corresponding elements of the updated reconstructed angle and magnitude mixing matrices.
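The recombination of magnitude and angle into a complex mixing matrix element can be sketched in Python as follows. The function names are hypothetical, and the dequantisation that produces the magnitude and angle matrices is not shown.

```python
import cmath
from typing import List


def mixing_element(m_abs: float, m_angle: float) -> complex:
    """Combine magnitude and angle into m = m_abs * exp(i * m_angle)."""
    return cmath.rect(m_abs, m_angle)


def rebuild_mixing_matrix(abs_matrix: List[List[float]],
                          angle_matrix: List[List[float]]) -> List[List[complex]]:
    """Apply mixing_element to every (row a, column b) position."""
    return [[mixing_element(mag, ang) for mag, ang in zip(row_abs, row_ang)]
            for row_abs, row_ang in zip(abs_matrix, angle_matrix)]
```

For a sub-band group that uses real-valued non-negative elements, all angles are zero and the result reduces to the magnitude matrix.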
Ambience Replication
where the cross-fade functions from equations (14) and (15) are used.
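The cross-fade between the mixing matrices of the current and the previous frame can be sketched in Python as below. Several points are assumptions made for illustration: the window is taken to be Hann-shaped because its definition is not reproduced here, the inverse permutation is assumed to be already folded into the two matrices, samples are indexed from 0, and the fade weights are applied per sample (column) of the signal matrix. All names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[complex]]


def crossfade_windows(frame_len: int) -> Tuple[List[float], List[float]]:
    """Split a window of length 2*frame_len into fade-in and fade-out halves.

    Mirrors the split of equations (14) and (15); the Hann shape itself
    is an assumption.
    """
    f_win = [0.5 * (1.0 - math.cos(2.0 * math.pi * l / (2 * frame_len)))
             for l in range(2 * frame_len)]
    return f_win[:frame_len], f_win[frame_len:]


def mix_sample(m: Matrix, b: Matrix, col: int) -> List[complex]:
    """Apply mixing matrix m to one sample (column `col`) of signal matrix b."""
    return [sum(m[r][c] * b[c][col] for c in range(len(b)))
            for r in range(len(m))]


def replicate_ambience(m_cur: Matrix, m_prev: Matrix, b: Matrix) -> Matrix:
    """Fade from the previous-frame mixing matrix to the current-frame one."""
    frame_len = len(b[0])
    f_in, f_out = crossfade_windows(frame_len)
    out = [[0j] * frame_len for _ in range(len(m_cur))]
    for l in range(frame_len):
        cur = mix_sample(m_cur, b, l)
        prev = mix_sample(m_prev, b, l)
        for r in range(len(m_cur)):
            out[r][l] = f_in[l] * cur[r] + f_out[l] * prev[r]
    return out
```

With this window the two weights sum to one at every sample, so identical mixing matrices in both frames leave the mixed signal unchanged, and the first sample of a frame is mixed entirely with the previous-frame matrix.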
Basics of Higher Order Ambisonics
$P(\omega,x) = \mathcal{F}_t(p(t,x)) = \int_{-\infty}^{\infty} p(t,x)\,e^{-i\omega t}\,dt$ (43)
with ω denoting the angular frequency and i indicating the imaginary unit, may be expanded into the series of Spherical Harmonics according to
$P(\omega = k c_s, r, \theta, \phi) = \sum_{n=0}^{N}\sum_{m=-n}^{n} A_n^m(k)\,j_n(kr)\,S_n^m(\theta,\phi)$, (44)
wherein cs denotes the speed of sound and k denotes the angular wave number, which is related to the angular frequency ω by
Further, jn (·) denote the spherical Bessel functions of the first kind and Sn m(θ,ϕ) denote the real valued Spherical Harmonics of order n and degree m, which are defined in section Definition of real valued Spherical Harmonics. The expansion coefficients An m(k) only depend on the angular wave number k. Note that it has been implicitly assumed that the sound pressure is spatially band-limited. Thus the series is truncated with respect to the order index n at an upper limit N, which is called the order of the HOA representation. If the sound field is represented by a superposition of an infinite number of harmonic plane waves of different angular frequencies ω arriving from all possible directions specified by the angle tuple (θ,ϕ), it can be shown (see B. Rafaely, “Plane-wave decomposition of the sound field on a sphere by spherical convolution”, J. Acoust. Soc. Am., vol. 4(116), pages 2149-2157, October 2004) that the respective plane wave complex amplitude function C(ω,θ,ϕ) can be expressed by the following Spherical Harmonics expansion
$C(\omega = k c_s, \theta, \phi) = \sum_{n=0}^{N}\sum_{m=-n}^{n} C_n^m(k)\,S_n^m(\theta,\phi)$, (45)
where the expansion coefficients $C_n^m(k)$ are related to the expansion coefficients $A_n^m(k)$ by
$A_n^m(k) = i^n\,C_n^m(k)$. (46)
for each order n and degree m. These time domain functions are referred to as continuous-time HOA coefficient sequences here, which can be collected in a single vector c(t) by
$= \{c(T_S),\, c(2T_S),\, c(3T_S),\, c(4T_S),\, \dots\}$ (49)
where $T_S = 1/f_S$ denotes the sampling period. The elements of $c(lT_S)$ are referred to as discrete-time HOA coefficient sequences, which can be shown to always be real-valued. This property also holds for the continuous-time versions $c_n^m(t)$.
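The ordering of the coefficient sequences within the vector c(t), where the sequence of order n and degree m sits at position n(n+1)+1+m, can be illustrated with a short Python sketch. The function names are hypothetical; positions are 1-based as in the text.

```python
import math
from typing import Tuple


def hoa_num_coefficients(order: int) -> int:
    """Number of coefficient sequences of an order-N representation: (N + 1)^2."""
    return (order + 1) ** 2


def hoa_position(n: int, m: int) -> int:
    """1-based position of c_n^m(t) within c(t): n(n + 1) + 1 + m."""
    if n < 0 or abs(m) > n:
        raise ValueError("require n >= 0 and -n <= m <= n")
    return n * (n + 1) + 1 + m


def hoa_order_degree(position: int) -> Tuple[int, int]:
    """Inverse mapping from a 1-based position to (order n, degree m)."""
    if position < 1:
        raise ValueError("position is 1-based")
    n = math.isqrt(position - 1)
    m = position - 1 - n * (n + 1)
    return n, m
```

For instance, the sequence of order 2 and degree 2 is the ninth and last element of a second-order representation.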
Definition of Real Valued Spherical Harmonics
with
with the Legendre polynomial $P_n(x)$ and, unlike in E. G. Williams, "Fourier Acoustics", vol. 93 of Applied Mathematical Sciences, Academic Press, 1999, without the Condon-Shortley phase term $(-1)^m$.
Spherical Harmonic Transform
$c_{SPAT}(t) := [c(t,\Omega_1)\; \dots\; c(t,\Omega_O)]^T$, (53)
it can be computed from the continuous Ambisonics representation c(t) defined in equation (48) by a simple matrix multiplication as
$c_{SPAT}(t) = \Psi^H c(t)$, (54)
where $(\cdot)^H$ indicates the joint transposition and conjugation, and Ψ denotes a mode-matrix defined by
$\Psi := [S_1\; \dots\; S_O]$ (55)
with
$S_o := [S_0^0(\Omega_o)\; S_1^{-1}(\Omega_o)\; S_1^0(\Omega_o)\; S_1^1(\Omega_o)\; \dots\; S_N^{N-1}(\Omega_o)\; S_N^N(\Omega_o)]$. (56)
$c(t) = \Psi^{-H} c_{SPAT}(t)$. (57)
$\Psi^H \approx \Psi^{-1}$ (58)
is available, which justifies the use of $\Psi^{-1}$ instead of $\Psi^H$ in equation (54). Advantageously, all the mentioned relations are valid for the discrete-time domain, too.
Claims (17)
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP14306607 | 2014-10-10 | ||
| EP14306607.4A EP3007167A1 (en) | 2014-10-10 | 2014-10-10 | Method and apparatus for low bit rate compression of a Higher Order Ambisonics HOA signal representation of a sound field |
| EP14306607.4 | 2014-10-10 | ||
| PCT/EP2015/072064 WO2016055284A1 (en) | 2014-10-10 | 2015-09-25 | Method and apparatus for low bit rate compression of a higher order ambisonics hoa signal representation of a sound field |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20170243589A1 US20170243589A1 (en) | 2017-08-24 |
| US10262663B2 true US10262663B2 (en) | 2019-04-16 |
Family
ID=51842455
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/509,596 Active US10262663B2 (en) | 2014-10-10 | 2015-09-25 | Method and apparatus for low bit rate compression of a higher order ambisonics HOA signal representation of a sound field |
Country Status (7)
| Country | Link |
|---|---|
| US (1) | US10262663B2 (en) |
| EP (2) | EP3007167A1 (en) |
| JP (1) | JP6378432B2 (en) |
| KR (1) | KR101970080B1 (en) |
| CN (1) | CN107077853B (en) |
| TW (1) | TW201614638A (en) |
| WO (1) | WO2016055284A1 (en) |
Families Citing this family (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| MC200186B1 (en) * | 2016-09-30 | 2017-10-18 | Coronal Encoding | Method for conversion, stereo encoding, decoding and transcoding of a three-dimensional audio signal |
| FR3060830A1 (en) * | 2016-12-21 | 2018-06-22 | Orange | SUB-BAND PROCESSING OF REAL AMBASSIC CONTENT FOR PERFECTIONAL DECODING |
| KR102654507B1 (en) | 2017-07-14 | 2024-04-05 | 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. | Concept for generating an enhanced sound field description or a modified sound field description using a multi-point sound field description |
| SG11202000287RA (en) | 2017-07-14 | 2020-02-27 | Fraunhofer Ges Forschung | Concept for generating an enhanced sound-field description or a modified sound field description using a depth-extended dirac technique or other techniques |
| EP3652736A1 (en) | 2017-07-14 | 2020-05-20 | Fraunhofer Gesellschaft zur Förderung der Angewand | Concept for generating an enhanced sound-field description or a modified sound field description using a multi-layer description |
| CN109389987B (en) | 2017-08-10 | 2022-05-10 | 华为技术有限公司 | Audio codec mode determination method and related products |
| KR102159631B1 (en) * | 2018-11-21 | 2020-09-24 | 에스티엑스엔진 주식회사 | Method for processing the signal for an adaptive beamformer using sub-band steering covariance matrix |
| CN114144982B (en) * | 2019-08-01 | 2024-05-10 | 联想(新加坡)私人有限公司 | Method and apparatus for generating channel state information report |
| FR3101741A1 (en) * | 2019-10-02 | 2021-04-09 | Orange | Determination of corrections to be applied to a multichannel audio signal, associated encoding and decoding |
| US11601135B2 (en) * | 2020-02-27 | 2023-03-07 | BTS Software Solutions, LLC | Internet of things data compression system and method |
| KR20230062836A (en) | 2020-09-09 | 2023-05-09 | 돌비 레버러토리즈 라이쎈싱 코오포레이션 | Parametrically coded audio processing |
| CN115376528A (en) * | 2021-05-17 | 2022-11-22 | 华为技术有限公司 | Three-dimensional audio signal coding method, device and coder |
| CN120781493B (en) * | 2025-07-01 | 2025-12-12 | 北京工业大学 | Building cold load period prediction method based on boundary feature protection and HOA-LightGBM model |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP2665208A1 (en) | 2012-05-14 | 2013-11-20 | Thomson Licensing | Method and apparatus for compressing and decompressing a Higher Order Ambisonics signal representation |
| EP2743922A1 (en) | 2012-12-12 | 2014-06-18 | Thomson Licensing | Method and apparatus for compressing and decompressing a higher order ambisonics representation for a sound field |
| WO2014177455A1 (en) | 2013-04-29 | 2014-11-06 | Thomson Licensing | Method and apparatus for compressing and decompressing a higher order ambisonics representation |
| EP2993665A1 (en) | 2014-09-02 | 2016-03-09 | Thomson Licensing | Method and apparatus for coding or decoding subband configuration data for subband groups |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2007111568A2 (en) * | 2006-03-28 | 2007-10-04 | Telefonaktiebolaget L M Ericsson (Publ) | Method and arrangement for a decoder for multi-channel surround sound |
| CN101067931B (en) * | 2007-05-10 | 2011-04-20 | 芯晟(北京)科技有限公司 | Efficient configurable frequency domain parameter stereo-sound and multi-sound channel coding and decoding method and system |
| EP2450880A1 (en) * | 2010-11-05 | 2012-05-09 | Thomson Licensing | Data structure for Higher Order Ambisonics audio data |
| EP2637427A1 (en) * | 2012-03-06 | 2013-09-11 | Thomson Licensing | Method and apparatus for playback of a higher-order ambisonics audio signal |
| EP2688066A1 (en) * | 2012-07-16 | 2014-01-22 | Thomson Licensing | Method and apparatus for encoding multi-channel HOA audio signals for noise reduction, and method and apparatus for decoding multi-channel HOA audio signals for noise reduction |
-
2014
- 2014-10-10 EP EP14306607.4A patent/EP3007167A1/en not_active Withdrawn
-
2015
- 2015-09-25 EP EP15767514.1A patent/EP3204940B1/en active Active
- 2015-09-25 US US15/509,596 patent/US10262663B2/en active Active
- 2015-09-25 KR KR1020177009547A patent/KR101970080B1/en active Active
- 2015-09-25 CN CN201580056173.8A patent/CN107077853B/en active Active
- 2015-09-25 JP JP2017518906A patent/JP6378432B2/en active Active
- 2015-09-25 WO PCT/EP2015/072064 patent/WO2016055284A1/en not_active Ceased
- 2015-10-02 TW TW104132462A patent/TW201614638A/en unknown
Non-Patent Citations (14)
| Title |
|---|
| Boaz Rafaely, "Plane-wave decomposition of the sound field on a sphere by spherical convolution", J. Acoust. Soc. Am. 4(116): 2149-2157, Oct. 2004. |
| Earl G. Williams "Fourier Acoustics Sound Radiation and Nearfield Acoustical Holography", vol. 93 of Applied Mathematical Sciences. Academic Press, 1999. |
| ISO/IEC JTC 1/SC29, 23008-3 "Information Technology-High Efficiency Coding and Media Delivery in Heterogenous Environments-Part 3:3D Audio" Jul. 25, 2014. |
| ISO/IEC JTC1/SC29. N8324-Text of ISO/IEC FDIS 23003-1, MPEG Surround. MPEG, Klagenfurt, Audio Subgroup, Jul. 2006. |
| ISO/IEC JTC1/SC29-WG11 N14264 "WD1-HOA Text of MPEG-H 3D Audio" Jan. 2014, San Jose, USA Audio-Subgroup. |
| Jérôme Daniel Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia. PhD thesis, Université Paris 6, 2001. |
| Jurgen, H. et al "MPEG-H Audio-The New Standard for Universal Spatial/3D Audio Co." AES Convention 137, Oct. 2014, pp. 8-9. |
| Lee, D. et al. "Learning the parts of objects by nonnegative matrix factorization", MacMillan Magazines Ltd. Nature, vol. 401, Oct. 21, 1999, pp. 788-791. |
| Sen, Deep et al "RM1-HOA Working Draft Text" MPEG Meeting Jan. 13-17, 2014, Spatial HOA Encoding, pp. 64-75. |
| Vilkamo, J. et al "Optimized covariance domain framework for time-frequency processing of spatial audio" J. Audio Eng. Soc., vol. 61, No. 6, Jun. 2013, pp. 403-411. |
| Ville Pulkki, "Directional audio coding in spatial sound reproduction and stereo upmixing", Directional Audio Coding in Spatial Sound, AES 28th International Conference, Pitea, Sweden, Jun. 30-Jul. 2, 2006. |
Also Published As
| Publication number | Publication date |
|---|---|
| EP3007167A1 (en) | 2016-04-13 |
| KR20170055512A (en) | 2017-05-19 |
| WO2016055284A1 (en) | 2016-04-14 |
| EP3204940B1 (en) | 2019-08-14 |
| TW201614638A (en) | 2016-04-16 |
| JP2017534909A (en) | 2017-11-24 |
| KR101970080B1 (en) | 2019-04-17 |
| EP3204940A1 (en) | 2017-08-16 |
| CN107077853B (en) | 2020-09-08 |
| JP6378432B2 (en) | 2018-08-22 |
| CN107077853A (en) | 2017-08-18 |
| US20170243589A1 (en) | 2017-08-24 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US10262663B2 (en) | | Method and apparatus for low bit rate compression of a higher order ambisonics HOA signal representation of a sound field |
| JP6866519B2 (en) | | Methods and Devices for Encoding Multi-Channel HOA Audio Signals for Noise Reduction and Methods and Devices for Decoding Multi-Channel HOA Audio Signals for Noise Reduction |
| US9774975B2 (en) | | Method and apparatus for decoding a compressed HOA representation, and method and apparatus for encoding a compressed HOA representation |
| EP3860154B1 (en) | | Method for decoding a compressed hoa dataframe representation of a sound field. |
| KR20160002846A (en) | | Method and apparatus for compressing and decompressing a higher order ambisonics representation |
| US20180007484A1 (en) | | Method for decoding a higher order ambisonics (hoa) representation of a sound or soundfield |
| US10403292B2 (en) | | Method and apparatus for encoding/decoding of directions of dominant directional signals within subbands of a HOA signal representation |
| US10194257B2 (en) | | Method and apparatus for encoding/decoding of directions of dominant directional signals within subbands of a HOA signal representation |
| US9794714B2 (en) | | Method and apparatus for decoding a compressed HOA representation, and method and apparatus for encoding a compressed HOA representation |
| JP5340378B2 (en) | | Channel signal generation device, acoustic signal encoding device, acoustic signal decoding device, acoustic signal encoding method, and acoustic signal decoding method |
| US9800986B2 (en) | | Method and apparatus for encoding/decoding of directions of dominant directional signals within subbands of a HOA signal representation |
| HK1235534A1 (en) | | Method and apparatus for low bit rate compression of a higher order ambisonics hoa signal representation of a sound field |
| HK1235534B (en) | | Method and apparatus for low bit rate compression of a higher order ambisonics hoa signal representation of a sound field |
| HK1242835A1 (en) | | Method, apparatus and computer readable medium for decoding hoa audio signals |
| HK1242834A1 (en) | | Method, apparatus and computer readable medium for decoding hoa audio signals |
| HK1241131A1 (en) | | Method, apparatus and computer readable medium for decoding hoa audio signals |
| HK1233040A1 (en) | | Method and apparatus for decoding a compressed hoa representation, and method and apparatus for encoding a compressed hoa representation |
| CN101091205A (en) | | Scalable encoding device and scalable encoding method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: DOLBY INTERNATIONAL AB, NETHERLANDS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:THOMSON LICENSING;REEL/FRAME:041669/0137 Effective date: 20160810 Owner name: THOMSON LICENSING, FRANCE Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KEILER, FLORIAN;KORDON, SVEN;KRUEGER, ALEXANDER;SIGNING DATES FROM 20160531 TO 20160612;REEL/FRAME:041669/0103 |
| | AS | Assignment | Owner name: DOLBY LABORATORIES LICENSING CORPORATION, CALIFORN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:DOLBY INTERNATIONAL AB;REEL/FRAME:043368/0789 Effective date: 20170823 |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |