WO2016023323A1 - Multi-channel sound signal encoding method, decoding method and device - Google Patents
- Publication number: WO2016023323A1
- Application: PCT/CN2014/095396 (priority application CN2014095396W)
- Authority: WO — WIPO (PCT)
- Prior art keywords: sound signal, channel, frequency, time, channel sound
Classifications
- G — PHYSICS; G10 — MUSICAL INSTRUMENTS; ACOUSTICS; G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING; G10L19/00 — Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008 — Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
- G10L19/02 — using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/04 — using predictive techniques
- G10L19/16 — Vocoder architecture
- G10L19/18 — Vocoders using multiple modes
- G10L19/22 — Mode decision, i.e. based on audio signal content versus external parameters
Definitions
- the present invention relates to the field of audio processing technologies, and in particular, to a multi-channel sound signal encoding method, a decoding method, and a device.
- multi-channel sound signals are now played to the user over multiple channels, and their encoding methods have evolved from waveform coding techniques such as M/S Stereo and Intensity Stereo, represented by AC-3 and MP3, to Parametric Stereo and Parametric Surround, represented by MP3 Pro, ITU EAAC+, MPEG Surround, and Dolby DD+.
- PS (covering both Parametric Stereo and Parametric Surround) exploits binaural psychoacoustic spatial characteristics such as interaural time/phase difference (ITD/IPD), interaural intensity difference (IID), and interaural correlation (IC) to achieve parametric encoding of multi-channel sound signals.
- PS technology generally downmixes the multi-channel sound signal at the encoding end to generate one sum channel signal; the sum channel signal is waveform coded (or coded with mixed waveform and parametric coding, as in EAAC+), while the ITD/IPD, IID, and IC parameters of each channel signal are parameter encoded.
- at the decoding end, the multi-channel signal is recovered from the sum channel signal. It is also possible to group the multi-channel signals at encoding time and apply the above PS codec method within each channel group, or to perform multi-stage PS encoding on multiple channels in a cascade.
- both the traditional PS technology and the MPEG Surround technology rely too much on the psychoacoustic properties of both ears, ignoring the statistical properties of the multi-channel sound signal itself.
- neither the traditional PS technology nor the MPEG Surround technology utilizes statistical redundancy information between pairs of channels.
- although MPEG Surround uses residual information coding, there is still statistical redundancy between the sum channel signal and the residual channel signals, so coding efficiency and coded-signal quality cannot be balanced.
- the invention provides a multi-channel sound signal encoding method, a decoding method, and devices, aiming to solve the problem that prior-art multi-channel sound signal encoding methods leave statistical redundancy and cannot balance encoding efficiency against encoded-signal quality.
- the present invention provides a multi-channel sound signal encoding method, the method comprising: A) mapping a first multi-channel sound signal into a first frequency domain signal using a time-frequency transform, or mapping the first multi-channel sound signal into a first sub-band signal using sub-band filtering; B) dividing the first frequency domain signal or the first sub-band signal into different time-frequency sub-bands; C) calculating a first statistical characteristic of the first multi-channel sound signal in each of the different time-frequency sub-bands; D) estimating an optimized subspace mapping model according to the first statistical characteristic; E) mapping the first multi-channel sound signal to a second multi-channel sound signal using the optimized subspace mapping model; F) perceptually encoding, by time, frequency, and channel, at least a subset of the second multi-channel sound signal together with the optimized subspace mapping model, and multiplexing them into an encoded multi-channel code stream.
- the present invention provides a multi-channel sound signal encoding apparatus, the apparatus comprising: a time-frequency mapping unit, configured to map a first multi-channel sound signal into a first frequency domain signal using a time-frequency transform, or into a first sub-band signal using sub-band filtering, and to divide the first frequency domain signal or the first sub-band signal into different time-frequency sub-bands; an adaptive subspace mapping unit, configured to calculate, in each of the time-frequency sub-bands divided by the time-frequency mapping unit, a first statistical characteristic of the first multi-channel sound signal, to estimate an optimized subspace mapping model from that statistical characteristic, and to map the first multi-channel sound signal into a second multi-channel sound signal using the optimized subspace mapping model; and a perceptual coding unit, configured to perceptually encode, by time, frequency, and channel, at least a subset of the second multi-channel sound signal mapped by the adaptive subspace mapping unit together with the optimized subspace mapping model, and to multiplex them into an encoded multi-channel code stream.
- the present invention provides a multi-channel sound signal decoding method, the method comprising: A) decoding an encoded multi-channel code stream to obtain at least a subset of a second multi-channel sound signal and an optimized subspace mapping model; B) using the optimized subspace mapping model to map the second multi-channel sound signal back to the first multi-channel sound signal; C) using an inverse time-frequency transform to map the first multi-channel sound signal from the frequency domain to the time domain, or inverse sub-band filtering to map the first multi-channel sound signal from the sub-band domain to the time domain.
- the present invention provides a multi-channel sound signal decoding apparatus, the apparatus comprising: a perceptual decoding unit, configured to decode an encoded multi-channel code stream to obtain at least a subset of a second multi-channel sound signal and an optimized subspace mapping model; a subspace inverse mapping unit, configured to use the optimized subspace mapping model obtained by the perceptual decoding unit to map the second multi-channel sound signal obtained by the perceptual decoding unit back to the first multi-channel sound signal; and a frequency-time mapping unit, configured to use an inverse time-frequency transform to map the first multi-channel sound signal obtained by the subspace inverse mapping unit from the frequency domain to the time domain, or inverse sub-band filtering to map the first multi-channel sound signal from the sub-band domain to the time domain.
- an adaptive subspace mapping is adopted: the optimized subspace mapping model is estimated by calculating the statistical characteristics of the multi-channel sound signal, the multi-channel sound signal is then mapped using that model, and perceptual coding is performed.
- the embodiment of the present invention adaptively selects a mapping model during coding, which better estimates and exploits the statistical characteristics of the inter-channel signals and minimizes statistical redundancy between channels, achieving higher coding efficiency while guaranteeing the quality of the encoded signal.
- FIG. 1 is a flow chart of a method for encoding a multi-channel sound signal according to an embodiment of the present invention
- FIG. 2 is a flowchart of a method for encoding a multi-channel sound signal according to another embodiment of the present invention
- FIG. 3 is a flowchart of a method for encoding a multi-channel sound signal according to another embodiment of the present invention.
- FIG. 4 is a schematic diagram of a subspace mapping relationship in an embodiment of the present invention.
- FIG. 5 is a schematic diagram showing a comparison between a PCA model and an ICA model according to an embodiment of the present invention
- FIG. 6 is a schematic diagram of time-frequency subband division in an embodiment of the present invention.
- FIG. 7 is a flowchart of a method for decoding a multi-channel sound signal according to an embodiment of the present invention.
- FIG. 8 is a schematic structural diagram of a multi-channel sound signal encoding apparatus according to an embodiment of the present invention.
- FIG. 9 is a schematic structural diagram of a multi-channel sound signal decoding apparatus according to an embodiment of the present invention.
- the multi-channel sound signal encoding method in the embodiment of the present invention, unlike other methods in the prior art, fully utilizes both the statistical characteristics and the psychoacoustic characteristics of the multi-channel sound signal, and obtains extremely high encoding efficiency while maintaining encoded-signal quality.
- the adaptive subspace mapping method minimizes the statistical redundancy between the multi-channel signals: multiple subspace mapping models are used, and the mapping model is adaptively selected during encoding, allowing better estimation and use of inter-channel statistical characteristics and minimizing inter-channel statistical redundancy for higher coding efficiency.
- FIG. 1 is a flowchart of a method for encoding a multi-channel sound signal according to an embodiment of the present invention, where the method includes:
- Step 101: The first multi-channel sound signal is mapped to the first frequency domain signal using a time-frequency transform, or the first multi-channel sound signal is mapped to the first sub-band signal using sub-band filtering.
- the first multi-channel sound signal is initially represented by a time domain signal u(m, t), and the multi-channel frequency domain signal or the sub-band signal x(m, k) can be obtained by the above mapping process.
- m is the channel number
- t is the frame (or subframe) number
- k is the frequency or subband number.
- the time-frequency transform can adopt commonly used techniques such as the modified discrete cosine transform (MDCT), discrete cosine transform (DCT), and fast Fourier transform (FFT)
- the sub-band filtering can adopt commonly used quadrature mirror filter banks (QMF/PQMF/CQMF) or cosine-modulated filter banks (CMF/MLT)
- time-frequency transform can also use multi-resolution analysis techniques such as wavelet transform
- the mapping can use one of the above three mapping methods alone (as in AC-3, AAC) or a combination (as in MP3, Bell Labs PAC).
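As a purely illustrative reading of step 101, the sketch below maps a multi-channel time-domain signal u(m, t) into per-frame frequency coefficients x(m, t, k) with a windowed real FFT; the patent equally allows MDCT, DCT, or filter banks, and the frame length here is an assumed parameter.

```python
import numpy as np

def time_frequency_map(u, frame_len=1024):
    """Map multi-channel time-domain samples u[m, n] to frequency-domain
    coefficients x[m, t, k] with a per-frame windowed real FFT.

    The patent allows MDCT, DCT, FFT, or sub-band filter banks; a plain
    Hann-windowed FFT is used here only as one illustrative choice.
    """
    n_ch, n_samples = u.shape
    n_frames = n_samples // frame_len
    window = np.hanning(frame_len)
    x = np.empty((n_ch, n_frames, frame_len // 2 + 1), dtype=complex)
    for t in range(n_frames):
        frame = u[:, t * frame_len:(t + 1) * frame_len] * window
        x[:, t, :] = np.fft.rfft(frame, axis=-1)
    return x
```

For a two-channel signal of 4096 samples and a 1024-sample frame, this yields 4 frames of 513 frequency bins per channel.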
- Step 102: Divide the first frequency domain signal or the first sub-band signal into different time-frequency sub-bands.
- the sound signal to be encoded may first be divided into frames and then subjected to time-frequency transform or sub-band filtering. If a larger frame length is used, one frame of data may be decomposed into multiple subframes before the time-frequency transform or sub-band filtering. After the frequency domain or sub-band signal is obtained, multiple frequency sub-bands may be formed in frequency order; the frequency domain signals obtained by multiple time-frequency transforms or sub-band filterings may also be assembled into a two-dimensional time-frequency plane, on which time-frequency regions are divided. Projecting a time-frequency region onto each channel's time-frequency plane yields the time-frequency sub-band x_i(t, k) to be encoded, where i is the sub-band index and t is the frame (or subframe) index.
- the signal range of the time-frequency sub-band x_i(t, k) is t_{i-1} ≤ t ≤ t_i, k_{i-1} ≤ k ≤ k_i, where t_{i-1} and t_i are the start and end frame (or subframe) numbers of the sub-band, and k_{i-1} and k_i are its start and end frequencies or sub-band numbers. If the total number of time-frequency sub-bands is N, then i ≤ N.
- the area of a time-frequency sub-band can be represented by (t, k); each time-frequency sub-band contains the signal of every channel projected in that time-frequency region, which can be written x_i(t, k, m).
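The sub-band split x_i(t, k, m) described above can be sketched as follows; the bin boundaries are hypothetical, since the patent leaves the actual partition open.

```python
import numpy as np

def split_time_frequency_subbands(x, band_edges):
    """Split a time-frequency plane x[m, t, k] into sub-bands x_i(t, k, m).

    band_edges = (k_0, k_1, ..., k_N) are illustrative frequency-bin
    boundaries; the patent does not fix the partition (uniform,
    perceptually spaced, ...). Each returned sub-band keeps every
    channel m and frame t of its frequency range.
    """
    return [x[:, :, k0:k1] for k0, k1 in zip(band_edges[:-1], band_edges[1:])]
```

The sub-bands together cover all frequency bins exactly once, so they can be processed and encoded independently.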
- Step 103 Calculate a first statistical characteristic of the first multi-channel sound signal in each of the different time-frequency sub-bands.
- Step 104 Estimate the optimized subspace mapping model according to the first statistical characteristic.
- an optimized subspace mapping model may be selected and its mapping coefficients adaptively adjusted according to the first statistical characteristic; or, according to the first statistical characteristic, the encoder may adaptively switch among a plurality of pre-selected mapping models and use the chosen one as the optimized subspace mapping model.
- the first statistical characteristic in the embodiment of the present invention can use the same statistic when evaluating different models, such as first-order statistics (mean), second-order statistics (variance and correlation coefficient), or higher-order statistics (higher-order moments and their transformations); second-order statistics are the most common choice. Better still, different statistics can be selected for different mapping models: for example, negentropy when evaluating the ICA model, and the covariance matrix (a second-order statistic) when evaluating the PCA model.
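A minimal sketch of the two statistics named above, assuming discrete channel signals held in NumPy arrays; the log-cosh constant below is the standard value of E[log cosh v] for a standard-normal v.

```python
import numpy as np

def channel_covariance(x):
    """Second-order statistic for evaluating the PCA model:
    covariance matrix of the channel signals x[m, n]."""
    return np.cov(x)

def negentropy_estimate(y):
    """Negentropy-style statistic for evaluating the ICA model, using the
    common log-cosh approximation Ng(y) ~ (E[g(y)] - E[g(y_gauss)])^2.
    E[log cosh v] for a standard-normal v is approximately 0.3746."""
    y = (y - y.mean()) / y.std()
    e_gauss = 0.3746
    return (np.mean(np.log(np.cosh(y))) - e_gauss) ** 2
```

A Gaussian channel scores near zero negentropy, while a Laplacian (speech-like) channel scores clearly higher, which is what makes negentropy usable as a non-Gaussianity measure.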
- Step 105 The first multi-channel sound signal is mapped to the second multi-channel sound signal by using an optimized subspace mapping model.
- the statistical characteristics of the multi-channel sound signal x_i(t, k) can be calculated in the different time-frequency sub-bands and the optimized subspace mapping model W_i(t, k) estimated; using the estimated mapping model, the multi-channel signal is mapped to a new subspace, yielding a new set of multi-channel signals z_i(t, k).
- Step 106 Perceptually encode at least one of the second multi-channel sound signals and the optimized subspace mapping model according to time, frequency and channel, and multiplex into the encoded multi-channel code stream.
- At least one new set of multi-channel signals z i (t, k) and the corresponding mapping model W i (t, k) may be perceptually encoded and multiplexed into an encoded multi-channel code stream.
- the above perceptual coding may specifically be hierarchical perceptual coding.
- in the embodiment of the present invention, adaptive subspace mapping is adopted: the statistical characteristics of the multi-channel sound signal are first calculated, the optimized subspace mapping model is estimated from them, and the multi-channel sound signal is then mapped with that model and perceptually encoded. The embodiment thus adaptively selects a mapping model during coding, which better estimates and exploits inter-channel statistical characteristics and minimizes inter-channel statistical redundancy, achieving higher coding efficiency while guaranteeing the quality of the encoded signal.
- the sound components of some channels are significantly different from the sound components of other channels.
- these channels can be grouped separately, the above method applied per group, and the optimized mapping model thereby extracted more accurately. Therefore, when encoding such a multi-channel sound signal, a channel grouping step can be added to improve encoding efficiency.
- FIG. 2 is a flowchart of a method for encoding a multi-channel sound signal according to another embodiment of the present invention.
- a step of processing a channel group is added.
- Step 201: The first multi-channel sound signal is mapped to the first frequency domain signal using a time-frequency transform, or mapped to the first sub-band signal using sub-band filtering.
- Step 202: Divide the first frequency domain signal or the first sub-band signal into different time-frequency sub-bands.
- the sound signal to be encoded may first be divided into frames and then subjected to time-frequency transform or sub-band filtering. If a larger frame length is used, one frame of data may be decomposed into multiple subframes before the time-frequency transform or sub-band filtering. After the frequency domain or sub-band signal is obtained, multiple frequency sub-bands may be formed in frequency order; the frequency domain signals obtained by multiple time-frequency transforms or sub-band filterings may also be assembled into a two-dimensional time-frequency plane, on which time-frequency regions are divided to obtain the time-frequency sub-bands to be encoded.
- Step 203: Calculate a second statistical characteristic of the first multi-channel sound signal in each of the different time-frequency sub-bands, and divide the first multi-channel sound signal into multiple grouped sound signals according to the second statistical characteristic.
- the statistical characteristics of the multi-channel sound signal x(m, k) can be calculated in the different time-frequency sub-bands, and the multi-channel signal divided into one or more channel groups according to the statistical characteristics of each channel's sound components; each group contains at least one channel signal. A group with a single channel is perceptually encoded directly, while groups with more than one channel undergo the subsequent processing.
- the second statistical characteristic of the present invention can use first-order statistics (mean), second-order statistics (variance and correlation coefficients), or higher-order statistics (higher-order moments) and their transformations; second-order statistics, especially correlation coefficients, are the most common choice.
- the first statistical characteristic may also be used as a criterion for judging the group.
- the second statistical characteristic and the first statistical characteristic may have the same value.
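A hypothetical greedy grouping rule based on correlation coefficients (the statistic the text singles out) might look like the sketch below; the threshold value is an assumption, since the patent does not fix a grouping criterion.

```python
import numpy as np

def group_channels(x, threshold=0.6):
    """Greedily group channels by pairwise correlation coefficient.

    x[m, n] is the multi-channel signal; threshold is illustrative.
    Returns lists of channel indices: channels whose correlation with a
    group's first member exceeds the threshold join that group.
    """
    n_ch = x.shape[0]
    corr = np.abs(np.corrcoef(x))
    groups, assigned = [], set()
    for m in range(n_ch):
        if m in assigned:
            continue
        group = [m]
        assigned.add(m)
        for n in range(m + 1, n_ch):
            if n not in assigned and corr[m, n] > threshold:
                group.append(n)
                assigned.add(n)
        groups.append(group)
    return groups
```

Two strongly correlated channels end up in one group, while an uncorrelated channel forms its own single-channel group, which the text says is perceptually encoded directly.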
- steps 204 to 207 are performed for each grouped sound signal, treating it as the first multi-channel sound signal.
- Step 204 Calculate a first statistical characteristic of the first multi-channel sound signal in each of the different time-frequency sub-bands.
- Step 205 Estimate the optimized subspace mapping model according to the first statistical characteristic.
- Step 206 Map the first multi-channel sound signal to the second multi-channel sound signal by using an optimized subspace mapping model.
- the optimized subspace mapping model W i (t, k) can be estimated according to the statistical characteristics of the sound components of each channel; and the multi-channel signal is mapped to the new subspace by using the estimated mapping model.
- Step 207 Perceptually encode at least one of the second multi-channel sound signals and the optimized subspace mapping model according to different time, frequency, and channel, and multiplex into the encoded multi-channel code stream.
- at least one new multi-channel signal z_i(t, k) and the corresponding mapping model W_i(t, k) may be perceptually encoded, and all perceptual coding information multiplexed to obtain the encoded multi-channel code stream.
- the second statistical characteristic of the first multi-channel sound signal may be calculated, the first multi-channel sound signal then divided into a plurality of grouped sound signals according to the second statistical characteristic, and steps 102 to 106 performed for each grouped sound signal, treating it as the first multi-channel sound signal.
- FIG. 3 is a flowchart of a method for encoding a multi-channel sound signal according to another embodiment of the present invention.
- the multi-channel sound signal is first grouped, and each grouped sound signal is then processed (time-frequency mapping and so on); the method includes:
- Step 301 Calculate a third statistical characteristic of the first multi-channel sound signal, and divide the first multi-channel sound signal into a plurality of packet sound signals according to the third statistical characteristic.
- the statistical characteristics of the multi-channel sound signal u(m, t) can be calculated, and according to the statistical characteristics, the multi-channel signal is divided into one or more groups of channels, and each group includes at least one channel signal.
- the third statistical characteristic of the present invention may use first-order statistics (mean), second-order statistics (variance and correlation coefficient), or higher-order statistics (higher-order moments) and their transformations; second-order statistics, especially correlation coefficients, are the most common choice.
- steps 302 to 307 are performed for each grouped sound signal, treating it as the first multi-channel sound signal.
- Step 302 The first multi-channel sound signal is mapped to the first frequency domain signal by using a time-frequency transform, or the first multi-channel sound signal is mapped to the first sub-band signal by using sub-band filtering.
- Step 303: Divide the first frequency domain signal or the first sub-band signal into different time-frequency sub-bands.
- the time-frequency transform or sub-band filtering may be used to map the grouped multi-channel time domain signal u(m, t) into a multi-channel frequency domain signal or sub-band signal x(m, k), and the time-frequency mapped signal is divided into different time-frequency sub-bands.
- Step 304 Calculate a first statistical characteristic of the first multi-channel sound signal in each of the different time-frequency sub-bands.
- Step 305 Estimate the optimized subspace mapping model according to the first statistical characteristic.
- an adaptive subspace mapping is used to estimate an optimized subspace mapping model.
- the adaptive subspace mapping differs from existing multi-channel sound encoding methods: an innovative subspace mapping method is adopted, and the optimized multi-channel subspace mapping model is estimated.
- the model is an adaptive linear transformation matrix and subspace mapping, which can adopt multidimensional spatial statistical analysis methods developed in recent years, such as Independent Component Analysis (ICA), Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA), and Projection Pursuit.
- the present invention proposes an encoding method that more effectively accounts for both the statistical and the psychoacoustic characteristics of the multi-channel sound signal, and it has been shown that the method of the present invention achieves higher coding efficiency and quality than prior methods.
- Step 306 The first multi-channel sound signal is mapped to the second multi-channel sound signal by using an optimized subspace mapping model.
- the statistical characteristics of the multi-channel sound signal x i (t, k) can be calculated in different time-frequency sub-bands, and the optimized subspace mapping model W i (t, k) can be estimated; using the estimated mapping model, The multi-channel signal is mapped to a new subspace, and a new set of multi-channel signals z i (t, k) is obtained.
- Step 307 Perceptually encode at least one of the second multi-channel sound signals and the optimized subspace mapping model according to different time, frequency, and channel, and multiplex into the encoded multi-channel code stream.
- at least one new multi-channel signal z_i(t, k) and the corresponding mapping model W_i(t, k) can be perceptually encoded; all perceptual coding information is multiplexed to obtain the encoded multi-channel code stream.
- Waveform coding: the perceptual quantization and Huffman entropy coding used in MP3 and AAC, the exponent-mantissa coding used in AC-3, the perceptual vector quantization coding used in Ogg Vorbis and TwinVQ, etc.;
- Parametric coding: for example, the harmonic, individual sinusoidal component, and noise coding used in MPEG HILN, the harmonic vector excitation coding used in MPEG HVXC, and the code-excited and transform-coded excitation (TCX) coding in AMR WB+;
- Waveform-parameter hybrid coding: for example, MP3Pro, AAC+, AMR WB+ and similar methods use waveform coding for low frequencies and band-extension parameters for high frequencies.
- the adaptive subspace mapping in the embodiment of the present invention differs from any existing method; the adaptation can take the form of selecting one mapping model and adaptively adjusting its mapping coefficients according to the inter-channel statistical characteristics, or of adaptively switching between different mapping models according to those characteristics, such as switching between the ICA and PCA mappings.
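One hedged way to realize the switching just described is a kurtosis-based mode decision; this rule and its threshold are purely illustrative, since the patent does not specify how the switch is made.

```python
import numpy as np

def select_mapping_model(x, kurtosis_threshold=0.5):
    """Illustrative mode decision between the ICA and PCA mappings.

    The patent describes adaptive switching driven by inter-channel
    statistics but fixes no rule; here, if any channel of x[m, n] is
    clearly non-Gaussian (large excess kurtosis), prefer ICA, which
    assumes at most one Gaussian source; otherwise prefer PCA.
    """
    x = x - x.mean(axis=1, keepdims=True)
    z = x / x.std(axis=1, keepdims=True)
    excess_kurtosis = np.mean(z ** 4, axis=1) - 3.0
    return "ICA" if np.any(np.abs(excess_kurtosis) > kurtosis_threshold) else "PCA"
```

Near-Gaussian channels thus fall back to the PCA mapping, matching the text's observation that the ICA optimality condition needs non-Gaussian sources.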
- the adaptive subspace mapping strategy of the present invention is essential to achieving the object of the present invention: very high coding efficiency when encoding a multi-channel signal while ensuring the quality of the encoded signal.
- the subspace mapping model can be described as follows:
- x = A s, z = W x
- where A is the current subspace mapping matrix, W is the new subspace mapping matrix, and s, x, z are vectors of de-meaned scalar random variables.
- the adaptive subspace mapping of the present invention finds an optimized mapping matrix W, so that the new subspace observation vector z obtained by the mapping is optimal, that is, the optimal coding efficiency can be obtained.
- the statistical characteristics of multi-channel signals are time-varying.
- the distribution of different signal components may be Laplacian or Gaussian or other forms.
- different coding rates and coding modes require different mapping matrix performance (such as orthogonality, correlation, etc.).
- ICA independent component analysis model
- when each random variable in the sound source vector s is statistically independent of the others and at most one of them is Gaussian, the optimal solution for the mapped observation vector z is the source vector s itself (or differs from it only by a scale factor), and the subspace mapping model is equivalent to the Independent Component Analysis (ICA) model.
- the mapping matrix W can be obtained by maximizing a measure of non-Gaussianity (such as the kurtosis index or the negentropy index).
- the FastICA algorithm can be used to implement fast ICA model mapping, as described below:
- negative entropy (negentropy) is defined as:
- Ng(y) = H(y_gauss) − H(y) (4)
- y_gauss is a Gaussian random variable with the same variance as y
- H(y) is the differential entropy of the random variable: H(y) = −∫ f(y) log f(y) dy (5)
- negentropy can be approximated as: Ng(y) ≈ {E[g(y)] − E[g(y_gauss)]}² (6)
- E[·] is a mean operation and g(·) is a nonlinear function.
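To illustrate approximation (6) (an editorial sketch; the contrast function g(u) = log cosh(u) is one conventional choice, not specified here), the negentropy estimate of a Laplacian signal comes out clearly larger than that of a Gaussian one:

```python
import numpy as np

def negentropy_approx(y, g=lambda u: np.log(np.cosh(u))):
    """Ng(y) ~= {E[g(y)] - E[g(y_gauss)]}^2 on standardized data (formula (6))."""
    y = (y - y.mean()) / y.std()
    y_gauss = np.random.default_rng(0).standard_normal(y.size)
    return (g(y).mean() - g(y_gauss).mean()) ** 2

rng = np.random.default_rng(1)
lap = rng.laplace(size=100_000)        # super-Gaussian, like most sound signals
gau = rng.standard_normal(100_000)     # Gaussian: negentropy near zero

print(negentropy_approx(lap) > negentropy_approx(gau))  # -> True
```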
- the basic calculation steps are as follows:
- iterate W_p ← E{z·g(W_p^T z)} − E{g′(W_p^T z)}·W_p, where g is a nonlinear function;
- upon convergence, the mapping vector z and the mapping matrix W are obtained.
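A minimal symmetric FastICA in the spirit of the iteration above (an illustrative sketch, not this document's implementation; whitening of the observations and the choice g = tanh are conventional assumptions):

```python
import numpy as np

def fastica(x, n_iter=200, seed=0):
    """Symmetric FastICA: iterate W_p <- E{z g(W_p^T z)} - E{g'(W_p^T z)} W_p."""
    x = x - x.mean(axis=1, keepdims=True)        # de-average
    d, E = np.linalg.eigh(np.cov(x))
    V = (E / np.sqrt(d)).T                       # whitening matrix: cov(Vx) = I
    z = V @ x
    n = z.shape[1]
    W = np.random.default_rng(seed).standard_normal((z.shape[0],) * 2)
    for _ in range(n_iter):
        gz = np.tanh(W @ z)                      # g(W_p^T z), with g = tanh
        W = (gz @ z.T) / n - np.diag((1.0 - gz**2).mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W)              # symmetric decorrelation:
        W = U @ Vt                               # W <- (W W^T)^(-1/2) W
    return W @ z                                 # unmixed signals

rng = np.random.default_rng(42)
s = np.vstack([rng.laplace(size=5000),           # at most one Gaussian source
               rng.uniform(-1, 1, size=5000)])
x = np.array([[1.0, 0.5], [0.3, 1.0]]) @ s       # mixed observations
y = fastica(x)

# Each recovered component should correlate strongly with exactly one source
# (recovery is up to sign, permutation, and scale).
c = np.abs(np.corrcoef(np.vstack([y, s]))[:2, 2:])
print((c.max(axis=1) > 0.95).all())
```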
- the subspace mapping model is equivalent to the PCA model when it is assumed that the random variables in the sound source vector s are statistically independent of each other and all conform to a Gaussian distribution; the optimality condition for z is then to concentrate the subspace channel information on the fewest channels.
- PCA principal component analysis model
- the mapping matrix W can be obtained by calculating the eigenvalues and eigenvectors of the covariance matrix of the observation vector x.
- the PCA model is essentially the commonly used Karhunen-Loève transform, and can be solved using the singular value decomposition (SVD) method.
- Step one: calculate the covariance matrix C of the observation vector x;
- Step two: calculate the eigenvectors e 1 , e 2 , . . . , e M and the eigenvalues λ 1 , λ 2 , . . . , λ M of the covariance matrix, and sort the eigenvalues in descending order;
- Step three: map the observation vector x into the space spanned by the eigenvectors to obtain the mapping vector z.
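The three steps can be sketched directly with NumPy (illustrative only; `numpy.linalg.eigh` returns eigenvalues in ascending order, hence the explicit descending sort of step two):

```python
import numpy as np

def pca_mapping(x):
    """PCA mapping of observations x (channels x samples): z = W x."""
    x = x - x.mean(axis=1, keepdims=True)
    C = np.cov(x)                          # step one: covariance matrix C
    lam, E = np.linalg.eigh(C)             # step two: eigenvalues/eigenvectors...
    order = np.argsort(lam)[::-1]          # ...sorted in descending order
    W = E[:, order].T                      # mapping matrix (rows = eigenvectors)
    return W @ x, lam[order]               # step three: mapping vector z

rng = np.random.default_rng(0)
s = rng.standard_normal((2, 10_000))
x = np.array([[1.0, 0.9], [0.9, 1.0]]) @ s # strongly correlated channels
z, lam = pca_mapping(x)

# Energy concentrates on the first (principal) channel, and the mapped
# channels are decorrelated.
print(lam[0] > lam[1] and abs(np.cov(z)[0, 1]) < 1e-10)  # -> True
```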
- the ICA model is very suitable for blind separation and classification of signal components, which is beneficial for decomposing multi-channel signals into multiple statistical independent channels for encoding, and maximally removing statistical redundancy between channels.
- the mapping matrix vectors in the PCA model are orthogonal, and the multi-channel signal components can be concentrated on as few channels as possible, which is beneficial to reduce the dimension of the encoded signal at a lower code rate.
- Figure 5 is a schematic diagram comparing the characteristics of the PCA model and the ICA model. From the perspective of mapping efficiency, in most cases the multi-channel signal components do not satisfy an orthogonal distribution, and the PCA model therefore cannot obtain the highest mapping efficiency.
- the ICA model does not require orthogonality of the signal, and most sound signals (including sub-band sound signals) are consistent with the Laplacian distribution; therefore, the ICA model often achieves high mapping efficiency.
- the ICA model and the PCA model have different characteristics but are highly complementary.
- accordingly, the following options are provided:
- first, an ICA coding mode, in which all sub-bands are encoded using ICA;
- second, a PCA coding mode, in which all sub-bands are encoded using PCA;
- third, an ICA/PCA hybrid coding mode, in which the ICA or PCA coding mode is dynamically selected using an open-loop or closed-loop search strategy.
- in the ICA and PCA hybrid coding mode, the mode can be determined according to the signal-to-noise ratio (SNR) or the masking noise ratio (MNR) of the two coding modes, ICA and PCA, at a specific code rate.
- SNR signal-to-noise ratio
- MNR masking noise ratio
- the calculation of SNR and MNR can be performed in a general manner.
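As a sketch of that general manner (illustrative values; a uniform quantizer stands in for an actual coding mode):

```python
import numpy as np

def snr_db(ref, coded):
    """Signal-to-noise ratio of a coded signal against its reference, in dB."""
    return 10.0 * np.log10(np.sum(ref**2) / np.sum((ref - coded)**2))

def mnr_db(ref, coded, smr_db):
    """Mask-to-noise ratio: MNR = SNR - SMR (all in dB)."""
    return snr_db(ref, coded) - smr_db

x = np.random.default_rng(0).standard_normal(1024)
xq = np.round(x / 0.1) * 0.1            # stand-in coder: 0.1-step quantizer
print(snr_db(x, xq) > mnr_db(x, xq, smr_db=6.0))   # MNR sits 6 dB below SNR -> True
```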
- the perceptual coding of the present invention perceptually encodes at least one group of the new multi-channel signals and the corresponding mapping models.
- the signal components to be encoded and the corresponding mapping model parameters may be selected based on the current target code rate and the perceived importance of the new multi-channel signals.
- the multi-channel signal to be encoded is divided into a plurality of sub-bands along three dimensions of time, frequency and channel.
- using known psychoacoustic models, such as the Johnston model and MPEG psychoacoustic Model 1 and Model 2, the perceptual importance (weight) of each sub-band is calculated separately, and the number of sub-bands to be encoded and the quantization precision are determined.
- for mapping model coding, the corresponding mapping matrix/vector may be encoded, or other transformed forms of the model may be encoded, or the statistical feature parameters of the mapping matrix may be directly encoded.
- the invention unifies the selection of the inter-channel subspace mapping model, the parameter calculation and encoding of the mapping matrix, and the perceptual coding of the sub-bands (i.e. time-frequency-channel) into a rate-distortion theory coding framework; high-efficiency coding of multi-channel signals is achieved according to constraints such as the coding rate, the psychoacoustic masking effect, the binaural auditory effect, and the like.
- FIG. 6 is a schematic diagram of time-frequency sub-band division.
- the time-frequency-channel space is divided into multiple time-frequency sub-bands; consider the time-frequency sub-band (t, k).
- the subspace mapping model is T(t, k), which can be selected among K models T 1 , T 2 , ..., T K , for example including the ICA model and the PCA model; the mapping matrix is W(t, k), which can be estimated from the statistical parameters between channels (such as by the ICA and PCA methods); the perceptually encoded sub-band signal is x(t, k, m), i.e. the sub-band signal x(t, k) in channel m;
- the signal-to-mask ratio SMR(t, k, m) of the signal can be calculated by the psychoacoustic model; the target bit number is B bits; using MNR(t, k, m) as the distortion evaluation criterion, the following encoding can be used:
- given the sub-band signals z(t, k, m), SMR(t, k, m) and the target bit number B,
- among the K mapping models, select the model that maximizes MNR(t, k, m), and encode the model number T(t, k), the mapping matrix W(t, k) and the new sub-band signal z(t, k, m).
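The selection rule above can be sketched as a closed-loop search over the K candidate models (illustrative; the two "models" here are hypothetical quantizers standing in for the actual ICA/PCA mapping-plus-coding paths):

```python
import numpy as np

def mnr_db(ref, coded, smr_db):
    """MNR = SNR - SMR in dB; higher means the noise sits further below the mask."""
    snr = 10.0 * np.log10(np.sum(ref**2) / np.sum((ref - coded)**2))
    return snr - smr_db

def select_mapping_model(x_sub, smr_db, models):
    """Code one sub-band with every candidate model and keep the best MNR."""
    coded = {name: model(x_sub) for name, model in models.items()}
    best = max(coded, key=lambda name: mnr_db(x_sub, coded[name], smr_db))
    return best, coded[best]     # model number T(t,k), new signal z(t,k,m)

models = {                       # hypothetical stand-ins for T_1 .. T_K
    "T_ica": lambda v: np.round(v * 16) / 16,   # finer quantizer
    "T_pca": lambda v: np.round(v * 4) / 4,
}
x_sub = np.random.default_rng(7).laplace(size=256)
name, z = select_mapping_model(x_sub, smr_db=10.0, models=models)
print(name)                      # -> T_ica (smaller coding error, higher MNR)
```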
- the adaptive subspace mapping and perceptual coding of the present invention cooperate to achieve adaptive coding under different coding target conditions.
- for example, at a high code rate, the mapping method of independent component analysis can not only encode high-quality sound signals but also eliminate noise, while at a low code rate the mapping method of principal component analysis may be more suitable for encoding complex sound signals.
- the adaptive subspace mapping and perceptual coding method of the present invention can also provide scalable coding, that is, the multi-channel sound signal is encoded only once to obtain a single sound code stream, which then supports transmission and decoding at multiple code rates and quality levels, thereby serving the different application needs of multiple types of users.
- the perceptual coding module can be further broken down into the following steps:
- Step one: select the most important at least one group of signals and the corresponding mapping model for perceptual coding, the code rate of this partial code stream being not higher than the base layer code rate constraint;
- Step two: select the second most important at least one group of signals and the corresponding mapping model for perceptual coding, the code rate of this partial code stream being not higher than the first enhancement layer code rate constraint;
- Step three: select the third most important at least one group of signals and the corresponding mapping model for perceptual coding, the code rate of this partial code stream being not higher than the second enhancement layer code rate constraint;
- Step four: and so on, until lossless coding is achieved and an N-layer code stream is obtained;
- Step five: multiplex all N layers of code streams into one compressed stream.
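The five steps can be sketched as a greedy layered encoder (an editorial illustration; the per-group byte costs, importance weights, and layer budgets are invented example values, and the actual perceptual coding of each group is abstracted away):

```python
def scalable_encode(groups, layer_budgets):
    """Pack signal groups, in order of perceived importance, into N layers.

    Each layer takes as many of the remaining groups as fit under its rate
    budget; the resulting layers would then be multiplexed into one
    compressed stream (step five)."""
    remaining = sorted(groups, key=lambda g: -g["importance"])
    layers = []
    for budget in layer_budgets:
        layer, used = [], 0
        while remaining and used + remaining[0]["bytes"] <= budget:
            g = remaining.pop(0)
            layer.append(g["name"])
            used += g["bytes"]
        layers.append(layer)
    return layers                 # layers[0] is the base layer

groups = [
    {"name": "g1", "importance": 0.9, "bytes": 40},
    {"name": "g2", "importance": 0.7, "bytes": 40},
    {"name": "g3", "importance": 0.4, "bytes": 40},
    {"name": "g4", "importance": 0.1, "bytes": 40},
]
print(scalable_encode(groups, [50, 90, 90]))  # -> [['g1'], ['g2', 'g3'], ['g4']]
```

A low-rate service request would keep only the base layer; higher-rate requests append enhancement layers in importance order.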
- the compressed stream recombined from the scalable code stream according to the service request shall include at least the base layer code stream; at higher code rates, the enhancement layer code streams may be multiplexed in order of importance.
- FIG. 7 is a flowchart of a method for decoding a multi-channel sound signal according to an embodiment of the present invention, where the method includes:
- Step 701 Decode the encoded multi-channel code stream to obtain at least one of the second multi-channel sound signals and an optimized subspace mapping model.
- Step 702 Map the second multi-channel sound signal back to the first multi-channel sound signal by using an optimized subspace mapping model.
- Step 703 Map the first multi-channel sound signal from the frequency domain to the time domain using an inverse time-frequency transform, or from the sub-band domain to the time domain using inverse sub-band filtering.
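An end-to-end sketch of steps 701-703 (illustrative only: the "code stream" is a plain dict, the mapping model is stored directly as its matrix W, and a real inverse FFT stands in for the codec's actual inverse time-frequency transform):

```python
import numpy as np

def decode(stream):
    # Step 701: decode the code stream into the second multi-channel signal z
    # and the optimized subspace mapping model W.
    z, W = stream["z"], stream["W"]
    # Step 702: map back to the first multi-channel signal: x = W^-1 z.
    x_freq = np.linalg.inv(W) @ z
    # Step 703: inverse time-frequency transform (frequency -> time domain).
    return np.fft.irfft(x_freq, axis=-1)

# Round-trip check against a toy encoder.
rng = np.random.default_rng(0)
x_time = rng.standard_normal((2, 64))             # 2 channels, 64 samples
W = np.array([[0.8, 0.6], [-0.6, 0.8]])           # hypothetical mapping model
stream = {"z": W @ np.fft.rfft(x_time, axis=-1), "W": W}
print(np.allclose(decode(stream), x_time))        # -> True
```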
- when the first multi-channel sound signal is a plurality of grouped sound signals, before step 703 the method further includes: restoring the plurality of grouped sound signals into one group to obtain a third multi-channel sound signal, and performing step 703 with the third multi-channel sound signal as the first multi-channel sound signal.
- when the first multi-channel sound signal after step 703 is a plurality of grouped sound signals in the time domain, the method further includes: restoring the plurality of grouped sound signals into one group to obtain a fourth multi-channel sound signal.
- the method may further include: performing demultiplexing processing on the encoded multi-channel code stream to obtain a plurality of layered code streams; performing step 701 on each of the layered code streams as the encoded multi-channel code stream; and after step 701 has been performed on all the layered code streams, uniformly performing step 702 and step 703.
- FIG. 8 is a schematic structural diagram of a multi-channel sound signal encoding apparatus according to an embodiment of the present invention, the apparatus includes:
- the time-frequency mapping unit 801 is configured to map the first multi-channel sound signal into the first frequency domain signal by using a time-frequency transform, or map the first multi-channel sound signal into the first sub-band signal by using sub-band filtering, and divide the first frequency domain signal or the first sub-band signal into different time-frequency sub-bands;
- the adaptive subspace mapping unit 802 is configured to calculate, in each of the different time-frequency sub-bands divided by the time-frequency mapping unit 801, a first statistical characteristic of the first multi-channel sound signal, to estimate an optimized subspace mapping model according to the first statistical characteristic, and to map the first multi-channel sound signal into a second multi-channel sound signal using the optimized subspace mapping model;
- a perceptual coding unit 803 configured to perceptually encode, according to time, frequency, and channel, at least one group of the second multi-channel sound signals mapped by the adaptive subspace mapping unit 802 and the optimized subspace mapping model, and multiplex them into an encoded multi-channel code stream.
- the apparatus further includes:
- a first channel grouping unit configured to: before the adaptive subspace mapping unit 802 calculates the first statistical characteristic of the first multi-channel sound signal in each of the different time-frequency sub-bands, calculate a second statistical characteristic of the first multi-channel sound signal in each of the different time-frequency sub-bands divided by the time-frequency mapping unit, and divide the first multi-channel sound signal into a plurality of grouped sound signals according to the second statistical characteristic;
- the adaptive subspace mapping unit 802 and the perceptual coding unit 803 are specifically configured to process each grouped sound signal divided by the first channel grouping unit as the first multi-channel sound signal.
- alternatively, the apparatus further includes:
- a second channel grouping unit configured to: before the time-frequency mapping unit 801 maps the first multi-channel sound signal into the first frequency domain signal using a time-frequency transform, or maps the first multi-channel sound signal into the first sub-band signal using sub-band filtering, calculate a third statistical characteristic of the first multi-channel sound signal, and divide the first multi-channel sound signal into a plurality of grouped sound signals according to the third statistical characteristic;
- the time-frequency mapping unit 801, the adaptive subspace mapping unit 802, and the perceptual coding unit 803 are specifically configured to process each grouped sound signal divided by the second channel grouping unit as the first multi-channel sound signal.
- the adaptive subspace mapping unit 802 is specifically configured to: calculate the first statistical characteristic of the first multi-channel sound signal in each of the different time-frequency sub-bands divided by the time-frequency mapping unit 801; select an optimized subspace mapping model and adaptively adjust its mapping coefficients according to the first statistical characteristic; and map the first multi-channel sound signal into a second multi-channel sound signal using the optimized subspace mapping model.
- alternatively, the adaptive subspace mapping unit 802 is specifically configured to: calculate the first statistical characteristic of the first multi-channel sound signal in each of the different time-frequency sub-bands divided by the time-frequency mapping unit 801; adaptively switch, according to the first statistical characteristic, to one of a plurality of preset different mapping models, the selected mapping model serving as the optimized subspace mapping model; and map the first multi-channel sound signal into a second multi-channel sound signal using the optimized subspace mapping model.
- the perceptual coding in the perceptual coding unit 803 is specifically hierarchical perceptual coding.
- FIG. 9 is a schematic structural diagram of a multi-channel sound signal decoding apparatus according to an embodiment of the present invention, the apparatus includes:
- the perceptual decoding unit 901 is configured to decode the encoded multi-channel code stream, obtain at least one of the second multi-channel sound signals, and optimize the subspace mapping model;
- the sub-space inverse mapping unit 902 is configured to map the second multi-channel sound signal obtained by the perceptual decoding unit 901 back to the first multi-channel sound signal by using the optimized sub-space mapping model obtained by the perceptual decoding unit 901;
- the frequency-time mapping unit 903 is configured to map the first multi-channel sound signal obtained by the sub-space inverse mapping unit 902 from the frequency domain to the time domain using an inverse time-frequency transform, or to map the first multi-channel sound signal from the sub-band domain to the time domain using inverse sub-band filtering.
- when the first multi-channel sound signal obtained by the sub-space inverse mapping unit 902 is a plurality of grouped sound signals, the device further includes:
- a first packet restoring unit configured to: before the frequency-time mapping unit 903 maps the first multi-channel sound signal obtained by the sub-space inverse mapping unit 902 from the frequency domain to the time domain using an inverse time-frequency transform, or maps the first multi-channel sound signal from the sub-band domain to the time domain using inverse sub-band filtering, restore the plurality of grouped sound signals into one group to obtain a third multi-channel sound signal;
- the frequency time mapping unit 903 is specifically configured to process the third multi-channel sound signal obtained by the first packet restoration unit as the first multi-channel sound signal.
- when the first multi-channel sound signal after the mapping process performed by the frequency-time mapping unit 903 is a plurality of grouped sound signals in the time domain, the device further includes:
- a second packet restoring unit configured to: after the frequency-time mapping unit 903 maps the first multi-channel sound signal obtained by the sub-space inverse mapping unit 902 from the frequency domain to the time domain using an inverse time-frequency transform, or maps the first multi-channel sound signal from the sub-band domain to the time domain using inverse sub-band filtering, restore the plurality of grouped sound signals into one group to obtain a fourth multi-channel sound signal.
- the device further comprises:
- a demultiplexing unit configured to: before the perceptual decoding unit 901 decodes the encoded multi-channel code stream to obtain at least one of the second multi-channel sound signals and the optimized sub-space mapping model, perform demultiplexing processing on the encoded multi-channel code stream to obtain a plurality of layered code streams;
- the perceptual decoding unit 901, the sub-space inverse mapping unit 902, and the frequency-time mapping unit 903 are specifically configured to process each layered code stream obtained by the demultiplexing unit as an encoded multi-channel code stream.
- the steps of a method or algorithm described in connection with the embodiments disclosed herein can be implemented in hardware, a software module executed by a processor, or a combination of both.
- the software module can reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
Abstract
A multi-channel sound signal encoding method and a decoding method and device are provided, which adaptively select a mapping model during encoding so as to achieve higher coding efficiency and coding quality. The encoding method comprises: using a time-frequency transform to map a first multi-channel sound signal into a first frequency domain signal, or using sub-band filtering to map the first multi-channel sound signal into a first sub-band signal (101); dividing the first frequency domain signal or the first sub-band signal into different time-frequency sub-bands (102); calculating a first statistical characteristic of the first multi-channel sound signal in each time-frequency sub-band (103); estimating an optimized subspace mapping model according to the first statistical characteristic (104); using the optimized subspace mapping model to map the first multi-channel sound signal into a second multi-channel sound signal (105); and, according to time, frequency and channel, perceptually encoding the optimized subspace mapping model and at least one group of the second multi-channel sound signal to obtain an encoded multi-channel code stream (106).
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410395806.5 | 2014-08-12 | ||
CN201410395806.5A CN105336333B (zh) | 2014-08-12 | 2014-08-12 | 多声道声音信号编码方法、解码方法及装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016023323A1 true WO2016023323A1 (fr) | 2016-02-18 |
Family
ID=55286819
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2014/095396 WO2016023323A1 (fr) | 2014-08-12 | 2014-12-29 | Procédé de codage de signal acoustique multicanal, procédé et dispositif de décodage |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105336333B (fr) |
WO (1) | WO2016023323A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111599375A (zh) * | 2020-04-26 | 2020-08-28 | 云知声智能科技股份有限公司 | 一种语音交互中多路语音的白化方法及其装置 |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108461086B (zh) * | 2016-12-13 | 2020-05-15 | 北京唱吧科技股份有限公司 | 一种音频的实时切换方法和装置 |
CN108206022B (zh) * | 2016-12-16 | 2020-12-18 | 南京青衿信息科技有限公司 | 利用aes/ebu信道传输三维声信号的编解码器及其编解码方法 |
CN115132214A (zh) | 2018-06-29 | 2022-09-30 | 华为技术有限公司 | 立体声信号的编码、解码方法、编码装置和解码装置 |
TWI692719B (zh) * | 2019-03-21 | 2020-05-01 | 瑞昱半導體股份有限公司 | 音訊處理方法與音訊處理系統 |
CN111682881B (zh) * | 2020-06-17 | 2021-12-24 | 北京润科通用技术有限公司 | 一种适用于多用户信号的通信侦察仿真方法及系统 |
CN113873420B (zh) * | 2021-09-28 | 2023-06-23 | 联想(北京)有限公司 | 音频数据处理方法及装置 |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007003702A (ja) * | 2005-06-22 | 2007-01-11 | Ntt Docomo Inc | 雑音除去装置、通信端末、及び、雑音除去方法 |
CN101401152A (zh) * | 2006-03-15 | 2009-04-01 | 法国电信公司 | 通过多通道音频信号的主分量分析进行编码的设备和方法 |
CN101667425A (zh) * | 2009-09-22 | 2010-03-10 | 山东大学 | 一种对卷积混叠语音信号进行盲源分离的方法 |
CN103077709A (zh) * | 2012-12-28 | 2013-05-01 | 中国科学院声学研究所 | 一种基于共有鉴别性子空间映射的语种识别方法及装置 |
CN103366751A (zh) * | 2012-03-28 | 2013-10-23 | 北京天籁传音数字技术有限公司 | 一种声音编解码装置及其方法 |
CN103366749A (zh) * | 2012-03-28 | 2013-10-23 | 北京天籁传音数字技术有限公司 | 一种声音编解码装置及其方法 |
WO2014062304A2 (fr) * | 2012-10-18 | 2014-04-24 | Google Inc. | Décorrélation hiérarchique d'un signal audio multicanal |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7240001B2 (en) * | 2001-12-14 | 2007-07-03 | Microsoft Corporation | Quality improvement techniques in an audio encoder |
KR100981694B1 (ko) * | 2002-04-10 | 2010-09-13 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | 스테레오 신호들의 코딩 |
EP1691348A1 (fr) * | 2005-02-14 | 2006-08-16 | Ecole Polytechnique Federale De Lausanne | Codage paramétrique combiné de sources audio |
CN101490744B (zh) * | 2006-11-24 | 2013-07-17 | Lg电子株式会社 | 用于编码和解码基于对象的音频信号的方法和装置 |
EP2375410B1 (fr) * | 2010-03-29 | 2017-11-22 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Processeur audio spatial et procédé de fourniture de paramètres spatiaux basée sur un signal d'entrée acoustique |
CN102682779B (zh) * | 2012-06-06 | 2013-07-24 | 武汉大学 | 面向3d音频的双声道编解码方法和编解码器 |
- 2014-08-12: CN application CN201410395806.5A filed (granted as CN105336333B, status Active)
- 2014-12-29: PCT application PCT/CN2014/095396 filed (WO2016023323A1, Application Filing)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111599375A (zh) * | 2020-04-26 | 2020-08-28 | 云知声智能科技股份有限公司 | 一种语音交互中多路语音的白化方法及其装置 |
CN111599375B (zh) * | 2020-04-26 | 2023-03-21 | 云知声智能科技股份有限公司 | 一种语音交互中多路语音的白化方法及其装置 |
Also Published As
Publication number | Publication date |
---|---|
CN105336333A (zh) | 2016-02-17 |
CN105336333B (zh) | 2019-07-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14899905 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 14899905 Country of ref document: EP Kind code of ref document: A1 |