WO2016023323A1 - Multi-channel sound signal encoding method, decoding method and device - Google Patents

Multi-channel sound signal encoding method, decoding method and device

Publication number
WO2016023323A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound signal
channel
frequency
time
channel sound
Prior art date
Application number
PCT/CN2014/095396
Other languages
English (en)
Chinese (zh)
Inventor
潘兴德
Original Assignee
北京天籁传音数字技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京天籁传音数字技术有限公司 filed Critical 北京天籁传音数字技术有限公司
Publication of WO2016023323A1 publication Critical patent/WO2016023323A1/fr

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16Vocoder architecture
    • G10L19/18Vocoders using multiple modes
    • G10L19/22Mode decision, i.e. based on audio signal content versus external parameters

Definitions

  • the present invention relates to the field of audio processing technologies, and in particular, to a multi-channel sound signal encoding method, a decoding method, and a device.
  • multi-channel sound signals are now played to the user over multiple channels, and their encoding methods have evolved from waveform coding techniques such as M/S Stereo and Intensity Stereo, represented by AC-3 and MP3, to Parametric Stereo and Parametric Surround, represented by MP3 Pro, ITU EAAC+, MPEG Surround, and Dolby DD+.
  • PS techniques, including Parametric Stereo and Parametric Surround, exploit binaural psychoacoustic spatial characteristics such as inter-aural time/phase difference (ITD/IPD), inter-aural intensity difference (IID), and inter-aural correlation (IC) to achieve parametric encoding of multi-channel sound signals.
  • the PS technology generally downmixes the multi-channel sound signal at the encoding end to generate one sum-channel signal; the sum-channel signal is waveform coded (or coded with a waveform/parameter hybrid, as in EAAC+), and the ITD/IPD, IID, and IC parameters of each channel signal are parametrically encoded.
  • the multi-channel signal is recovered from the sum-channel signal at the decoding end. Multi-channel signals may also be grouped at encoding time, with the above PS codec method applied within each channel group, or multi-stage PS encoding may be performed on multiple channels in a cascade.
  • both the traditional PS technology and the MPEG Surround technology rely too much on the psychoacoustic properties of both ears, ignoring the statistical properties of the multi-channel sound signal itself.
  • neither the traditional PS technology nor the MPEG Surround technology utilizes statistical redundancy information between pairs of channels.
  • although MPEG Surround uses residual information coding, statistical redundancy remains between the sum-channel signal and the residual channel signals, so coding efficiency and coded-signal quality cannot be balanced.
  • the invention provides a multi-channel sound signal encoding method, a decoding method, and a device, aiming to solve the problem that prior-art multi-channel sound signal encoding methods leave statistical redundancy and cannot balance encoding efficiency against encoded-signal quality.
  • the present invention provides a multi-channel sound signal encoding method, the method comprising: A) mapping a first multi-channel sound signal into a first frequency domain signal by using a time-frequency transform, or mapping the first multi-channel sound signal into a first sub-band signal by using sub-band filtering; B) dividing the first frequency domain signal or the first sub-band signal into different time-frequency sub-bands; C) calculating a first statistical characteristic of the first multi-channel sound signal in each of the different time-frequency sub-bands; D) estimating an optimized subspace mapping model according to the first statistical characteristic; E) mapping the first multi-channel sound signal to a second multi-channel sound signal using the optimized subspace mapping model; F) perceptually encoding, according to time, frequency, and channel, at least a subset of the second multi-channel sound signal together with the optimized subspace mapping model, and multiplexing the result into an encoded multi-channel code stream.
  • the present invention provides a multi-channel sound signal encoding apparatus, the apparatus comprising: a time-frequency mapping unit, configured to map a first multi-channel sound signal into a first frequency domain signal by using a time-frequency transform, or to map the first multi-channel sound signal into a first sub-band signal by using sub-band filtering, and to divide the first frequency domain signal or the first sub-band signal into different time-frequency sub-bands; an adaptive subspace mapping unit, configured to calculate, in each of the different time-frequency sub-bands divided by the time-frequency mapping unit, a first statistical characteristic of the first multi-channel sound signal, to estimate an optimized subspace mapping model according to the first statistical characteristic, and to map the first multi-channel sound signal into a second multi-channel sound signal using the optimized subspace mapping model; and a perceptual coding unit, configured to perceptually encode, according to time, frequency, and channel, at least a subset of the second multi-channel sound signal mapped by the adaptive subspace mapping unit together with the optimized subspace mapping model, and to multiplex the result into an encoded multi-channel code stream.
  • the present invention provides a multi-channel sound signal decoding method, the method comprising: A) decoding an encoded multi-channel code stream to obtain at least a subset of a second multi-channel sound signal and an optimized subspace mapping model; B) using the optimized subspace mapping model to map the second multi-channel sound signal back to a first multi-channel sound signal; C) using an inverse time-frequency transform to map the first multi-channel sound signal from the frequency domain to the time domain, or using inverse sub-band filtering to map the first multi-channel sound signal from the sub-band domain to the time domain.
  • the present invention provides a multi-channel sound signal decoding apparatus, the apparatus comprising: a perceptual decoding unit, configured to decode an encoded multi-channel code stream to obtain at least a subset of a second multi-channel sound signal and an optimized subspace mapping model; a subspace inverse mapping unit, configured to use the optimized subspace mapping model obtained by the perceptual decoding unit to map the second multi-channel sound signal obtained by the perceptual decoding unit back to a first multi-channel sound signal; and a frequency-time mapping unit, configured to use an inverse time-frequency transform to map the first multi-channel sound signal obtained by the subspace inverse mapping unit from the frequency domain to the time domain, or to use inverse sub-band filtering to map the first multi-channel sound signal from the sub-band domain to the time domain.
  • an adaptive subspace mapping is adopted: the optimized subspace mapping model is estimated by calculating the statistical characteristics of the multi-channel sound signal, the multi-channel sound signal is then mapped using the optimized subspace mapping model, and perceptual coding is finally performed.
  • the embodiment of the present invention adaptively selects the mapping model during coding, which better estimates and utilizes the statistical characteristics of inter-channel signals and minimizes statistical redundancy between channels, guaranteeing the quality of the encoded signal while achieving higher coding efficiency.
  • FIG. 1 is a flow chart of a method for encoding a multi-channel sound signal according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a method for encoding a multi-channel sound signal according to another embodiment of the present invention
  • FIG. 3 is a flowchart of a method for encoding a multi-channel sound signal according to another embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a subspace mapping relationship in an embodiment of the present invention.
  • FIG. 5 is a schematic diagram showing a comparison between a PCA model and an ICA model according to an embodiment of the present invention
  • FIG. 6 is a schematic diagram of time-frequency subband division in an embodiment of the present invention.
  • FIG. 7 is a flowchart of a method for decoding a multi-channel sound signal according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a multi-channel sound signal encoding apparatus according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a multi-channel sound signal decoding apparatus according to an embodiment of the present invention.
  • the multi-channel sound signal encoding method in the embodiment of the present invention, unlike other methods in the prior art, fully utilizes the statistical characteristics and psychoacoustic characteristics of the multi-channel sound signal, achieving extremely high encoding efficiency while guaranteeing the quality of the encoded signal.
  • the adaptive subspace mapping method minimizes the statistical redundancy between the multi-channel signals: it creatively uses a variety of subspace mapping models and adaptively selects the mapping model during encoding, better estimating and utilizing the statistical characteristics of inter-channel signals and minimizing statistical redundancy between channels for higher coding efficiency.
  • FIG. 1 is a flowchart of a method for encoding a multi-channel sound signal according to an embodiment of the present invention, where the method includes:
  • step 101 the first multi-channel sound signal is mapped to the first frequency domain signal by using time-frequency transform, or the first multi-channel sound signal is mapped to the first sub-band signal by using sub-band filtering.
  • the first multi-channel sound signal is initially represented by a time domain signal u(m, t), and the multi-channel frequency domain signal or the sub-band signal x(m, k) can be obtained by the above mapping process.
  • m is the channel number
  • t is the frame (or subframe) number
  • k is the frequency or subband number.
  • the time-frequency transform can adopt commonly used time-frequency transform techniques such as the modified discrete cosine transform (MDCT), the discrete cosine transform (DCT), and the fast Fourier transform (FFT)
  • the sub-band filter can adopt the commonly used quadrature mirror filter banks (QMF/PQMF/CQMF) and cosine-modulated filter bank (CMF/MLT) techniques
  • time-frequency transform can also use multi-resolution analysis techniques such as wavelet transform
  • the mapping can take one of the above three mapping methods (such as AC-3, AAC) or a combination (such as MP3, Bell Lab PAC).
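As an illustration of this time-frequency mapping step, the sketch below maps a multi-channel time-domain signal u(m, t) to a frequency-domain signal x(m, t, k) with a windowed FFT, one of the transforms listed above. The frame length, window choice, and function name are illustrative assumptions; a production codec would typically use an MDCT or a QMF bank with overlapping frames.

```python
import numpy as np

def time_frequency_map(u, frame_len=512):
    """Map a multi-channel time signal u[m, t] to a frequency-domain
    representation x[m, frame, k] via a windowed FFT (illustrative;
    real codecs use MDCT/QMF banks with overlap)."""
    n_ch, n_samples = u.shape
    n_frames = n_samples // frame_len
    window = np.hanning(frame_len)
    x = np.empty((n_ch, n_frames, frame_len // 2 + 1), dtype=complex)
    for m in range(n_ch):              # m: channel number
        for t in range(n_frames):      # t: frame number
            frame = u[m, t * frame_len:(t + 1) * frame_len] * window
            x[m, t] = np.fft.rfft(frame)   # k: frequency bin
    return x

# Example: a two-channel signal of 2048 samples -> 4 frames of 257 bins
u = np.random.randn(2, 2048)
x = time_frequency_map(u)
print(x.shape)  # (2, 4, 257)
```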
  • Step 102 Divide the first frequency domain signal or the first subband signal into different time frequency subbands.
  • the sound signal to be encoded may first be divided into frames and then subjected to time-frequency transform or sub-band filtering. If a larger frame length is used, one frame of data may be decomposed into multiple subframes before the time-frequency transform or sub-band filtering. After the frequency domain or sub-band signal is obtained, multiple frequency sub-bands may be formed in frequency order; the frequency domain signals obtained from multiple time-frequency transforms or sub-band filterings may also be assembled into a two-dimensional time-frequency plane, on which the time-frequency regions are divided. Projecting each time-frequency region onto each channel's time-frequency plane yields the time-frequency sub-bands x_i(t, k) to be encoded, where i is the sequence number of the time-frequency sub-band and t is the frame (or subframe) number.
  • the signal range in the time-frequency sub-band x_i(t, k) is: t_{i-1} ≤ t < t_i, k_{i-1} ≤ k < k_i, where t_{i-1} and t_i are the start and end frame (or subframe) numbers of the sub-band, and k_{i-1} and k_i are its start and end frequency or sub-band numbers. If the total number of time-frequency sub-bands is N, then i ≤ N.
  • the area of a time-frequency sub-band can be represented by (t, k).
  • since each time-frequency sub-band includes the signal projected by each channel onto that time-frequency region, it can be denoted x_i(t, k, m).
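The sub-band indexing just described can be illustrated with a minimal sketch; the function name and the half-open index convention are assumptions made for illustration only.

```python
import numpy as np

def extract_subband(x, t_range, k_range):
    """Extract the time-frequency sub-band x_i(t, k, m) covering
    frames t_{i-1} <= t < t_i and bins k_{i-1} <= k < k_i,
    projected onto every channel m."""
    t0, t1 = t_range
    k0, k1 = k_range
    return x[:, t0:t1, k0:k1]

x = np.zeros((2, 8, 64))           # 2 channels, 8 frames, 64 bins
sb = extract_subband(x, (2, 4), (16, 32))
print(sb.shape)  # (2, 2, 16)
```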
  • Step 103 Calculate a first statistical characteristic of the first multi-channel sound signal in each of the different time-frequency sub-bands.
  • Step 104 Estimate the optimized subspace mapping model according to the first statistical characteristic.
  • an optimized subspace mapping model may be selected, and the mapping coefficient of the optimized subspace mapping model may be adaptively adjusted according to the first statistical characteristic; or, according to the first statistical characteristic, between a plurality of pre-selected different mapping models Adaptation switches to one of the mapping models, and the mapping model is used as an optimized subspace mapping model.
  • the first statistical characteristic in the embodiment of the present invention can use the same statistic when evaluating different models, such as first-order statistics (mean), second-order statistics (variance and correlation coefficients), and higher-order statistics (higher-order moments and their transformations); second-order statistics are the most common choice. More preferably, different statistics can be selected for different mapping models to obtain better results: for example, negentropy is used when evaluating the ICA model, while the covariance matrix (a second-order statistic) is used as the first statistical characteristic when evaluating the PCA model.
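To make the choice of statistic concrete, the sketch below computes a covariance matrix (the second-order statistic used when evaluating the PCA model) and a kurtosis-based negentropy approximation (a common ICA contrast). The particular nonlinearity g(y) = y^4 is an illustrative assumption, not the patent's prescribed choice.

```python
import numpy as np

def second_order_stat(x):
    """Covariance matrix of the channel signals (rows = channels),
    the statistic used when evaluating the PCA model."""
    return np.cov(x)

def negentropy_approx(y):
    """Negentropy approximation Ng(y) ~ (E[g(y)] - E[g(y_gauss)])^2
    with g(y) = y^4 (a kurtosis-based contrast), usable when
    evaluating the ICA model. The choice of g is illustrative."""
    y = (y - y.mean()) / y.std()
    return (np.mean(y ** 4) - 3.0) ** 2   # E[y_gauss^4] = 3

rng = np.random.default_rng(0)
gauss = rng.standard_normal(100000)
laplace = rng.laplace(size=100000)
# A Laplacian source is more non-Gaussian than a Gaussian one:
print(negentropy_approx(laplace) > negentropy_approx(gauss))  # True
```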
  • Step 105 The first multi-channel sound signal is mapped to the second multi-channel sound signal by using an optimized subspace mapping model.
  • the statistical characteristics of the multi-channel sound signal x_i(t, k) can be calculated in the different time-frequency sub-bands and the optimized subspace mapping model W_i(t, k) estimated; using the estimated mapping model, the multi-channel signal is mapped to a new subspace to obtain a new set of multi-channel signals z_i(t, k).
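The mapping itself is a per-sub-band linear transform z = W x applied to the channel vector at each (t, k) point. A minimal sketch (the M/S-like rotation chosen for W here is purely illustrative):

```python
import numpy as np

def map_subspace(x_sub, W):
    """Map the channel vectors of one time-frequency sub-band
    x_i(t, k) into the new subspace: z = W @ x at each (t, k).
    x_sub has shape (channels, frames, bins)."""
    n_ch, n_t, n_k = x_sub.shape
    flat = x_sub.reshape(n_ch, -1)
    z = W @ flat
    return z.reshape(n_ch, n_t, n_k)

x_sub = np.random.randn(2, 4, 16)
W = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)  # example rotation
z = map_subspace(x_sub, W)
print(z.shape)  # (2, 4, 16)
```

Because W is invertible, the decoder can map z back to x with the inverse matrix, which is what the decoding method's subspace inverse mapping does.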
  • Step 106 Perceptually encode at least one of the second multi-channel sound signals and the optimized subspace mapping model according to time, frequency and channel, and multiplex into the encoded multi-channel code stream.
  • at least a subset of the new multi-channel signals z_i(t, k) and the corresponding mapping models W_i(t, k) may be perceptually encoded and multiplexed into an encoded multi-channel code stream.
  • the above perceptual coding may specifically be hierarchical perceptual coding.
  • in the embodiment of the present invention, adaptive subspace mapping is adopted: the statistical characteristics of the multi-channel sound signal are first calculated to estimate the optimized subspace mapping model, the multi-channel sound signal is then mapped using that model, and perceptual encoding is finally performed. It can be seen that the embodiment of the present invention adaptively selects the mapping model during coding, better estimating and utilizing the statistical characteristics of inter-channel signals and minimizing statistical redundancy between channels, guaranteeing the quality of the encoded signal while achieving higher coding efficiency.
  • the sound components of some channels are significantly different from the sound components of other channels.
  • these channels can be grouped separately and the above method applied within each group, making extraction of the optimized mapping model more accurate. Therefore, when encoding such a multi-channel sound signal, a channel-grouping step can be added to improve encoding efficiency.
  • FIG. 2 is a flowchart of a method for encoding a multi-channel sound signal according to another embodiment of the present invention.
  • a step of processing a channel group is added.
  • Step 201 The first multi-channel sound signal is mapped to the first frequency domain signal by using a time-frequency transform, or the first multi-channel sound signal is mapped to the first sub-band signal by using sub-band filtering.
  • Step 202 Divide the first frequency domain signal or the first subband signal into different time frequency subbands.
  • the sound signal to be encoded may first be divided into frames and then subjected to time-frequency transform or sub-band filtering. If a larger frame length is used, one frame of data may be decomposed into multiple subframes before the time-frequency transform or sub-band filtering. After the frequency domain or sub-band signal is obtained, multiple frequency sub-bands may be formed in frequency order; the frequency domain signals obtained from multiple time-frequency transforms or sub-band filterings may also be assembled into a two-dimensional time-frequency plane, on which time-frequency regions are divided to obtain the time-frequency sub-bands to be encoded.
  • Step 203 Calculate a second statistical characteristic of the first multi-channel sound signal in each of the different time-frequency sub-bands, and divide the first multi-channel sound signal into a plurality of grouped sound signals according to the second statistical characteristic.
  • the statistical characteristics of the multi-channel sound signal x(m, k) can be calculated in the different time-frequency sub-bands, and the multi-channel signal is then divided into one or more channel groups according to the statistical characteristics of each channel's sound components, each group containing at least one channel signal. A group with a single channel is perceptually encoded directly; groups with more than one channel undergo the subsequent processing.
  • the second statistical characteristic of the present invention can adopt first-order statistics (mean), second-order statistics (variance and correlation coefficients), and higher-order statistics (higher-order moments) and their transformations; second-order statistics, especially correlation coefficients, are the most common choice.
  • the first statistical characteristic may also be used as the grouping criterion.
  • the second statistical characteristic and the first statistical characteristic may take the same value.
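A sketch of one possible grouping strategy based on inter-channel correlation coefficients, the second-order statistic favoured above. The greedy threshold rule here is an illustrative assumption, not the patent's exact criterion; it only demonstrates that each group holds at least one channel.

```python
import numpy as np

def group_channels(u, threshold=0.5):
    """Greedily group channels whose pairwise correlation coefficient
    exceeds `threshold`; every channel lands in exactly one group."""
    n = u.shape[0]
    corr = np.corrcoef(u)
    groups, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        group = [i]
        assigned.add(i)
        for j in range(i + 1, n):
            if j not in assigned and abs(corr[i, j]) > threshold:
                group.append(j)
                assigned.add(j)
        groups.append(group)
    return groups

rng = np.random.default_rng(0)
a = rng.standard_normal(1000)
u = np.vstack([a, a + 0.1 * rng.standard_normal(1000),
               rng.standard_normal(1000)])
print(group_channels(u))  # [[0, 1], [2]]
```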
  • steps 204 to 207 are then performed for each grouped sound signal, treating it as the first multi-channel sound signal.
  • Step 204 Calculate a first statistical characteristic of the first multi-channel sound signal in each of the different time-frequency sub-bands.
  • Step 205 Estimate the optimized subspace mapping model according to the first statistical characteristic.
  • Step 206 Map the first multi-channel sound signal to the second multi-channel sound signal by using an optimized subspace mapping model.
  • the optimized subspace mapping model W_i(t, k) can be estimated according to the statistical characteristics of each channel's sound components, and the multi-channel signal is mapped into the new subspace using the estimated mapping model.
  • Step 207 Perceptually encode at least one of the second multi-channel sound signals and the optimized subspace mapping model according to different time, frequency, and channel, and multiplex into the encoded multi-channel code stream.
  • at least a subset of the new multi-channel signals z_i(t, k) and the corresponding mapping models W_i(t, k) may be perceptually encoded, and all perceptual coding information multiplexed to obtain an encoded multi-channel code stream.
  • the second statistical characteristic of the first multi-channel sound signal may be calculated, and the first multi-channel sound signal then divided into a plurality of grouped sound signals according to the second statistical characteristic; steps 102 to 106 are performed for each grouped sound signal, treating it as the first multi-channel sound signal.
  • FIG. 3 is a flowchart of a method for encoding a multi-channel sound signal according to another embodiment of the present invention.
  • the multi-channel sound signal is first grouped, and each grouped sound signal is then processed (time-frequency mapping and so on); the method includes:
  • Step 301 Calculate a third statistical characteristic of the first multi-channel sound signal, and divide the first multi-channel sound signal into a plurality of packet sound signals according to the third statistical characteristic.
  • the statistical characteristics of the multi-channel sound signal u(m, t) can be calculated, and according to the statistical characteristics, the multi-channel signal is divided into one or more groups of channels, and each group includes at least one channel signal.
  • the third statistical characteristic of the present invention may adopt first-order statistics (mean), second-order statistics (variance and correlation coefficients), and higher-order statistics (higher-order moments) and their transformations; second-order statistics, especially correlation coefficients, are the most common choice.
  • steps 302 to 307 are then performed for each grouped sound signal, treating it as the first multi-channel sound signal.
  • Step 302 The first multi-channel sound signal is mapped to the first frequency domain signal by using a time-frequency transform, or the first multi-channel sound signal is mapped to the first sub-band signal by using sub-band filtering.
  • Step 303 dividing the first frequency domain signal or the first sub-band signal into different time-frequency sub-bands.
  • the time-frequency transform or sub-band filtering may be used to map the grouped multi-channel time domain signal u(m,t) into a multi-channel frequency domain signal or a sub-band signal x(m,k), and time The frequency mapped signal is divided into different time-frequency sub-bands.
  • Step 304 Calculate a first statistical characteristic of the first multi-channel sound signal in each of the different time-frequency sub-bands.
  • Step 305 Estimate the optimized subspace mapping model according to the first statistical characteristic.
  • an adaptive subspace mapping is used to estimate an optimized subspace mapping model.
  • the adaptive subspace mapping differs from existing multi-channel sound encoding methods; an innovative subspace mapping method is adopted.
  • the optimized subspace mapping model of multi-channel is estimated.
  • the model is an adaptive linear transformation matrix and subspace mapping method, which can adopt multidimensional spatial statistical analysis methods developed in recent years, such as Independent Components Analysis (ICA), Principal Components Analysis (PCA), Canonical Correlation Analysis (CCA), and Projection Pursuit.
  • ICA Independent Components Analysis
  • PCA Principal Components Analysis
  • CCA Canonical Correlation Analysis
  • Projection Pursuit
  • the present invention proposes an encoding method that more effectively accounts for the statistical and psychoacoustic characteristics of the multi-channel sound signal, and it has been shown that the method of the present invention achieves higher coding efficiency and quality than prior methods.
  • Step 306 The first multi-channel sound signal is mapped to the second multi-channel sound signal by using an optimized subspace mapping model.
  • the statistical characteristics of the multi-channel sound signal x i (t, k) can be calculated in different time-frequency sub-bands, and the optimized subspace mapping model W i (t, k) can be estimated; using the estimated mapping model, The multi-channel signal is mapped to a new subspace, and a new set of multi-channel signals z i (t, k) is obtained.
  • Step 307 Perceptually encode at least one of the second multi-channel sound signals and the optimized subspace mapping model according to different time, frequency, and channel, and multiplex into the encoded multi-channel code stream.
  • at least a subset of the new multi-channel signals z_i(t, k) and the corresponding mapping models W_i(t, k) can be perceptually encoded; all perceptual coding information is multiplexed to obtain an encoded multi-channel code stream.
  • Waveform coding: the perceptual quantization and Huffman entropy coding used in MP3 and AAC, the exponent-mantissa coding used in AC-3, the perceptual vector quantization coding used in Ogg Vorbis and TwinVQ, etc.
  • Parametric coding: the harmonic, individual sinusoidal component, and noise coding used in MPEG HILN, the harmonic vector excitation coding used in MPEG HVXC, and the code-excited and transform-code-excited (TCX) coding in AMR WB+
  • Waveform-parameter hybrid coding: methods such as MP3Pro, AAC+, and AMR WB+ use waveform coding for low frequencies and band-extension parameters for high frequencies.
  • the adaptive subspace mapping in the embodiment of the present invention differs from any existing method; the adaptation can be embodied in selecting a mapping model and adaptively adjusting its mapping coefficients according to the statistical characteristics between channels, or in adaptively switching between different mapping models according to the statistical characteristics between channels, such as switching between the ICA and PCA mapping methods.
  • the adaptive subspace mapping strategy of the present invention is of great significance for achieving the object of the present invention, namely obtaining very high coding efficiency when encoding a multi-channel signal while ensuring the quality of the encoded signal.
  • the subspace mapping model can be described as follows: the observation vector satisfies x = A s, and the mapped vector is z = W x, where A is the current subspace mapping matrix, W is the new subspace mapping matrix, and s, x, z are vectors of de-meaned scalar random variables.
  • the adaptive subspace mapping of the present invention finds an optimized mapping matrix W such that the new subspace observation vector z obtained by the mapping is optimal, that is, optimal coding efficiency can be obtained.
  • the statistical characteristics of multi-channel signals are time-varying.
  • the distribution of different signal components may be Laplacian or Gaussian or other forms.
  • different coding rates and coding modes require different mapping-matrix properties (such as orthogonality, correlation, etc.).
  • ICA independent component analysis model
  • PCA principal component analysis model
  • when each random variable in the sound source vector s is statistically independent of the others and at most one of them is Gaussian-distributed, the optimal solution of the mapped observation vector z is the source vector s (or differs from the source vector s only by a scale factor), and the subspace mapping model is equivalent to the independent component analysis (ICA) model.
  • the mapping matrix W can be obtained by maximizing a measure of non-Gaussianity (such as the kurtosis index or the negentropy index).
  • the FastICA algorithm can be used to implement fast ICA model mapping, as described below:
  • Negentropy is defined as:
  • Ng(y) = H(y_gauss) − H(y) (4)
  • where y_gauss is a Gaussian random variable with the same variance as y, and H(y) is the differential entropy of the random variable: H(y) = −∫ f(y) log f(y) dy (5)
  • In practice, negentropy is approximated as: Ng(y) ≈ {E[g(y)] − E[g(y_gauss)]}² (6)
  • where E[·] is the expectation (mean) operator and g(·) is a nonlinear function.
  • the basic calculation steps are as follows:
  • iterate the update W_p ← E{z g(W_p^T z)} − E{g'(W_p^T z)} W_p, where g is a nonlinear function, until convergence
  • the mapping vector z and the mapping matrix W are thereby obtained.
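The FastICA iteration described above can be sketched as follows, using g(y) = tanh(y) and symmetric decorrelation after each update. The whitening step, nonlinearity, and fixed iteration count are assumptions of this sketch, not the patent's prescribed implementation.

```python
import numpy as np

def fastica(x, n_iter=200, seed=0):
    """Symmetric FastICA: whiten x (channels x samples), then repeat
    W_p <- E{z g(W_p^T z)} - E{g'(W_p^T z)} W_p with g = tanh,
    re-orthonormalizing W after each update."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    x = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(x))         # whitening: z = D^-1/2 E^T x
    z = (E / np.sqrt(d)).T @ x
    W = rng.standard_normal((n, n))
    for _ in range(n_iter):
        y = W @ z
        g = np.tanh(y)
        g_prime = 1.0 - g ** 2
        W = (g @ z.T) / z.shape[1] - np.diag(g_prime.mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W)          # W <- (W W^T)^(-1/2) W
        W = U @ Vt
    return W, W @ z

# Separate two mixed Laplacian sources (mixing matrix A is illustrative)
rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 5000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
W, z = fastica(A @ s)
print(z.shape)  # (2, 5000)
```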
  • the subspace mapping model is equivalent to the PCA model when the random variables in the sound source vector s are assumed to be statistically independent of each other and Gaussian-distributed, and the optimality condition for z is to concentrate the subspace sound information on the fewest channels.
  • PCA principal component analysis model
  • the mapping matrix W can be obtained by calculating the eigenvalues and eigenvectors of the covariance matrix of the observation vector x.
  • the PCA model is essentially the commonly used Karhunen-Loève transform, which can be solved using the singular value decomposition (SVD) method.
  • Step one: calculate the covariance matrix C of the observation vector x;
  • Step two: calculate the eigenvectors e_1, e_2, ..., e_M and eigenvalues λ_1, λ_2, ..., λ_M of the covariance matrix, and sort the eigenvalues in descending order;
  • Step three: map the observation vector x into the space spanned by the eigenvectors to obtain the mapping vector z.
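The three steps above can be sketched compactly in NumPy (an eigen-decomposition of the covariance matrix; function and variable names are illustrative):

```python
import numpy as np

def pca_map(x):
    """Steps one to three: covariance C of the observation vector x,
    eigen-decomposition sorted by descending eigenvalue, and mapping
    into the eigenvector space (the Karhunen-Loeve transform)."""
    x = x - x.mean(axis=1, keepdims=True)
    C = np.cov(x)                          # step one
    eigvals, eigvecs = np.linalg.eigh(C)   # step two
    order = np.argsort(eigvals)[::-1]      # descending eigenvalues
    W = eigvecs[:, order].T
    z = W @ x                              # step three
    return W, z, eigvals[order]

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 10000))
x[1] += 0.9 * x[0]                         # correlated channels
W, z, lam = pca_map(x)
# Energy concentrates in the first mapped channel, and the mapped
# channels are decorrelated:
print(lam[0] > lam[1])  # True
```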
  • the ICA model is very suitable for blind separation and classification of signal components, which helps decompose a multi-channel signal into multiple statistically independent channels for encoding and maximally removes statistical redundancy between channels.
  • the mapping matrix vectors in the PCA model are orthogonal, and the multi-channel signal components can be concentrated on as few channels as possible, which helps reduce the dimension of the encoded signal at lower code rates.
  • Figure 5 is a schematic diagram comparing the characteristics of the PCA model and the ICA model. From the perspective of mapping efficiency, the multi-channel signal components do not satisfy an orthogonal distribution in most cases, so the PCA model cannot achieve the highest mapping efficiency.
  • the ICA model does not require orthogonality of the signal, and most sound signals (including sub-band sound signals) follow a Laplacian distribution; therefore, the ICA model often achieves higher mapping efficiency.
  • the ICA model and the PCA model thus have different characteristics, but they are highly complementary.
  • accordingly, the following coding modes are available:
  • First, the ICA coding mode: all subbands are encoded with ICA;
  • Second, the PCA coding mode: all subbands are encoded with PCA;
  • Third, the ICA/PCA hybrid coding mode: ICA or PCA coding is selected dynamically using an open-loop or closed-loop search strategy.
  • in the ICA/PCA hybrid coding mode, the choice can be made according to the signal-to-noise ratio (SNR) or the masking noise ratio (MNR) achieved by the ICA and PCA coding modes at the given code rate.
  • SNR: signal-to-noise ratio
  • MNR: masking noise ratio
  • the SNR and MNR can be calculated in the usual manner.
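A closed-loop version of the hybrid-mode decision can be sketched as: code the block with every candidate mode and keep the best-scoring one. Plain SNR stands in for MNR, and two toy quantizers stand in for the actual ICA and PCA coders; all of these stand-ins are assumptions of the sketch:

```python
import numpy as np

def snr_db(ref, rec):
    """Signal-to-noise ratio of a reconstruction, in dB."""
    err = np.sum((ref - rec) ** 2)
    return 10 * np.log10(np.sum(ref ** 2) / max(err, 1e-12))

def select_mode(x, codecs):
    """Closed-loop search: codecs maps mode name -> encode/decode
    function; the mode with the highest SNR wins."""
    return max(codecs, key=lambda name: snr_db(x, codecs[name](x)))

# Toy stand-ins: a fine and a coarse uniform quantizer.
fine = lambda v: np.round(v * 16) / 16
coarse = lambda v: np.round(v * 2) / 2
x = np.random.default_rng(4).standard_normal(1000)
best = select_mode(x, {"ICA": fine, "PCA": coarse})
```

An open-loop variant would instead pick the mode from signal statistics alone, without running both coders.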
  • the perceptual coding of the present invention perceptually encodes at least one new set of multi-channel signals and corresponding mapping models.
  • the encoded signal component and the corresponding mapping model parameters may be selected based on the current coded target code rate and the perceived importance of the new multi-channel signal.
  • the multi-channel signal to be encoded is divided into a plurality of sub-bands along three dimensions of time, frequency and channel.
  • using known psychoacoustic models, such as the Johnston model or MPEG psychoacoustic models 1 and 2,
  • the perceptual importance (weight) of each sub-band is calculated, and the number of sub-bands to be encoded and their quantization precision are determined.
  • for mapping model coding, the corresponding mapping matrix/vector may be encoded, another transform of the model may be encoded, or the statistical feature parameters of the mapping matrix may be encoded directly.
  • the invention unifies the selection of the inter-channel subspace mapping model, the calculation and encoding of the mapping matrix parameters, and the perceptual coding of the sub-bands (i.e. time-frequency-channel) within a single rate-distortion coding framework; high-efficiency coding of multi-channel signals is achieved under constraints such as the coding rate, psychoacoustic masking effects, and binaural auditory effects.
  • FIG. 6 is a schematic diagram of time-frequency sub-band division.
  • the time-frequency plane of each channel is divided into multiple time-frequency sub-bands; consider the time-frequency sub-band (t, k).
  • the subspace mapping model is T(t, k), which can be selected among K models T_1, T_2, ..., T_K (for example the ICA and PCA models); the mapping matrix is W(t, k), which can be estimated from the inter-channel statistics (for example by the ICA or PCA method); the perceptually encoded subband signal is x(t, k, m), i.e. the subband signal x(t, k) in channel m;
  • the signal-to-mask ratio SMR(t, k, m) can be calculated with the psychoacoustic model; the target bit number is B bits; using MNR(t, k, m) as the distortion criterion, the following encoding procedure can be used:
  • given the subband signals z(t, k, m), SMR(t, k, m) and the target bit number B,
  • select among the K mapping models the model that maximizes MNR(t, k, m), and encode the model number T(t, k), the mapping matrix W(t, k) and the new subband signal z(t, k, m).
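One way the B-bit budget could be spent against the per-subband SMR values is greedy allocation: repeatedly give a bit to the subband with the worst current MNR, assuming roughly 6.02 dB of SNR per quantizer bit. Both the greedy rule and the 6.02 dB/bit figure are illustrative assumptions, not steps stated in the text:

```python
import numpy as np

def allocate_bits(smr_db, total_bits, step=1):
    """Greedy bit allocation. Before any bits are spent SNR = 0 dB,
    so MNR = SNR - SMR = -SMR; each bit adds ~6.02 dB of SNR to the
    subband it is given to."""
    mnr = -np.asarray(smr_db, dtype=float)
    bits = np.zeros(len(mnr), dtype=int)
    remaining = total_bits
    while remaining >= step:
        worst = int(np.argmin(mnr))     # least-protected subband
        bits[worst] += step
        mnr[worst] += 6.02 * step
        remaining -= step
    return bits

bits = allocate_bits([20, 10, 0], total_bits=10)
```

The band with the highest SMR (the most perceptually exposed one) ends up with the most bits.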
  • the adaptive subspace mapping and perceptual coding of the present invention cooperate to achieve adaptive coding under different coding target conditions.
  • the independent component analysis mapping can both encode high-quality sound signals and suppress noise; at low code rates, the principal component analysis mapping may be more suitable for encoding complex sound signals.
  • the adaptive subspace mapping and perceptual coding method of the present invention can also provide scalable coding: the multi-channel sound signal is encoded only once to obtain a single code stream that supports transmission and decoding at multiple code rates and quality levels, serving the different application needs of multiple types of users.
  • the perceptual coding module can be further broken down into the following steps:
  • Step one: select the most important at least one group of signals and the corresponding mapping model for perceptual coding, with the code rate of this partial stream not exceeding the base-layer rate constraint;
  • Step two: select the second most important at least one group of signals and the corresponding mapping model for perceptual coding, with the code rate of this partial stream not exceeding the first enhancement-layer rate constraint;
  • Step three: select the third most important at least one group of signals and the corresponding mapping model for perceptual coding, with the code rate of this partial stream not exceeding the second enhancement-layer rate constraint;
  • Step four: continue in the same way until lossless coding is reached, yielding an N-layer code stream;
  • Step five: multiplex all N layers of code streams into one compressed stream.
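The five steps can be sketched as packing perceptually ranked signal groups into rate-capped layers and then multiplexing the layers into one stream; the dict-based group representation is an assumption of this sketch:

```python
def build_scalable_stream(groups, layer_budgets):
    """groups: most-important-first list of {'id': ..., 'bits': ...}.
    Steps one to four: fill each layer up to its rate constraint;
    step five: multiplex all layers into a single compressed stream."""
    layers, queue = [], list(groups)
    for budget in layer_budgets:
        layer, used = [], 0
        while queue and used + queue[0]["bits"] <= budget:
            group = queue.pop(0)
            layer.append(group)
            used += group["bits"]
        layers.append(layer)
    stream = [group["id"] for layer in layers for group in layer]
    return layers, stream

# Four 3-bit groups, a 6-bit base layer and one 6-bit enhancement layer.
layers, stream = build_scalable_stream(
    [{"id": i, "bits": 3} for i in range(4)], [6, 6])
```

Truncating the stream after any layer boundary still leaves a decodable signal, which is what makes the stream scalable.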
  • the compressed stream reassembled from the scalable code stream according to the service request shall include at least the base-layer code stream; at higher code rates, enhancement-layer code streams may be multiplexed in, in order of importance.
  • FIG. 7 is a flowchart of a method for decoding a multi-channel sound signal according to an embodiment of the present invention, where the method includes:
  • Step 701 Decode the encoded multi-channel code stream to obtain at least one of the second multi-channel sound signals and an optimized subspace mapping model.
  • Step 702 Map the second multi-channel sound signal back to the first multi-channel sound signal by using an optimized subspace mapping model.
  • Step 703 using an inverse time-frequency transform, mapping the first multi-channel sound signal from the frequency domain to the time domain, or using inverse sub-band filtering to map the first multi-channel sound signal from the sub-band domain to the time domain.
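Steps 702 and 703 invert the encoder: the decoded signal passes back through the inverse of the transmitted mapping matrix, then through the inverse transform. An FFT pair stands in here for the codec's actual time-frequency transform, which is an assumption of this sketch:

```python
import numpy as np

def decode_block(z, W, n_samples):
    """Step 702: x = W^{-1} z (inverse subspace mapping); step 703:
    inverse FFT back to the time domain."""
    x_freq = np.linalg.solve(W, z)
    return np.fft.irfft(x_freq, n=n_samples, axis=-1)

# Round trip: map two channels through W in the frequency domain,
# then decode them back.
rng = np.random.default_rng(5)
x = rng.standard_normal((2, 64))
W = np.array([[1.0, 0.3], [0.2, 1.0]])   # transmitted mapping matrix
z = W @ np.fft.rfft(x, axis=-1)
x_rec = decode_block(z, W, n_samples=64)
```

With an invertible W the decoder recovers the original channels exactly (up to floating-point error).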
  • when the first multi-channel sound signal comprises a plurality of grouped sound signals, before step 703 the method further includes: recombining the plurality of grouped sound signals to obtain a third multi-channel sound signal, and performing step 703 with the third multi-channel sound signal as the first multi-channel sound signal.
  • when the first multi-channel sound signal is a plurality of grouped sound signals in the time domain, after step 703 the method further includes: recombining the plurality of grouped sound signals to obtain a fourth multi-channel sound signal.
  • the method may further include: demultiplexing the encoded multi-channel code stream to obtain a plurality of layered code streams; performing step 701 on each layered code stream as the encoded multi-channel code stream; and, after step 701 has been performed on all layered code streams, performing steps 702 and 703 jointly.
  • FIG. 8 is a schematic structural diagram of a multi-channel sound signal encoding apparatus according to an embodiment of the present invention, the apparatus includes:
  • the time-frequency mapping unit 801 is configured to map the first multi-channel sound signal into the first frequency domain signal by using time-frequency transform, or map the first multi-channel sound signal into the first sub-band signal by using sub-band filtering. Dividing the first frequency domain signal or the first subband signal into different time-frequency sub-bands;
  • the adaptive subspace mapping unit 802 is configured to calculate, in each time-frequency subband of the different time-frequency subbands divided by the time-frequency mapping unit 801, a first statistical characteristic of the first multi-channel sound signal. And estimating an optimized subspace mapping model according to the first statistical characteristic; and using the optimized subspace mapping model to map the first multichannel sound signal into a second multichannel sound signal;
  • a perceptual coding unit 803 configured to perceptually encode, according to time, frequency and channel, at least one group of the second multi-channel sound signals mapped by the adaptive subspace mapping unit 802 together with the optimized subspace mapping model, and to multiplex them into an encoded multi-channel code stream.
  • the method further includes:
  • a first channel grouping unit configured to: before the adaptive subspace mapping unit 802 calculates the first statistical characteristic of the first multi-channel sound signal in each time-frequency sub-band of the different time-frequency sub-bands,
  • calculate a second statistical characteristic of the first multi-channel sound signal in each time-frequency sub-band divided by the time-frequency mapping unit; and divide, according to the second statistical characteristic, the first multi-channel sound signal into a plurality of grouped sound signals;
  • the adaptive subspace mapping unit 802 and the perceptual coding unit 803 are specifically configured to process each grouped sound signal divided by the first channel grouping unit as the first multi-channel sound signal.
  • the method further includes:
  • a second channel grouping unit configured to: before the time-frequency mapping unit 801 maps the first multi-channel sound signal into the first frequency-domain signal using a time-frequency transform, or into the first sub-band signal using sub-band filtering, calculate a third statistical characteristic of the first multi-channel sound signal; and divide, according to the third statistical characteristic, the first multi-channel sound signal into a plurality of grouped sound signals;
  • the time-frequency mapping unit 801, the adaptive subspace mapping unit 802 and the perceptual coding unit 803 are specifically configured to process each grouped sound signal divided by the second channel grouping unit as the first multi-channel sound signal.
  • the adaptive subspace mapping unit 802 is specifically configured to calculate the first multichannel in each time-frequency subband of the different time-frequency subbands divided by the time-frequency mapping unit 801. a first statistical characteristic of the sound signal; selecting an optimized subspace mapping model, adaptively adjusting a mapping coefficient of the optimized subspace mapping model according to the first statistical characteristic; using the optimized subspace mapping model, The first multi-channel sound signal is mapped to a second multi-channel sound signal.
  • the adaptive subspace mapping unit 802 is specifically configured to: calculate the first statistical characteristic of the first multi-channel sound signal in each time-frequency sub-band of the different time-frequency sub-bands divided by the time-frequency mapping unit 801; adaptively switch, according to the first statistical characteristic, among a plurality of preset mapping models to one mapping model, which serves as the optimized subspace mapping model; and use the optimized subspace mapping model to map the first multi-channel sound signal into a second multi-channel sound signal.
  • the perceptual coding in the perceptual coding unit 803 is specifically hierarchical perceptual coding.
  • FIG. 9 is a schematic structural diagram of a multi-channel sound signal decoding apparatus according to an embodiment of the present invention, the apparatus includes:
  • the perceptual decoding unit 901 is configured to decode the encoded multi-channel code stream, obtain at least one of the second multi-channel sound signals, and optimize the subspace mapping model;
  • the sub-space inverse mapping unit 902 is configured to map the second multi-channel sound signal obtained by the perceptual decoding unit 901 back to the first multi-channel sound signal by using the optimized sub-space mapping model obtained by the perceptual decoding unit 901;
  • the frequency-time mapping unit 903 is configured to map the first multi-channel sound signal obtained by the subspace inverse mapping unit 902 from the frequency domain to the time domain using an inverse time-frequency transform, or from the sub-band domain to the time domain using inverse sub-band filtering.
  • when the first multi-channel sound signal obtained by the subspace inverse mapping unit 902 is a plurality of grouped sound signals, the device further includes:
  • a first packet restoring unit configured to: before the frequency-time mapping unit 903 maps the first multi-channel sound signal obtained by the subspace inverse mapping unit 902 from the frequency domain to the time domain using an inverse time-frequency transform, or from the sub-band domain to the time domain using inverse sub-band filtering, recombine the plurality of grouped sound signals to obtain a third multi-channel sound signal;
  • the frequency-time mapping unit 903 is specifically configured to process the third multi-channel sound signal obtained by the first packet restoring unit as the first multi-channel sound signal.
  • when the first multi-channel sound signal after the mapping processing by the frequency-time mapping unit 903 is a plurality of grouped sound signals in the time domain, the device further includes:
  • a second packet restoring unit configured to: after the frequency-time mapping unit 903 maps the first multi-channel sound signal obtained by the subspace inverse mapping unit 902 from the frequency domain to the time domain using an inverse time-frequency transform, or from the sub-band domain to the time domain using inverse sub-band filtering, recombine the plurality of grouped sound signals to obtain a fourth multi-channel sound signal.
  • the device further comprises:
  • a demultiplexing unit configured to: before the perceptual decoding unit 901 decodes the encoded multi-channel code stream to obtain at least one group of the second multi-channel sound signals and the optimized subspace mapping model, demultiplex the encoded multi-channel code stream to obtain a plurality of layered code streams;
  • the perceptual decoding unit 901, the subspace inverse mapping unit 902 and the frequency-time mapping unit 903 are specifically configured to process each layered code stream obtained by the demultiplexing unit as an encoded multi-channel code stream.
  • the steps of a method or algorithm described in connection with the embodiments disclosed herein can be implemented in hardware, a software module executed by a processor, or a combination of both.
  • the software module can reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Mathematical Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Stereophonic System (AREA)

Abstract

The invention relates to a multi-channel sound signal encoding method, and a decoding method and device, for adaptively selecting a mapping model during encoding so as to achieve higher coding efficiency and coding quality. The encoding method comprises: mapping a first multi-channel sound signal into a first frequency-domain signal using a time-frequency transform, or into a first sub-band signal using sub-band filtering (101); dividing the first frequency-domain signal or the first sub-band signal into different time-frequency sub-bands (102); calculating a first statistical characteristic of the first multi-channel sound signal in each time-frequency sub-band (103); estimating an optimized subspace mapping model according to the first statistical characteristic (104); using the optimized subspace mapping model to map the first multi-channel sound signal into a second multi-channel sound signal (105); and, according to time, frequency and channel, perceptually encoding the optimized subspace mapping model and at least one group of the second multi-channel sound signal to obtain an encoded multi-channel code stream (106).
PCT/CN2014/095396 2014-08-12 2014-12-29 Multi-channel sound signal encoding method, decoding method and device WO2016023323A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410395806.5 2014-08-12
CN201410395806.5A CN105336333B (zh) 2014-08-12 2014-08-12 Multi-channel sound signal encoding method, decoding method and device

Publications (1)

Publication Number Publication Date
WO2016023323A1 true WO2016023323A1 (fr) 2016-02-18

Family

ID=55286819

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/095396 WO2016023323A1 (fr) 2014-08-12 2014-12-29 Multi-channel sound signal encoding method, decoding method and device

Country Status (2)

Country Link
CN (1) CN105336333B (fr)
WO (1) WO2016023323A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111599375A (zh) * 2020-04-26 2020-08-28 云知声智能科技股份有限公司 Whitening method and device for multi-channel speech in voice interaction

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108461086B (zh) * 2016-12-13 2020-05-15 北京唱吧科技股份有限公司 Real-time audio switching method and device
CN108206022B (zh) * 2016-12-16 2020-12-18 南京青衿信息科技有限公司 Codec for transmitting three-dimensional sound signals over an AES/EBU channel and encoding/decoding method therefor
CN115132214A (zh) 2018-06-29 2022-09-30 华为技术有限公司 Stereo signal encoding and decoding methods, encoding device and decoding device
TWI692719B (zh) * 2019-03-21 2020-05-01 瑞昱半導體股份有限公司 Audio processing method and audio processing system
CN111682881B (zh) * 2020-06-17 2021-12-24 北京润科通用技术有限公司 Communication reconnaissance simulation method and system for multi-user signals
CN113873420B (zh) * 2021-09-28 2023-06-23 联想(北京)有限公司 Audio data processing method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007003702A (ja) * 2005-06-22 2007-01-11 Ntt Docomo Inc Noise removal device, communication terminal, and noise removal method
CN101401152A (zh) * 2006-03-15 2009-04-01 法国电信公司 Device and method for encoding by principal component analysis of a multi-channel audio signal
CN101667425A (zh) * 2009-09-22 2010-03-10 山东大学 Method for blind source separation of convolutively mixed speech signals
CN103077709A (zh) * 2012-12-28 2013-05-01 中国科学院声学研究所 Language identification method and device based on shared discriminative subspace mapping
CN103366751A (zh) * 2012-03-28 2013-10-23 北京天籁传音数字技术有限公司 Sound encoding/decoding device and method
CN103366749A (zh) * 2012-03-28 2013-10-23 北京天籁传音数字技术有限公司 Sound encoding/decoding device and method
WO2014062304A2 (fr) * 2012-10-18 2014-04-24 Google Inc. Hierarchical decorrelation of a multi-channel audio signal

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7240001B2 (en) * 2001-12-14 2007-07-03 Microsoft Corporation Quality improvement techniques in an audio encoder
KR100981694B1 (ko) * 2002-04-10 2010-09-13 Koninklijke Philips Electronics N.V. Coding of stereo signals
EP1691348A1 (fr) * 2005-02-14 2006-08-16 Ecole Polytechnique Federale De Lausanne Combined parametric coding of audio sources
CN101490744B (zh) * 2006-11-24 2013-07-17 LG电子株式会社 Method and apparatus for encoding and decoding object-based audio signals
EP2375410B1 (fr) * 2010-03-29 2017-11-22 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Spatial audio processor and method for providing spatial parameters based on an acoustic input signal
CN102682779B (zh) * 2012-06-06 2013-07-24 武汉大学 Dual-channel encoding/decoding method and codec for 3D audio


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111599375A (zh) * 2020-04-26 2020-08-28 云知声智能科技股份有限公司 Whitening method and device for multi-channel speech in voice interaction
CN111599375B (zh) * 2020-04-26 2023-03-21 云知声智能科技股份有限公司 Whitening method and device for multi-channel speech in voice interaction

Also Published As

Publication number Publication date
CN105336333A (zh) 2016-02-17
CN105336333B (zh) 2019-07-05

Similar Documents

Publication Publication Date Title
US11735192B2 (en) Audio encoder, audio decoder and related methods using two-channel processing within an intelligent gap filling framework
WO2016023323A1 (fr) Multi-channel sound signal encoding method, decoding method and device
JP6641018B2 (ja) Apparatus and method for estimating an inter-channel time difference
TWI669705B (zh) Apparatus and method for encoding or decoding a multi-channel signal using side gain and residual gain
TWI397903B (zh) Economical loudness measurement technique for coded audio
RU2645271C2 (ru) Stereophonic encoder and decoder of audio signals
RU2369918C2 (ru) Multichannel reconstruction based on multiple parameterization
US9830918B2 (en) Enhanced soundfield coding using parametric component generation
EP3776541B1 (fr) Apparatus, method or computer program for estimating an inter-channel time difference
WO2007026821A1 (fr) Energy shaping apparatus and energy shaping method
CA3017405C (fr) Encoding apparatus for processing an input signal and decoding apparatus for processing an encoded signal
WO2017206794A1 (fr) Method and device for extracting an inter-channel phase difference parameter
US9373337B2 (en) Reconstruction of a high-frequency range in low-bitrate audio coding using predictive pattern analysis
CN109285553A (zh) Method and device for applying dynamic range compression to higher-order Ambisonics signals
US9848272B2 (en) Decorrelator structure for parametric reconstruction of audio signals
Hu et al. Multi-step coding structure of spatial audio object coding
WO2016023322A1 (fr) Multi-channel sound signal encoding method, decoding method and device
US10950251B2 (en) Coding of harmonic signals in transform-based audio codecs
Gorlow et al. Informed separation of spatial images of stereo music recordings using second-order statistics
Gorlow et al. Reverse engineering stereo music recordings pursuing an informed two-stage approach
Wang et al. Critical band subspace-based speech enhancement using SNR and auditory masking aware technique
Gorlow et al. Informed separation of spatial images of stereo music recordings using low-order statistics
Cantzos Statistical enhancement methods for immersive audio environments and compressed audio
Zhu et al. Fast convolution for binaural rendering based on HRTF spectrum

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14899905

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14899905

Country of ref document: EP

Kind code of ref document: A1