CN105336333A - Multichannel sound signal coding and decoding method and device - Google Patents

Multichannel sound signal coding and decoding method and device

Info

Publication number
CN105336333A (application CN201410395806.5A)
Authority
CN (China)
Prior art keywords
sound signal, channel sound, frequency, channel, signal
Legal status
Granted (the legal status is an assumption and is not a legal conclusion)
Application number
CN201410395806.5A
Other languages
Chinese (zh)
Other versions
CN105336333B (en)
Inventor
潘兴德
Current and original assignee
BEIJING TIANLAI CHUANYIN DIGITAL TECHNOLOGY Co Ltd (the listed assignees may be inaccurate)
Application filed by BEIJING TIANLAI CHUANYIN DIGITAL TECHNOLOGY Co Ltd
Priority to CN201410395806.5A (granted as CN105336333B)
Priority to PCT/CN2014/095396 (WO2016023323A1)
Publication of CN105336333A; application granted; publication of CN105336333B
Legal status: Active; anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signals using source-filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint stereo, intensity coding or matrixing
    • G10L19/02: Coding or decoding using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/04: Coding or decoding using predictive techniques
    • G10L19/16: Vocoder architecture
    • G10L19/18: Vocoders using multiple modes
    • G10L19/22: Mode decision, i.e. based on audio signal content versus external parameters


Abstract

The invention relates to a multichannel sound signal encoding and decoding method and device. The encoding method comprises: mapping a first multichannel sound signal to a first frequency-domain signal using a time-frequency transform, or to a first subband signal using subband filtering; dividing the first frequency-domain signal or first subband signal into different time-frequency subbands; computing, in each time-frequency subband, a first statistical property of the first multichannel sound signal; estimating an optimized subspace mapping model from the first statistical property; mapping the first multichannel sound signal to a second multichannel sound signal using the optimized subspace mapping model; and perceptually encoding, according to time, frequency, and channel, at least one group of the second multichannel sound signal together with the optimized subspace mapping model, multiplexing the result into an encoded multichannel bitstream. In short, the encoding method selects mapping models adaptively and can therefore achieve higher coding efficiency and coding quality.

Description

Multichannel sound signal encoding method, decoding method and device
Technical field
The present invention relates to the field of audio signal processing, and in particular to a multichannel sound signal encoding method, decoding method, and corresponding devices.
Background technology
With the development of science and technology, many techniques for coding sound signals have emerged. "Sound" here refers to digital audio perceptible to the human ear, including speech, music, natural sounds, and synthesized sounds. Many audio coding techniques have become industrial standards and are widely used in daily life; common examples include Dolby Laboratories' AC-3, the DTS format of Digital Theater Systems, MP3 and AAC from the Moving Picture Experts Group (MPEG), Microsoft's WMA, and Sony's ATRAC.
To reproduce stereophonic sound effects, multichannel sound signals are now usually played back to the user over multiple loudspeaker channels. Coding methods for multichannel sound signals have evolved from waveform coding techniques, represented by AC-3 and MP3 together with sum/difference stereo (M/S stereo) and intensity stereo, to the parametric stereo (Parametric Stereo) and parametric surround (Parametric Surround) techniques represented by MP3 Pro, ITU EAAC+, MPEG Surround, and Dolby DD+. PS (covering both Parametric Stereo and Parametric Surround) starts from binaural psychoacoustics and makes full use of psychoacoustic spatial cues such as the interaural time/phase difference (ITD/IPD), interaural intensity difference (IID), and interaural correlation (IC) to achieve parametric coding of multichannel sound signals.
At the encoder, PS techniques generally downmix the multichannel sound signal to generate a sum channel signal; the sum channel is waveform-coded (or coded with a waveform/parameter hybrid, as in EAAC+), while the ITD/IPD, IID, and IC parameters relating each channel to the sum channel are parameter-coded. At the decoder, the multichannel signal is recovered from the sum channel signal according to these parameters. The channels may also be grouped at encoding time, with the PS coding method described above applied per channel group; alternatively, the multichannel signal may be passed through multiple cascaded stages of PS coding.
Practice has shown that although plain waveform coding (of the sum channel) combined with PS coding can achieve good coding quality at lower bit rates, at higher bit rates PS cannot further improve signal quality and is therefore unsuitable for high-fidelity applications. The reason is that PS encodes only the sum channel signal at the encoder and discards the residual channel signal, so the original signal cannot be fully recovered at decoding. MPEG Surround compensates for this deficiency of PS by additionally coding residual information.
However, both traditional PS and MPEG Surround rely too heavily on binaural psychoacoustics while ignoring the statistical properties of the multichannel sound signal itself. For example, neither traditional PS nor MPEG Surround exploits the statistical redundancy between channels. Moreover, even when MPEG Surround codes residual information, statistical redundancy remains between the sum channel signal and the residual channel signal, so coding efficiency and coded-signal quality cannot both be achieved.
Summary of the invention
The invention provides a multichannel sound signal encoding method, decoding method, and devices, with the object of solving the problem in prior-art multichannel sound signal coding that statistical redundancy remains, so that coding efficiency and coded-signal quality cannot both be achieved.
To achieve the above object, in a first aspect the invention provides a multichannel sound signal encoding method comprising: A) mapping a first multichannel sound signal to a first frequency-domain signal using a time-frequency transform, or to a first subband signal using subband filtering; B) dividing the first frequency-domain signal or the first subband signal into different time-frequency subbands; C) computing, in each of the different time-frequency subbands, a first statistical property of the first multichannel sound signal; D) estimating an optimized subspace mapping model from the first statistical property; E) mapping the first multichannel sound signal to a second multichannel sound signal using the optimized subspace mapping model; F) perceptually encoding, according to time, frequency, and channel, at least one group of the second multichannel sound signal together with the optimized subspace mapping model, and multiplexing the result into an encoded multichannel bitstream.
In a second aspect, the invention provides a multichannel sound signal encoding device comprising: a time-frequency mapping unit, configured to map a first multichannel sound signal to a first frequency-domain signal using a time-frequency transform, or to a first subband signal using subband filtering, and to divide the first frequency-domain signal or the first subband signal into different time-frequency subbands; an adaptive subspace mapping unit, configured to compute, in each of the different time-frequency subbands divided by the time-frequency mapping unit, a first statistical property of the first multichannel sound signal, to estimate an optimized subspace mapping model from the first statistical property, and to map the first multichannel sound signal to a second multichannel sound signal using the optimized subspace mapping model; and a perceptual coding unit, configured to perceptually encode, according to time, frequency, and channel, at least one group of the second multichannel sound signal produced by the adaptive subspace mapping unit together with the optimized subspace mapping model, and to multiplex the result into an encoded multichannel bitstream.
In a third aspect, the invention provides a multichannel sound signal decoding method comprising: A) decoding an encoded multichannel bitstream to obtain at least one group of a second multichannel sound signal and an optimized subspace mapping model; B) mapping the second multichannel sound signal back to a first multichannel sound signal using the optimized subspace mapping model; C) mapping the first multichannel sound signal from the frequency domain back to the time domain using an inverse time-frequency transform, or from the subband domain back to the time domain using inverse subband filtering.
In a fourth aspect, the invention provides a multichannel sound signal decoding device comprising: a perceptual decoding unit, configured to decode an encoded multichannel bitstream to obtain at least one group of a second multichannel sound signal and an optimized subspace mapping model; a subspace inverse mapping unit, configured to map the second multichannel sound signal obtained by the perceptual decoding unit back to a first multichannel sound signal using the optimized subspace mapping model obtained by the perceptual decoding unit; and a time-frequency mapping unit, configured to map the first multichannel sound signal obtained by the subspace inverse mapping unit from the frequency domain back to the time domain using an inverse time-frequency transform, or from the subband domain back to the time domain using inverse subband filtering.
In the multichannel sound signal encoding method of the embodiments of the invention, adaptive subspace mapping is employed: the statistical properties of the multichannel sound signal are first computed in order to estimate an optimized subspace mapping model, the multichannel sound signal is then mapped with that model, and perceptual coding follows. Because the mapping model is selected adaptively during encoding, the statistical properties of the inter-channel signals can be better estimated and exploited, and statistical redundancy between channels is reduced to the greatest extent, ensuring coded-signal quality while achieving higher coding efficiency.
Accompanying drawing explanation
Fig. 1 is a flowchart of the multichannel sound signal encoding method in one embodiment of the invention;
Fig. 2 is a flowchart of the multichannel sound signal encoding method in another embodiment of the invention;
Fig. 3 is a flowchart of the multichannel sound signal encoding method in a further embodiment of the invention;
Fig. 4 is a schematic diagram of the subspace mapping relationship in one embodiment of the invention;
Fig. 5 is a schematic comparison of the characteristics of the PCA model and the ICA model in one embodiment of the invention;
Fig. 6 is a schematic diagram of the time-frequency subband division in one embodiment of the invention;
Fig. 7 is a flowchart of the multichannel sound signal decoding method in one embodiment of the invention;
Fig. 8 is a schematic structural diagram of the multichannel sound signal encoding device in one embodiment of the invention;
Fig. 9 is a schematic structural diagram of the multichannel sound signal decoding device in one embodiment of the invention.
Embodiment
The technical solution of the invention is described in further detail below with reference to the drawings and embodiments.
Unlike other prior-art methods, the multichannel sound signal encoding method of the embodiments of the invention makes full use of both the statistical properties and the psychoacoustic properties of the multichannel sound signal, ensuring coded-signal quality while obtaining high coding efficiency. By adopting adaptive subspace mapping, it eliminates statistical redundancy between the multichannel signals to the greatest extent; it creatively uses multiple subspace mapping models and selects the mapping model adaptively during encoding, so that the statistical properties of the inter-channel signals can be better estimated and exploited, inter-channel statistical redundancy is minimized, and higher coding efficiency is achieved.
Fig. 1 is a flowchart of the multichannel sound signal encoding method in one embodiment of the invention. The method comprises:
Step 101: map the first multichannel sound signal to a first frequency-domain signal using a time-frequency transform, or to a first subband signal using subband filtering.
The first multichannel sound signal initially takes the form of a time-domain signal u(m, t); the above mapping yields a multichannel frequency-domain or subband signal x(m, k), where m is the channel index, t is the frame (or subframe) index, and k is the frequency or subband index.
In the embodiments of the invention, the time-frequency transform may use commonly employed techniques such as the modified discrete cosine transform (MDCT), the discrete cosine transform (DCT), or the Fourier transform (FFT); subband filtering may use the widely employed quadrature mirror filter banks (QMF/PQMF/CQMF) or cosine-modulated filter banks (CMF/MLT); the time-frequency transform may also use multiresolution analysis techniques such as the wavelet transform. The time-frequency mapping of the embodiments may adopt one of these three mapping approaches (as in AC-3 or AAC) or a combination of them (as in MP3 or Bell Labs PAC).
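As an illustration of the time-frequency mapping u(m, t) to x(m, k), the following is a minimal Python sketch, not part of the patent: a windowed FFT with a small fixed frame length stands in for the MDCT or filter-bank techniques named above, and all names are assumptions.

```python
import numpy as np

def time_frequency_map(u, frame_len=8):
    """Map a multichannel time-domain signal u (channels x samples) to
    per-frame frequency coefficients x (channels x frames x bins).
    A windowed FFT stands in for the MDCT/QMF bank of the patent."""
    m, n = u.shape
    n_frames = n // frame_len
    win = np.hanning(frame_len)
    x = np.empty((m, n_frames, frame_len // 2 + 1), dtype=complex)
    for f in range(n_frames):
        seg = u[:, f * frame_len:(f + 1) * frame_len] * win
        x[:, f, :] = np.fft.rfft(seg, axis=1)
    return x

# two-channel toy signal: 32 samples per channel
u = np.vstack([np.sin(2 * np.pi * np.arange(32) / 8),
               np.cos(2 * np.pi * np.arange(32) / 8)])
x = time_frequency_map(u)
print(x.shape)  # (2, 4, 5)
```

A production encoder would use an overlapped MDCT or a PQMF bank with far more bands; the sketch only shows the shape of the mapping from u(m, t) to the time-frequency plane.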
Step 102: divide the first frequency-domain signal or the first subband signal into different time-frequency subbands.
Before step 101, the sound signal to be encoded may first be divided into frames, which then undergo the time-frequency transform or subband filtering. If a large frame length is used, the frame data may be further decomposed into subframes before the transform or filtering. Once the frequency-domain or subband signal is obtained, multiple frequency subbands can be formed in frequency order; alternatively, the frequency-domain signals obtained from several transforms or filterings can be assembled into a two-dimensional time-frequency plane, and time-frequency regions delimited within that plane. Projecting such a region onto the time-frequency plane of each channel yields the time-frequency subband x_i(t, k) to be encoded, where i is the index of the subband and t the frame (or subframe) index. Assuming each time-frequency subband is rectangular, the signal range of subband x_i(t, k) is t_{i-1} <= t < t_i and k_{i-1} <= k < k_i, where t_{i-1} and t_i are the start and stop frame (or subframe) indices of the subband, and k_{i-1} and k_i its start and stop frequency or subband indices. If the total number of time-frequency subbands is N, then i <= N. For convenience, the region of a given time-frequency subband is denoted (t, k). Note that each time-frequency subband contains the projections of all channels onto its region; when the projection of a particular channel m onto the region is meant, it is written x_i(t, k, m).
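The rectangular subband division described above can be sketched as follows; this is a hypothetical Python illustration, where the edge lists t_edges and k_edges are assumed inputs playing the role of the start/stop indices t_{i-1}, t_i and k_{i-1}, k_i.

```python
import numpy as np

def split_time_frequency_subbands(x, t_edges, k_edges):
    """Partition a time-frequency plane x (channels x frames x bins) into
    rectangular subbands x_i spanning [t_{i-1}, t_i) x [k_{i-1}, k_i)."""
    subbands = []
    for t0, t1 in zip(t_edges[:-1], t_edges[1:]):
        for k0, k1 in zip(k_edges[:-1], k_edges[1:]):
            # each entry: (region origin, signal of all channels in region)
            subbands.append(((t0, k0), x[:, t0:t1, k0:k1]))
    return subbands

x = np.zeros((2, 4, 6))  # 2 channels, 4 frames, 6 frequency bins
bands = split_time_frequency_subbands(x, t_edges=[0, 2, 4], k_edges=[0, 3, 6])
print(len(bands))         # 4 rectangular time-frequency subbands
print(bands[0][1].shape)  # (2, 2, 3)
```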
Step 103: in each of the different time-frequency subbands, compute the first statistical property of the first multichannel sound signal.
Step 104: estimate an optimized subspace mapping model from the first statistical property.
Specifically, a single subspace mapping model may be selected and its mapping coefficients adjusted adaptively according to the first statistical property; or, according to the first statistical property, the encoder may switch adaptively among several preselected mapping models, taking the chosen model as the optimized subspace mapping model.
For the first statistical property of the embodiments of the invention, the same statistic can be used when assessing different models, for example first-order statistics (mean), second-order statistics (variance and correlation coefficient), or higher-order statistics (higher-order moments) and their variants; second-order statistics are the usual choice. Better still, different statistics can be chosen for different mapping models to obtain better results: for example, negentropy when assessing an ICA model, and the covariance matrix, i.e. a second-order statistic, when assessing a PCA model.
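As a sketch of computing a second-order first statistical property, the snippet below (illustrative Python, not from the patent; the toy two-channel subband is an assumption) estimates the channel covariance matrix of one time-frequency subband, the statistic suggested above for assessing a PCA model.

```python
import numpy as np

def channel_covariance(xi):
    """Second-order statistic of one time-frequency subband:
    covariance of the channel signals, flattened over (t, k)."""
    m = xi.shape[0]
    flat = xi.reshape(m, -1)
    flat = flat - flat.mean(axis=1, keepdims=True)
    return flat @ flat.T / flat.shape[1]

rng = np.random.default_rng(0)
left = rng.standard_normal((1, 64))
# second channel strongly correlated with the first
xi = np.vstack([left, 0.9 * left + 0.1 * rng.standard_normal((1, 64))])
C = channel_covariance(xi.reshape(2, 8, 8))
print(C.shape)  # (2, 2)
```

A large off-diagonal entry of C signals inter-channel statistical redundancy that the subspace mapping can remove.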
Step 105: map the first multichannel sound signal to the second multichannel sound signal using the optimized subspace mapping model.
Specifically, in the different time-frequency subbands, the statistical properties of the multichannel sound signal x_i(t, k) can be computed and the optimized subspace mapping model W_i(t, k) estimated; applying the estimated mapping model maps the multichannel signal into a new subspace, yielding a new group of multichannel signals z_i(t, k).
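The mapping step can be illustrated with PCA, one of the candidate subspace models (a minimal Python sketch under assumptions: the eigen-decomposition of the channel covariance stands in for estimating W_i(t, k), and the exactly proportional toy channels are hypothetical).

```python
import numpy as np

def pca_mapping(xi):
    """Estimate a subspace mapping W_i for one subband via PCA
    (eigenvectors of the channel covariance) and map to z_i = W_i x_i.
    PCA is one candidate model; ICA or CCA could be substituted."""
    m = xi.shape[0]
    flat = xi.reshape(m, -1)
    mean = flat.mean(axis=1, keepdims=True)
    cov = np.cov(flat)
    _, vecs = np.linalg.eigh(cov)
    W = vecs.T[::-1]          # rows = principal directions, largest first
    z = W @ (flat - mean)
    return W, z

rng = np.random.default_rng(1)
s = rng.standard_normal(128)
xi = np.vstack([s, 0.95 * s]).reshape(2, 16, 8)  # fully redundant channels
W, z = pca_mapping(xi)
# after decorrelation the second mapped channel carries almost no energy
print(z.var(axis=1)[1] < 0.05 * z.var(axis=1)[0])  # True
```

The decoder would apply the inverse of W_i to recover the original channels, which is why the model itself is carried in the bitstream.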
Step 106: perceptually encode, according to time, frequency, and channel, at least one group of the second multichannel sound signal together with the optimized subspace mapping model, and multiplex the result into the encoded multichannel bitstream.
Specifically, at least one group of the new multichannel signals z_i(t, k) together with the corresponding mapping models W_i(t, k) can be perceptually encoded and multiplexed into the encoded multichannel bitstream.
The perceptual coding here may in particular be hierarchical perceptual coding.
As the above processing shows, the multichannel sound signal encoding method of the embodiments of the invention employs adaptive subspace mapping: the statistical properties of the multichannel sound signal are first computed in order to estimate an optimized subspace mapping model, the signal is then mapped with that model, and perceptual coding follows. Selecting the mapping model adaptively during encoding allows the statistical properties of the inter-channel signals to be better estimated and exploited, minimizes inter-channel statistical redundancy, and ensures coded-signal quality while achieving higher coding efficiency.
In a multichannel sound signal, the sound components of some channels may differ markedly from those of the other channels. Such channels can be grouped separately, and applying the above method per group makes the extraction of the optimized mapping model more accurate. Therefore, when encoding this type of multichannel sound signal, a channel-grouping step can be added to improve coding efficiency.
Fig. 2 is a flowchart of the multichannel sound signal encoding method in another embodiment of the invention, in which a channel-grouping step is added after the time-frequency mapping of the multichannel sound signal. The method comprises:
Step 201: map the first multichannel sound signal to a first frequency-domain signal using a time-frequency transform, or to a first subband signal using subband filtering.
Step 202: divide the first frequency-domain signal or the first subband signal into different time-frequency subbands.
Here the sound signal to be encoded may first be divided into frames, which then undergo the time-frequency transform or subband filtering. If a large frame length is used, the frame data may be further decomposed into subframes before the transform or filtering. Once the frequency-domain or subband signal is obtained, multiple frequency subbands can be formed in frequency order; alternatively, the frequency-domain signals obtained from several transforms or filterings can be assembled into a two-dimensional time-frequency plane, time-frequency regions delimited within that plane, and the time-frequency subbands to be encoded obtained.
Step 203: in each of the different time-frequency subbands, compute a second statistical property of the first multichannel sound signal, and divide the first multichannel sound signal into multiple grouped sound signals according to the second statistical property.
In the embodiments of the invention, the statistical properties of the multichannel sound signal x(m, k) can be computed in the different time-frequency subbands; the multichannel signal is then divided into one or more channel groups according to the statistical properties of each channel's sound components, each group containing at least one channel signal. A group consisting of a single channel is perceptually encoded directly; groups of more than one channel undergo the subsequent processing.
The second statistical property of the invention may use first-order statistics (mean), second-order statistics (variance and correlation coefficient), or higher-order statistics (higher-order moments) and their variants; second-order statistics, in particular the correlation coefficient, are the usual choice. To save computation, the first statistical property can also serve as the grouping criterion, in which case the second statistical property takes the same value as the first.
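A possible realization of this correlation-based grouping is sketched below in Python; the threshold value and the greedy grouping strategy are assumptions not specified by the patent.

```python
import numpy as np

def group_channels(x, threshold=0.6):
    """Group channels whose pairwise correlation coefficient (a
    second-order statistic) exceeds a threshold; singleton groups are
    coded directly, larger groups go through subspace mapping."""
    m = x.shape[0]
    corr = np.corrcoef(x.reshape(m, -1))
    groups, assigned = [], set()
    for i in range(m):
        if i in assigned:
            continue
        group = [i] + [j for j in range(i + 1, m)
                       if j not in assigned and abs(corr[i, j]) >= threshold]
        assigned.update(group)
        groups.append(group)
    return groups

rng = np.random.default_rng(2)
a = rng.standard_normal(256)
b = rng.standard_normal(256)  # independent of a
x = np.vstack([a, 0.9 * a + 0.3 * rng.standard_normal(256), b])
print(group_channels(x))  # [[0, 1], [2]]
```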
For each grouped sound signal produced in step 203, steps 204 to 207 are performed with that grouped signal as the first multichannel sound signal.
Step 204: in each of the different time-frequency subbands, compute the first statistical property of the first multichannel sound signal.
Step 205: estimate an optimized subspace mapping model from the first statistical property.
Step 206: map the first multichannel sound signal to the second multichannel sound signal using the optimized subspace mapping model.
In the embodiments of the invention, the optimized subspace mapping model W_i(t, k) can be estimated from the statistical properties of each channel's sound components; applying the estimated mapping model maps the multichannel signal into a new subspace, yielding a new group of multichannel signals z_i(t, k).
Step 207: perceptually encode, according to time, frequency, and channel, at least one group of the second multichannel sound signal together with the optimized subspace mapping model, and multiplex the result into the encoded multichannel bitstream.
Here at least one group of the new multichannel signals z_i(t, k) together with the corresponding mapping models W_i(t, k) can be perceptually encoded, and all the perceptual coding information multiplexed to obtain the encoded multichannel bitstream.
In addition, as an alternative scheme, particularly at lower bit rates, the grouping may be performed after the time-frequency mapping of step 101 but before the subband division of step 102. This brings an obvious benefit: less grouping information is transmitted, and at lower bit rates reducing the bits spent on grouping information is all the more practical. In this case, after step 101 the second statistical property of the first multichannel sound signal is first computed; the first multichannel sound signal is then divided into multiple grouped sound signals according to the second statistical property, and steps 102 to 106 are performed for each grouped sound signal as the first multichannel sound signal.
Fig. 3 is a flowchart of the multichannel sound signal encoding method in a further embodiment of the invention, in which the multichannel sound signal is first grouped and the time-frequency mapping and subsequent processing are then performed for each grouped sound signal. The method comprises:
Step 301: compute a third statistical property of the first multichannel sound signal, and divide the first multichannel sound signal into multiple grouped sound signals according to the third statistical property.
Here the statistical properties of the multichannel sound signal u(m, t) can be computed, and the multichannel signal divided accordingly into one or more channel groups, each group containing at least one channel signal.
The third statistical property of the invention may use first-order statistics (mean), second-order statistics (variance and correlation coefficient), or higher-order statistics (higher-order moments) and their variants; second-order statistics, in particular the correlation coefficient, are the usual choice.
For each grouped sound signal, steps 302 to 307 are performed with that grouped signal as the first multichannel sound signal.
Step 302: map the first multichannel sound signal to a first frequency-domain signal using a time-frequency transform, or to a first subband signal using subband filtering.
Step 303: divide the first frequency-domain signal or the first subband signal into different time-frequency subbands.
Here a time-frequency transform or subband filtering can be used to map the grouped multichannel time-domain signal u(m, t) to a multichannel frequency-domain or subband signal x(m, k), and the time-frequency-mapped signal is divided into different time-frequency subbands.
Step 304: in each of the different time-frequency subbands, compute the first statistical property of the first multichannel sound signal.
Step 305: estimate an optimized subspace mapping model from the first statistical property.
The embodiments of the invention employ adaptive subspace mapping to estimate the optimized subspace mapping model. This adaptive subspace mapping differs from existing multichannel sound coding methods: it innovatively adopts a subspace mapping approach, i.e. it estimates the optimized multichannel subspace mapping model, an adaptive linear transformation matrix, from the statistical properties of the signal. The subspace mapping may use multivariate statistical analysis methods developed in recent years, such as independent component analysis (ICA), principal component analysis (PCA), canonical correlation analysis (CCA), and projection pursuit.
In the prior art, PCA-based multichannel coding is convenient for reducing the dimensionality of multichannel encoding, but it is not the best method for reducing inter-channel statistical redundancy. The invention therefore proposes a coding method that more effectively accounts for both the statistical properties and the psychoacoustic properties of the channel sound signals; practice has shown that the method of the invention achieves higher coding efficiency and quality than existing methods.
Step 306: map the first multichannel sound signal to the second multichannel sound signal using the optimized subspace mapping model.
In the different time-frequency subbands, the statistical properties of the multichannel sound signal x_i(t, k) can be computed and the optimized subspace mapping model W_i(t, k) estimated; applying the estimated mapping model maps the multichannel signal into a new subspace, yielding a new group of multichannel signals z_i(t, k).
Step 307: perceptually encode, according to time, frequency, and channel, at least one group of the second multichannel sound signal together with the optimized subspace mapping model, and multiplex the result into the encoded multichannel bitstream.
Here at least one group of the new multichannel signals z_i(t, k) together with the corresponding mapping models W_i(t, k) can be perceptually encoded, and all the perceptual coding information multiplexed to obtain the encoded multichannel bitstream.
The perceptual coding in the embodiment of the present invention may adopt any of the following sound coding schemes:
Waveform coding: e.g. the perceptual quantization and Huffman entropy coding used in MP3 and AAC, the exponent-mantissa coding used in AC-3, or the perceptual vector quantization used in Ogg Vorbis and TwinVQ;
Parametric coding: e.g. the harmonic, individual-line and noise coding used in MPEG HILN, the harmonic vector excitation coding used in MPEG HVXC, or the code excitation and transform coded excitation (TCX) coding used in AMR-WB+;
Hybrid waveform-parametric coding: e.g. methods such as mp3PRO, AAC+ and AMR-WB+, which apply waveform coding at low frequencies and bandwidth-extension parametric coding at high frequencies.
The adaptive subspace mapping in the embodiment of the present invention differs from any existing method. Its adaptivity is embodied both in adaptively adjusting the mapping coefficients of a selected mapping model according to the inter-channel statistical properties, and in adaptively switching between different mapping models according to those properties, e.g. switching between an ICA mapping and a PCA mapping.
The adaptive subspace mapping strategy of the present invention is significant for achieving the object of the invention, namely guaranteeing the quality of the coded signal while obtaining high coding efficiency for the encoded multichannel signal.
The subspace mapping model can be described as follows:
1. Original subspace mapping relation:
Let the M-dimensional sound source vector be s = {s_1, s_2, ..., s_M},
and let x = {x_1, x_2, ..., x_M} be the measurement vector of the current subspace, with
x = As (1)
where A is the existing subspace mapping matrix.
2. New subspace mapping relation:
Let z = {z_1, z_2, ..., z_M} be the measurement vector of the new subspace, with
z = Wx (2)
With reference to the subspace mapping schematic shown in Fig. 4, W is the new subspace mapping matrix, and s, x, z are vectors of zero-mean scalar random variables.
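To make the mapping relations (1) and (2) concrete, the following minimal NumPy sketch (illustrative only, not part of the patent) mixes a 2-channel Laplacian source with an assumed matrix A and recovers it with the inverse mapping W = A^(-1):

```python
import numpy as np

# Hypothetical 2-channel illustration of x = A s (1) and z = W x (2):
# with W = A^(-1), the new-subspace measurement vector z recovers s.
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 1000))           # M = 2 source channels (zero-mean)
A = np.array([[1.0, 0.6],                 # assumed mixing (subspace mapping) matrix
              [0.4, 1.0]])
x = A @ s                                  # observed multichannel signal, eq. (1)
W = np.linalg.inv(A)                       # one optimal unmixing choice
z = W @ x                                  # new-subspace signal, eq. (2)
print(np.allclose(z, s))                   # True: z coincides with s
```

In practice A is unknown, which is why the adaptive estimation of W described below is needed.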
The adaptive subspace mapping of the present invention thus seeks an optimized mapping matrix W such that the new subspace measurement vector z obtained by the mapping is optimal and yields the best coding efficiency. Different optimality criteria lead to different optimized mapping matrices. This flexibility suits the practical application of multi-channel coding very well: first, the statistical properties of a multichannel signal are time-varying, and the distribution of a signal component may be Laplacian, Gaussian or of other forms; second, different code rates and coding modes impose different requirements on the properties of the mapping matrix (such as orthogonality and correlation).
Without loss of generality, the adaptive subspace mapping method of the present invention is described below using the independent component analysis (ICA) model and the principal component analysis (PCA) model as examples.
When the random variables in the source vector s are assumed mutually statistically independent, at most one of them is Gaussian, and the optimal solution for the mapped measurement vector z is the source vector s itself (or s up to a scale factor), the subspace mapping model is equivalent to the independent component analysis (ICA) model. In this case
z = Wx = WAs
W^(-1) = A (3)
The mapping matrix W can be obtained by maximizing a measure of non-Gaussianity (such as the kurtosis or the negentropy index). Typically, the fast FastICA algorithm can be used to realize the ICA model mapping, described as follows:
Information theory tells us that, among all random variables of equal variance, the Gaussian variable has the largest entropy; non-Gaussianity can therefore be measured by entropy, and the negentropy index (negentropy) is a corrected form of entropy. Negentropy is defined as:
Ng(y) = H(y_gauss) - H(y) (4)
where y_gauss is a Gaussian random variable with the same variance as y, and H(y) is the differential entropy of the random variable:
H(y) = -∫ p_y(ξ) lg p_y(ξ) dξ (5)
The stronger the non-Gaussianity of y, the smaller its differential entropy and the larger its negentropy Ng(y). In practical applications, the negentropy is calculated by the following formula:
Ng(y) = {E[g(y)] - E[g(y_gauss)]}^2 (6)
where E[·] is the expectation operator and g(·) is a nonlinear function. Without loss of generality, nonlinear functions such as g_1(y) = tanh(a_1·y) (1 ≤ a_1 ≤ 2), g_2(y) = y·exp(-y^2/2) or g_3(y) = y^3 may be chosen.
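As an illustration of formula (6), the sketch below estimates the negentropy index from samples, using the log-cosh contrast G(y) = log cosh(y), whose derivative is the g_1(y) = tanh(a_1·y) named above; the sample sizes, the seed and the choice of G are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
G = lambda y: np.log(np.cosh(y))  # contrast whose derivative is g1 = tanh

nu = rng.standard_normal(200_000)                       # unit-variance Gaussian reference
lap = rng.laplace(scale=1 / np.sqrt(2), size=200_000)   # unit-variance Laplacian signal
gauss2 = rng.standard_normal(200_000)                   # a second Gaussian draw

ng = lambda y: (G(y).mean() - G(nu).mean()) ** 2        # sample version of formula (6)
print(ng(lap) > ng(gauss2))  # the non-Gaussian (Laplacian) signal has larger negentropy
```

This matches the text above: the Gaussian draw has (up to sampling noise) zero negentropy, while the Laplacian one does not.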
The FastICA algorithm, also known as the fixed-point algorithm, was proposed by Hyvärinen et al. of the University of Helsinki, Finland. It finds a direction such that z = Wx in formula (2) has maximum non-Gaussianity (maximum negentropy). The basic calculation steps are as follows:
1. Centre the measurement vector x so that its mean is 0;
2. Whiten the data, i.e. x → z;
3. Select the number m of components to estimate, and set the iteration index p ← 1;
4. Select a (random) initial weight vector W_p;
5. Let W_p = E{z g(W_p^T z)} - E{g'(W_p^T z)} W_p, where g is the nonlinear function;
6. Let W_p = W_p - Σ_{j=1}^{p-1} (W_p^T W_j) W_j;
7. Let W_p = W_p / ||W_p||;
8. If W_p has not converged, return to step 5;
9. Let p = p + 1; if p ≤ m, return to step 4.
Finally the mapped vector z and the mapping matrix W are obtained.
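The nine steps above can be sketched in NumPy as follows. This is a deflation-based FastICA sketch with assumed defaults (g = tanh and a simple dot-product convergence test), not the patented encoder itself:

```python
import numpy as np

def fastica(x, m, g=np.tanh, gp=lambda y: 1 - np.tanh(y) ** 2,
            max_iter=200, tol=1e-6):
    """Deflation FastICA following steps 1-9 above. x: (channels, samples)."""
    x = x - x.mean(axis=1, keepdims=True)        # step 1: centre
    d, e = np.linalg.eigh(np.cov(x))             # step 2: whiten, x -> z
    z = (e / np.sqrt(d)).T @ x
    W = np.zeros((m, z.shape[0]))
    for p in range(m):                           # steps 3 and 9: m components
        w = np.random.default_rng(p).standard_normal(z.shape[0])  # step 4
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            # step 5: fixed-point update W_p = E{z g(W_p^T z)} - E{g'(W_p^T z)} W_p
            w_new = (z * g(w @ z)).mean(axis=1) - gp(w @ z).mean() * w
            # step 6: deflate against previously found vectors
            w_new -= W[:p].T @ (W[:p] @ w_new)
            w_new /= np.linalg.norm(w_new)       # step 7
            converged = abs(abs(w_new @ w) - 1) < tol  # step 8
            w = w_new
            if converged:
                break
        W[p] = w
    return W @ z, W
```

Applied to two mixed Laplacian channels, each row of the returned mapped vector correlates with one source up to sign and permutation, as ICA guarantees only those ambiguities.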
When the random variables in the source vector s are assumed mutually uncorrelated and all Gaussian, and the optimality condition on z is that the subspace channel information be concentrated in as few channels as possible, the subspace mapping model is equivalent to the principal component analysis (PCA) model. In this case the source vector need not be separated from the observed signal; the mapping matrix W can be obtained by computing the eigenvalues and eigenvectors of the covariance matrix of the measurement vector x. The PCA model is in essence the conventional Karhunen-Loève transform and can be solved by singular value decomposition (SVD).
The basic calculation steps of the PCA model are as follows:
Step 1: calculate the covariance matrix C of the measurement vector x;
Step 2: calculate the eigenvectors e_1, e_2, ..., e_M and eigenvalues λ_1, λ_2, ..., λ_M of the covariance matrix, with the eigenvalues sorted in descending order;
Step 3: map the measurement vector x into the space spanned by the eigenvectors, obtaining the mapped vector z.
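These three steps can be sketched directly with an eigendecomposition; the two strongly correlated test channels below are an assumption of the sketch, chosen to show how PCA packs the energy:

```python
import numpy as np

rng = np.random.default_rng(1)
base = rng.standard_normal(5000)
x = np.vstack([base + 0.1 * rng.standard_normal(5000),   # two highly
               base + 0.1 * rng.standard_normal(5000)])  # correlated channels
x = x - x.mean(axis=1, keepdims=True)
C = np.cov(x)                        # step 1: covariance matrix of x
lam, E = np.linalg.eigh(C)           # step 2: eigenpairs...
order = np.argsort(lam)[::-1]        # ...sorted by descending eigenvalue
lam, E = lam[order], E[:, order]
z = E.T @ x                          # step 3: map x into the eigenvector space
print(lam[0] / lam.sum())            # close to 1: energy concentrates in channel 0
```

The orthogonal mapping preserves total variance while concentrating almost all of it in the first mapped channel, which is exactly what makes low-rate dimension reduction possible.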
The ICA model is well suited to blind separation and classification of signal components; it helps decompose a multichannel signal into multiple statistically independent channels for coding and removes inter-channel statistical redundancy to the greatest extent. The mapping matrix vectors of the PCA model, by contrast, are orthogonal, so the multichannel signal components can be concentrated into as few channels as possible, which helps reduce the dimensionality of the coded signal at lower bit rates.
Fig. 5 is a schematic comparison of the PCA and ICA models. From the viewpoint of mapping efficiency, in most situations the multichannel signal components are not omnidirectionally distributed, so the PCA model cannot achieve the highest mapping efficiency. The ICA model, however, does not require orthogonality of the signals, and most sound signals (including subband sound signals) follow a Laplacian distribution, so the ICA model can often achieve very high mapping efficiency.
As the above analysis shows, the ICA and PCA models have different characteristics but are highly complementary. In a specific implementation, depending on the encoder parameter configuration, the following choices can be made:
First, ICA coding mode: ICA coding is used throughout;
Second, PCA coding mode: PCA coding is used throughout;
Third, ICA/PCA hybrid coding mode: an open-loop or closed-loop search strategy dynamically selects the ICA or PCA coding mode.
In the ICA/PCA hybrid coding mode, which mode to use can be decided from the signal-to-noise ratio (SNR) or mask-to-noise ratio (MNR) of the ICA and PCA coding modes at the given code rate. SNR and MNR can be calculated by standard methods.
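One possible closed-loop decision is sketched below. The candidate names, the keep-one-channel "low-rate" constraint and the plain SNR criterion are illustrative stand-ins for the encoder's actual rate-constrained SNR/MNR comparison:

```python
import numpy as np

def snr_db(ref, rec):
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((ref - rec) ** 2))

def closed_loop_select(x, candidates, keep=1):
    """Closed-loop mode decision sketch: map with each candidate matrix W,
    keep only the `keep` strongest mapped channels (a crude low-rate
    constraint), reconstruct, and return the mode with the best SNR."""
    best = None
    for name, W in candidates.items():
        z = W @ x
        order = np.argsort(-np.var(z, axis=1))
        zq = np.zeros_like(z)
        zq[order[:keep]] = z[order[:keep]]    # discard the weakest channels
        xr = np.linalg.inv(W) @ zq            # decoder-side inverse mapping
        s = snr_db(x, xr)
        if best is None or s > best[1]:
            best = (name, s)
    return best

rng = np.random.default_rng(2)
base = rng.standard_normal(4000)
x = np.vstack([base + 0.1 * rng.standard_normal(4000),
               base + 0.1 * rng.standard_normal(4000)])
lam, E = np.linalg.eigh(np.cov(x))
mode, snr = closed_loop_select(x, {"PCA": E.T, "passthrough": np.eye(2)})
print(mode)  # PCA wins when the energy can be packed into one channel
```

For these strongly correlated channels the PCA mapping wins, since discarding its weakest channel loses almost nothing, while dropping a raw channel loses half the signal.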
In the perceptual coding of the present invention, at least one group of the new multichannel signal and the corresponding mapping model are perceptually encoded. The signal components to be coded and the corresponding mapping model parameters can be selected according to the current target bit rate and the perceptual importance of the new multichannel signal.
The multichannel signal to be coded is divided into multiple subbands along the three dimensions of time, frequency and channel. A known psychoacoustic model (such as the Johnston model, or MPEG Model 1 and Model 2) is used to calculate the perceptual importance (weight) of each subband and to determine the number of subbands to code and their quantization precision. When the mapping model is coded, the corresponding mapping matrix/vector may be coded, another variant of the model may be coded, or the statistical parameters from which the mapping matrix is calculated may be coded directly.
The present invention unifies the selection of the inter-channel subspace mapping model, the calculation and coding of the mapping matrix parameters, and the subband perceptual coding (i.e. time/frequency/channel) within a single rate-distortion (Rate Distortion Theory) coding framework, and achieves high-efficiency coding of the multichannel signal under constraints such as the code rate, psychoacoustic masking effects and binaural hearing effects.
Fig. 6 is a schematic diagram of time-frequency subband division. Following the above method, the time/frequency/channel plane of the current coded frame is divided into multiple time-frequency subbands. Suppose that in time-frequency subband (t, k) the subspace mapping model is T(t, k), selectable among K models T_1, T_2, ..., T_K (e.g. including the ICA and PCA models); the mapping matrix is W(t, k), estimated from the inter-channel statistical parameters (e.g. by the ICA or PCA method); the subband signal to be perceptually coded is x(t, k, m), i.e. the subband signal x(t, k) in channel m; the signal-to-mask ratio SMR(t, k, m) of this subband signal is calculated by the psychoacoustic model; and the target bit budget is B bits. With MNR(t, k, m) as the distortion criterion, the following coding strategy can be adopted:
Given the subband signal z(t, k, m), SMR(t, k, m) and target bits B, select among the K mapping models the one that maximizes MNR(t, k, m), and encode the model index T(t, k), the mapping matrix W(t, k) and the new subband signal z(t, k, m).
The adaptive subspace mapping of the present invention works together with the perceptual coding to realize adaptive coding under different coding target conditions. For example, for a multichannel signal with little noise, the independent component analysis mapping can not only encode the sound signal with high quality but even achieve noise elimination, whereas at lower encoder bit rates the principal component analysis mapping may be more suitable for coding complex sound signals.
The adaptive subspace mapping and perceptual coding method of the present invention can also provide scalable coding: the multichannel sound signal is encoded only once to obtain a single sound bitstream, from which transmission and decoding at multiple rates and qualities can be provided, thereby supporting the different application demands of many types of users. When scalable coding is supported, the perceptual coding module can be further decomposed into the following steps:
Step 1: select the most important group (at least one) of signals and the corresponding mapping model and perceptually encode them, with the rate of this partial bitstream not exceeding the base-layer rate constraint;
Step 2: select the second most important group (at least one) of signals and the corresponding mapping model and perceptually encode them, with the rate of this partial bitstream not exceeding the first enhancement-layer rate constraint;
Step 3: select the third most important group (at least one) of signals and the corresponding mapping model and perceptually encode them, with the rate of this partial bitstream not exceeding the second enhancement-layer rate constraint;
Step 4: continue by analogy until lossless coding is achieved, obtaining N layers of bitstream;
Step 5: multiplex all N layers of bitstream into one compressed stream.
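The steps above can be sketched as a greedy layer-packing routine. The group importances, bit costs and layer rate caps below are illustrative placeholders; a real encoder would derive them from the psychoacoustic model and the constrained layer rates:

```python
def pack_layers(groups, layer_caps):
    """Scalable-coding sketch: `groups` is a list of (importance, bits)
    tuples, one per signal group plus its mapping-model parameters.
    Greedily fill the base layer and each enhancement layer, in order of
    perceptual importance, without exceeding each layer's rate cap.
    Returns one list of group indices per layer, ready for multiplexing."""
    todo = sorted(range(len(groups)), key=lambda i: -groups[i][0])
    layers, it = [], iter(todo)
    pending = next(it, None)
    for cap in layer_caps:
        layer, used = [], 0
        # pack groups into this layer while its rate cap allows
        while pending is not None and used + groups[pending][1] <= cap:
            layer.append(pending)
            used += groups[pending][1]
            pending = next(it, None)
        layers.append(layer)
    return layers
```

A decoder given only the first element(s) of the returned list reconstructs the base quality; each further layer refines it, matching the re-multiplexing described next.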
In a scalable-coding application scenario, the compressed stream re-multiplexed from the scalable bitstream according to the service request should contain at least the base-layer bitstream; at higher code rates, enhancement-layer bitstreams can be multiplexed in, in order of importance.
Fig. 7 is a flowchart of the multi-channel sound signal decoding method in one embodiment of the invention; the method comprises:
Step 701: decode the encoded multichannel bitstream to obtain at least one group of the second multi-channel sound signal and the optimized subspace mapping model.
Step 702: using the optimized subspace mapping model, map the second multi-channel sound signal back to the first multi-channel sound signal.
Step 703: use an inverse time-frequency transform to map the first multi-channel sound signal from the frequency domain to the time domain, or use inverse sub-band filtering to map the first multi-channel sound signal from the subband domain to the time domain.
When the first multi-channel sound signal consists of multiple grouped sound signals, the method may further comprise, before step 703: restoring the grouping of the multiple grouped sound signals to obtain a third multi-channel sound signal, and performing step 703 on the third multi-channel sound signal as the first multi-channel sound signal.
In the embodiment of the present invention, when the first multi-channel sound signal in the time domain consists of multiple grouped sound signals, the method may further comprise, after step 703: restoring the grouping of said multiple grouped sound signals to obtain a fourth multi-channel sound signal.
In addition, before step 701 the method may further comprise: demultiplexing the encoded multichannel bitstream to obtain multiple layered bitstreams; performing step 701 on each layered bitstream as an encoded multichannel bitstream; and, after step 701 has been performed on all layered bitstreams, uniformly performing steps 702 and 703.

Fig. 8 is a structural schematic diagram of the multi-channel sound signal coding apparatus in one embodiment of the invention; the apparatus comprises:
A time-frequency mapping unit 801, configured to use a time-frequency transform to map the first multi-channel sound signal into a first frequency-domain signal, or to use sub-band filtering to map the first multi-channel sound signal into a first subband signal, said first frequency-domain signal or said first subband signal being divided into different time-frequency subbands;
An adaptive subspace mapping unit 802, configured to calculate, in each of the different time-frequency subbands divided by the time-frequency mapping unit 801, the first statistical property of said first multi-channel sound signal; to estimate the optimized subspace mapping model according to said first statistical property; and, using said optimized subspace mapping model, to map said first multi-channel sound signal into a second multi-channel sound signal;
A perceptual coding unit 803, configured to perceptually encode, according to time, frequency and channel, at least one group of the second multi-channel sound signal mapped by the adaptive subspace mapping unit 802 together with said optimized subspace mapping model, and to multiplex the result into an encoded multichannel bitstream.
Preferably, the apparatus further comprises:
A first channel grouping unit, configured to, before the adaptive subspace mapping unit 802 calculates the first statistical property of said first multi-channel sound signal in each of the different time-frequency subbands, calculate in each of the different time-frequency subbands divided by the time-frequency mapping unit a second statistical property of said first multi-channel sound signal, and to divide said first multi-channel sound signal into multiple grouped sound signals according to said second statistical property;
The adaptive subspace mapping unit 802 and the perceptual coding unit 803 being specifically configured to process each grouped sound signal divided by the first channel grouping unit as said first multi-channel sound signal.
Preferably, the apparatus further comprises:
A second channel grouping unit, configured to, before the time-frequency mapping unit 801 maps the first multi-channel sound signal into the first frequency-domain signal by the time-frequency transform or into the first subband signal by sub-band filtering, calculate a third statistical property of said first multi-channel sound signal, and to divide said first multi-channel sound signal into multiple grouped sound signals according to said third statistical property;
The time-frequency mapping unit 801, the adaptive subspace mapping unit 802 and the perceptual coding unit 803 being specifically configured to process each grouped sound signal divided by the second channel grouping unit as said first multi-channel sound signal.
Preferably, the adaptive subspace mapping unit 802 is specifically configured to: calculate, in each of the different time-frequency subbands divided by the time-frequency mapping unit 801, the first statistical property of said first multi-channel sound signal; for a selected optimized subspace mapping model, adaptively adjust the mapping coefficients of the optimized subspace mapping model according to said first statistical property; and, using said optimized subspace mapping model, map said first multi-channel sound signal into the second multi-channel sound signal.
Preferably, the adaptive subspace mapping unit 802 is specifically configured to: calculate, in each of the different time-frequency subbands divided by the time-frequency mapping unit 801, the first statistical property of said first multi-channel sound signal; according to said first statistical property, adaptively switch among multiple preselected mapping models to one of them, taking that mapping model as the optimized subspace mapping model; and, using said optimized subspace mapping model, map said first multi-channel sound signal into the second multi-channel sound signal.
Preferably, the perceptual coding in the perceptual coding unit 803 is specifically scalable perceptual coding.
Fig. 9 is a structural schematic diagram of the multi-channel sound signal decoding apparatus in one embodiment of the invention; the apparatus comprises:
A perceptual decoding unit 901, configured to decode the encoded multichannel bitstream to obtain at least one group of the second multi-channel sound signal and the optimized subspace mapping model;
A subspace inverse mapping unit 902, configured to use the optimized subspace mapping model obtained by the perceptual decoding unit 901 to map the second multi-channel sound signal obtained by the perceptual decoding unit 901 back to the first multi-channel sound signal;
An inverse time-frequency mapping unit 903, configured to use an inverse time-frequency transform to map the first multi-channel sound signal obtained by the subspace inverse mapping unit 902 from the frequency domain to the time domain, or to use inverse sub-band filtering to map said first multi-channel sound signal from the subband domain to the time domain.
Preferably, the first multi-channel sound signal obtained by the subspace inverse mapping unit 902 consists of multiple grouped sound signals, and the apparatus further comprises:
A first grouping restoration unit, configured to, before the inverse time-frequency mapping unit 903 maps the first multi-channel sound signal obtained by the subspace inverse mapping unit 902 from the frequency domain (by the inverse time-frequency transform) or from the subband domain (by inverse sub-band filtering) to the time domain, restore the grouping of said multiple grouped sound signals to obtain a third multi-channel sound signal;
The inverse time-frequency mapping unit 903 being specifically configured to process the third multi-channel sound signal obtained by the first grouping restoration unit as said first multi-channel sound signal.
Preferably, the first multi-channel sound signal after the mapping by the inverse time-frequency mapping unit 903 consists of multiple grouped sound signals in the time domain, and the apparatus further comprises:
A second grouping restoration unit, configured to, after the inverse time-frequency mapping unit 903 maps the first multi-channel sound signal obtained by the subspace inverse mapping unit 902 from the frequency domain or from the subband domain to the time domain, restore the grouping of said multiple grouped sound signals to obtain a fourth multi-channel sound signal.
Preferably, the apparatus further comprises:
A demultiplexing unit, configured to, before the perceptual decoding unit 901 decodes the encoded multichannel bitstream to obtain at least one group of the second multi-channel sound signal and the optimized subspace mapping model, demultiplex the encoded multichannel bitstream to obtain multiple layered bitstreams;
The perceptual decoding unit 901, the subspace inverse mapping unit 902 and the inverse time-frequency mapping unit 903 being specifically configured to process each layered bitstream obtained by the demultiplexing unit as an encoded multichannel bitstream.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementation should not be considered as going beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented in hardware, in a software module executed by a processor, or in a combination of the two. A software module can reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above embodiments further describe the objects, technical solutions and beneficial effects of the present invention in detail. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit the protection scope of the present invention; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (21)

1. A multi-channel sound signal coding method, characterized in that the method comprises:
A) using a time-frequency transform to map a first multi-channel sound signal into a first frequency-domain signal, or using sub-band filtering to map the first multi-channel sound signal into a first subband signal;
B) dividing said first frequency-domain signal or said first subband signal into different time-frequency subbands;
C) in each of said different time-frequency subbands, calculating a first statistical property of said first multi-channel sound signal;
D) estimating an optimized subspace mapping model according to said first statistical property;
E) using said optimized subspace mapping model, mapping said first multi-channel sound signal into a second multi-channel sound signal;
F) according to time, frequency and channel, perceptually encoding at least one group of said second multi-channel sound signal together with said optimized subspace mapping model, and multiplexing the result into an encoded multichannel bitstream.
2. The method of claim 1, characterized in that, before calculating the first statistical property of said first multi-channel sound signal in each of said different time-frequency subbands, the method further comprises:
in each of said different time-frequency subbands, calculating a second statistical property of said first multi-channel sound signal; and dividing said first multi-channel sound signal into multiple grouped sound signals according to said second statistical property;
for each grouped sound signal, performing steps C) to F) on said grouped sound signal as said first multi-channel sound signal.
3. The method of claim 1, characterized in that, before dividing said first frequency-domain signal or said first subband signal into different time-frequency subbands, the method further comprises:
calculating a second statistical property of said first multi-channel sound signal; and dividing said first multi-channel sound signal into multiple grouped sound signals according to said second statistical property;
for each grouped sound signal, performing steps B) to F) on said grouped sound signal as said first multi-channel sound signal.
4. The method of claim 1, characterized in that, before using the time-frequency transform to map the first multi-channel sound signal into the first frequency-domain signal, or using sub-band filtering to map the first multi-channel sound signal into the first subband signal, the method further comprises:
calculating a third statistical property of said first multi-channel sound signal; and dividing said first multi-channel sound signal into multiple grouped sound signals according to said third statistical property;
for each grouped sound signal, performing steps A) to F) on said grouped sound signal as said first multi-channel sound signal.
5. The method of claim 1, characterized in that said estimating an optimized subspace mapping model according to said first statistical property specifically comprises:
for a selected optimized subspace mapping model, adaptively adjusting the mapping coefficients of the optimized subspace mapping model according to said first statistical property.
6. The method of claim 1, characterized in that said estimating an optimized subspace mapping model according to said first statistical property specifically comprises:
according to said first statistical property, adaptively switching among multiple preselected mapping models to one of them, and taking that mapping model as the optimized subspace mapping model.
7. The method of claim 1, characterized in that said perceptual coding is specifically scalable perceptual coding.
8. a multi-channel sound signal code device, is characterized in that, described device comprises:
Time-frequency map unit, for adopting time-frequency conversion, is mapped as the first frequency-region signal by the first multi-channel sound signal, or adopts sub-band filter, and the first multi-channel sound signal is mapped as the first subband signal; Described first frequency-region signal or described first subband signal are divided into different time-frequency subband;
Self-adaptation subspace mapping unit, in each time-frequency subband in the different time-frequency subbands of described time-frequency map unit division, calculates the first statistical property of described first multi-channel sound signal; According to described first statistical property, Estimation Optimization subspace mapping model; Adopt described optimization subspace mapping model, described first multi-channel sound signal is mapped as the second multi-channel sound signal;
Perceptual coding unit, for the difference according to time, frequency and sound channel, perceptual coding is carried out at least one group in the second multi-channel sound signal of described self-adaptation subspace mapping unit maps and described optimization subspace mapping model, and is multiplexed into encoded multi-channel code stream.
9. device as claimed in claim 8, is characterized in that, also comprise:
First sound channel grouped element, for in each time-frequency subband of described self-adaptation subspace mapping unit in different time-frequency subband, before calculating the first statistical property of described first multi-channel sound signal, in each time-frequency subband in the different time-frequency subbands that described time-frequency map unit divides, calculate the second statistical property of described first multi-channel sound signal; According to described second statistical property, described first multi-channel sound signal is divided into multiple grouping voice signal;
Described self-adaptation subspace mapping unit and described perceptual coding unit specifically for, for each grouping voice signal that described first sound channel grouped element divides, described each grouping voice signal is processed as described first multi-channel sound signal.
10. The device according to claim 8, further comprising:
A second channel grouping unit, configured to calculate a third statistical property of the first multi-channel sound signal before the time-frequency mapping unit maps the first multi-channel sound signal to a first frequency-domain signal by a time-frequency transform, or to a first subband signal by subband filtering, and to divide the first multi-channel sound signal into a plurality of grouped sound signals according to the third statistical property;
Wherein the time-frequency mapping unit, the adaptive subspace mapping unit and the perceptual coding unit are specifically configured to process each grouped sound signal divided by the second channel grouping unit as the first multi-channel sound signal.
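Claims 9 and 10 do not fix the grouping statistic. A toy sketch, assuming pairwise inter-channel correlation as the second/third statistical property and a greedy threshold rule for forming the groups (both assumptions made purely for illustration):

```python
import numpy as np

def group_channels(x, threshold=0.6):
    """Sketch of the channel-grouping step: channels whose pairwise
    correlation (one possible grouping statistic; the claims leave it
    open) exceeds `threshold` are placed in the same group so they can
    later be mapped and coded jointly. x: (num_channels, num_samples)."""
    num_ch = x.shape[0]
    corr = np.corrcoef(x)
    groups, assigned = [], set()
    for i in range(num_ch):
        if i in assigned:
            continue
        group = [i]
        assigned.add(i)
        for j in range(i + 1, num_ch):
            if j not in assigned and abs(corr[i, j]) > threshold:
                group.append(j)
                assigned.add(j)
        groups.append(group)
    return groups
```

Each returned group is then treated as the "first multi-channel sound signal" for the downstream units, as the claims specify.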
11. The device according to claim 8, wherein the adaptive subspace mapping unit is specifically configured to: calculate, in each of the time-frequency subbands divided by the time-frequency mapping unit, a first statistical property of the first multi-channel sound signal; for a selected optimized subspace mapping model, adaptively adjust the mapping coefficients of the optimized subspace mapping model according to the first statistical property; and map the first multi-channel sound signal to a second multi-channel sound signal using the optimized subspace mapping model.
12. The device according to claim 8, wherein the adaptive subspace mapping unit is specifically configured to: calculate, in each of the time-frequency subbands divided by the time-frequency mapping unit, a first statistical property of the first multi-channel sound signal; adaptively switch, according to the first statistical property, to one of a plurality of preselected mapping models, and take that mapping model as the optimized subspace mapping model; and map the first multi-channel sound signal to a second multi-channel sound signal using the optimized subspace mapping model.
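Claim 12 does not name the preselected mapping models. A common stereo pair is identity (L/R) versus sum/difference (M/S); the sketch below uses that pair, with the energy of the weaker mapped channel as a stand-in selection statistic, all of which are illustrative assumptions rather than the claimed method:

```python
import numpy as np

# Two preselected mapping models (assumed for illustration):
# identity (L/R) and sum/difference (M/S), a familiar stereo pair.
LR = np.eye(2)
MS = np.array([[0.5, 0.5], [0.5, -0.5]])

def select_model(x):
    """Sketch of claim-12-style adaptive switching: pick whichever
    preselected model leaves the least energy in its weaker mapped
    channel -- a cheap proxy (not from the claims) for how well the
    model concentrates the signal. x: (2, num_samples)."""
    candidates = {"LR": LR, "MS": MS}
    best_name, best_model, best_cost = None, None, np.inf
    for name, model in candidates.items():
        y = model @ x
        cost = min(np.sum(y[0] ** 2), np.sum(y[1] ** 2))
        if cost < best_cost:
            best_name, best_model, best_cost = name, model, cost
    return best_name, best_model @ x
```

Strongly correlated channels favour M/S (the difference channel is nearly empty), while a channel pair with one silent channel keeps L/R.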
13. The device according to claim 8, wherein the perceptual coding in the perceptual coding unit is specifically layered (hierarchical) perceptual coding.
14. A multi-channel sound signal decoding method, comprising:
A) decoding a coded multi-channel bitstream to obtain at least one group of a second multi-channel sound signal and an optimized subspace mapping model;
B) mapping the second multi-channel sound signal back to a first multi-channel sound signal using the optimized subspace mapping model;
C) mapping the first multi-channel sound signal from the frequency domain to the time domain by an inverse time-frequency transform, or from the subband domain to the time domain by inverse subband filtering.
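Steps B) and C) of claim 14 can be illustrated for a single frame with a minimal sketch. It assumes the transmitted mapping model is orthogonal (e.g. a KLT basis, as in the encoder-side cited art on principal component analysis) and stubs the inverse time-frequency transform with an inverse real FFT; the claims equally allow an inverse subband filter bank:

```python
import numpy as np

def decode_subband(y, mapping, frame_len):
    """Sketch of decoding steps B) and C) for one frame.
    y: decoded second multi-channel signal in the frequency domain,
    mapping: the transmitted optimized subspace mapping model,
    assumed orthogonal so its inverse is its transpose."""
    # Step B): inverse subspace mapping back to the first signal.
    x_freq = mapping.T @ y
    # Step C): inverse time-frequency transform per channel
    # (inverse real FFT used here as an illustrative stand-in).
    return np.fft.irfft(x_freq, n=frame_len, axis=1)
```

A round trip through the corresponding forward transform and mapping recovers the original time-domain frame to floating-point accuracy.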
15. The method according to claim 14, wherein the first multi-channel sound signal comprises a plurality of grouped sound signals; and before the first multi-channel sound signal is mapped from the frequency domain to the time domain by the inverse time-frequency transform, or from the subband domain to the time domain by the inverse subband filtering, the method further comprises:
Restoring the plurality of grouped sound signals from their grouping to obtain a third multi-channel sound signal;
Performing step C) on the third multi-channel sound signal as the first multi-channel sound signal.
16. The method according to claim 14, wherein the first multi-channel sound signal comprises a plurality of grouped sound signals in the time domain; and after the first multi-channel sound signal is mapped from the frequency domain to the time domain by the inverse time-frequency transform, or from the subband domain to the time domain by the inverse subband filtering, the method further comprises:
Restoring the plurality of grouped sound signals from their grouping to obtain a fourth multi-channel sound signal.
17. The method according to claim 14, wherein before decoding the coded multi-channel bitstream to obtain at least one group of the second multi-channel sound signal and the optimized subspace mapping model, the method further comprises:
Demultiplexing the coded multi-channel bitstream to obtain a plurality of layered bitstreams;
Performing step A) on each layered bitstream as the coded multi-channel bitstream;
After step A) has been performed on all of the layered bitstreams, performing steps B) and C) once in a unified pass.
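The layered decoding of claim 17 can be sketched with a toy stream format. Everything concrete here is invented for illustration: layers are tagged tuples, step A) per layer is reduced to an identity decode, and the layers combine by summation as in typical scalable (base-plus-enhancement) coding:

```python
import numpy as np

def demultiplex(stream):
    """Toy demultiplexer: the stream is a list of (layer_id, payload)
    tuples; payloads with the same id form one layered substream."""
    layers = {}
    for layer_id, payload in stream:
        layers.setdefault(layer_id, []).append(payload)
    return [np.concatenate(layers[k]) for k in sorted(layers)]

def decode_layered(stream):
    """Toy sketch of claim 17: demultiplex into layered substreams,
    apply step A) to each layer (identity here), then combine the base
    layer with the enhancement layers; a single pass of steps B) and C)
    would follow on the combined signal."""
    layers = demultiplex(stream)
    decoded = [layer for layer in layers]  # step A) per layer
    return sum(decoded)  # base + enhancement refinements
```

Running step A) per layer before a single unified B)/C) pass is what distinguishes this claim from decoding each layer end to end.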
18. A multi-channel sound signal decoding device, comprising:
A perceptual decoding unit, configured to decode a coded multi-channel bitstream to obtain at least one group of a second multi-channel sound signal and an optimized subspace mapping model;
A subspace inverse mapping unit, configured to map the second multi-channel sound signal obtained by the perceptual decoding unit back to a first multi-channel sound signal using the optimized subspace mapping model obtained by the perceptual decoding unit;
A time-frequency mapping unit, configured to map the first multi-channel sound signal obtained by the subspace inverse mapping unit from the frequency domain to the time domain by an inverse time-frequency transform, or from the subband domain to the time domain by inverse subband filtering.
19. The device according to claim 18, wherein the first multi-channel sound signal obtained by the subspace inverse mapping unit comprises a plurality of grouped sound signals, and the device further comprises:
A first grouping restoration unit, configured to restore the plurality of grouped sound signals from their grouping to obtain a third multi-channel sound signal before the time-frequency mapping unit maps the first multi-channel sound signal from the frequency domain to the time domain by the inverse time-frequency transform, or from the subband domain to the time domain by the inverse subband filtering;
Wherein the time-frequency mapping unit is specifically configured to process the third multi-channel sound signal obtained by the first grouping restoration unit as the first multi-channel sound signal.
20. The device according to claim 18, wherein the first multi-channel sound signal after the mapping performed by the time-frequency mapping unit comprises a plurality of grouped sound signals in the time domain, and the device further comprises:
A second grouping restoration unit, configured to restore the plurality of grouped sound signals from their grouping to obtain a fourth multi-channel sound signal after the time-frequency mapping unit maps the first multi-channel sound signal from the frequency domain to the time domain by the inverse time-frequency transform, or from the subband domain to the time domain by the inverse subband filtering.
21. The device according to claim 18, further comprising:
A demultiplexing unit, configured to demultiplex the coded multi-channel bitstream to obtain a plurality of layered bitstreams before the perceptual decoding unit decodes the coded multi-channel bitstream to obtain at least one group of the second multi-channel sound signal and the optimized subspace mapping model;
Wherein the perceptual decoding unit, the subspace inverse mapping unit and the time-frequency mapping unit are specifically configured to process each layered bitstream obtained by the demultiplexing unit as the coded multi-channel bitstream.
CN201410395806.5A 2014-08-12 2014-08-12 Multi-channel sound signal coding method, coding/decoding method and device Active CN105336333B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410395806.5A CN105336333B (en) 2014-08-12 2014-08-12 Multi-channel sound signal coding method, coding/decoding method and device
PCT/CN2014/095396 WO2016023323A1 (en) 2014-08-12 2014-12-29 Multichannel acoustic signal encoding method, decoding method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410395806.5A CN105336333B (en) 2014-08-12 2014-08-12 Multi-channel sound signal coding method, coding/decoding method and device

Publications (2)

Publication Number Publication Date
CN105336333A true CN105336333A (en) 2016-02-17
CN105336333B CN105336333B (en) 2019-07-05

Family

ID=55286819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410395806.5A Active CN105336333B (en) 2014-08-12 2014-08-12 Multi-channel sound signal coding method, coding/decoding method and device

Country Status (2)

Country Link
CN (1) CN105336333B (en)
WO (1) WO2016023323A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111599375B (en) * 2020-04-26 2023-03-21 云知声智能科技股份有限公司 Whitening method and device for multi-channel voice in voice interaction

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1647158A (en) * 2002-04-10 2005-07-27 皇家飞利浦电子股份有限公司 Coding of stereo signals
JP2007003702A (en) * 2005-06-22 2007-01-11 Ntt Docomo Inc Noise eliminator, communication terminal, and noise eliminating method
US20070185706A1 (en) * 2001-12-14 2007-08-09 Microsoft Corporation Quality improvement techniques in an audio encoder
CN101133441A (en) * 2005-02-14 2008-02-27 弗劳恩霍夫应用研究促进协会 Parametric joint-coding of audio sources
CN101401152A (en) * 2006-03-15 2009-04-01 法国电信公司 Device and method for encoding by principal component analysis a multichannel audio signal
CN101490744A (en) * 2006-11-24 2009-07-22 Lg电子株式会社 Method and apparatus for encoding and decoding an audio signal
CN101667425A (en) * 2009-09-22 2010-03-10 山东大学 Method for carrying out blind source separation on convolutionary aliasing voice signals
CN102682779A (en) * 2012-06-06 2012-09-19 武汉大学 Double-channel encoding and decoding method for 3D audio frequency and codec
CN102918588A (en) * 2010-03-29 2013-02-06 弗兰霍菲尔运输应用研究公司 A spatial audio processor and a method for providing spatial parameters based on an acoustic input signal

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103366749B (en) * 2012-03-28 2016-01-27 北京天籁传音数字技术有限公司 A kind of sound codec devices and methods therefor
CN103366751B (en) * 2012-03-28 2015-10-14 北京天籁传音数字技术有限公司 A kind of sound codec devices and methods therefor
US9396732B2 (en) * 2012-10-18 2016-07-19 Google Inc. Hierarchical deccorelation of multichannel audio
CN103077709B (en) * 2012-12-28 2015-09-09 中国科学院声学研究所 A kind of Language Identification based on total distinctive subspace mapping and device

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108461086A (en) * 2016-12-13 2018-08-28 北京唱吧科技股份有限公司 A kind of real-time switching method and apparatus of audio
CN108461086B (en) * 2016-12-13 2020-05-15 北京唱吧科技股份有限公司 Real-time audio switching method and device
CN108206022B (en) * 2016-12-16 2020-12-18 南京青衿信息科技有限公司 Codec for transmitting three-dimensional acoustic signals by using AES/EBU channel and coding and decoding method thereof
CN110660400A (en) * 2018-06-29 2020-01-07 华为技术有限公司 Coding method, decoding method, coding device and decoding device for stereo signal
CN110660400B (en) * 2018-06-29 2022-07-12 华为技术有限公司 Coding method, decoding method, coding device and decoding device for stereo signal
US11501784B2 (en) 2018-06-29 2022-11-15 Huawei Technologies Co., Ltd. Stereo signal encoding method and apparatus, and stereo signal decoding method and apparatus
US11776553B2 (en) 2018-06-29 2023-10-03 Huawei Technologies Co., Ltd. Audio signal encoding method and apparatus
TWI692719B (en) * 2019-03-21 2020-05-01 瑞昱半導體股份有限公司 Audio processing method and audio processing system
CN111682881A (en) * 2020-06-17 2020-09-18 北京润科通用技术有限公司 Communication reconnaissance simulation method and system suitable for multi-user signals
CN111682881B (en) * 2020-06-17 2021-12-24 北京润科通用技术有限公司 Communication reconnaissance simulation method and system suitable for multi-user signals
CN113873420A (en) * 2021-09-28 2021-12-31 联想(北京)有限公司 Audio data processing method and device

Also Published As

Publication number Publication date
WO2016023323A1 (en) 2016-02-18
CN105336333B (en) 2019-07-05

Similar Documents

Publication Publication Date Title
CN105336333A (en) Multichannel sound signal coding and decoding method and device
KR102219752B1 (en) Apparatus and method for estimating time difference between channels
TWI397903B (en) Economical loudness measurement of coded audio
CN110047496B (en) Stereo audio encoder and decoder
US8200351B2 (en) Low power downmix energy equalization in parametric stereo encoders
CN105518775B (en) Artifact cancellation for multi-channel downmix comb filters using adaptive phase alignment
EP3605847B1 (en) Multichannel signal encoding method and apparatus
TWI404429B (en) Method and apparatus for encoding/decoding multi-channel audio signal
US20080212803A1 (en) Apparatus For Encoding and Decoding Audio Signal and Method Thereof
US9514767B2 (en) Device, method and computer program for freely selectable frequency shifts in the subband domain
KR101445292B1 (en) Signal processor, window provider, encoded media signal, method for processing a signal and method for providing a window
CN103366749B (en) A kind of sound codec devices and methods therefor
Chen et al. Spatial parameters for audio coding: MDCT domain analysis and synthesis
CN101427307B (en) Method and apparatus for encoding/decoding multi-channel audio signal
KR100745688B1 (en) Apparatus for encoding and decoding multichannel audio signal and method thereof
CN110462733B (en) Coding and decoding method and coder and decoder of multi-channel signal
CN109036441B (en) Method and apparatus for applying dynamic range compression to high order ambisonics signals
WO2017206794A1 (en) Method and device for extracting inter-channel phase difference parameter
KR101569702B1 (en) residual signal encoding and decoding method and apparatus
US9848272B2 (en) Decorrelator structure for parametric reconstruction of audio signals
CN105336334A (en) Multichannel sound signal coding and decoding method and device
US10332527B2 (en) Method and apparatus for encoding and decoding audio signal
Jansson Stereo coding for the ITU-T G. 719 codec
Wang et al. Critical band subspace-based speech enhancement using SNR and auditory masking aware technique
Zhu et al. Fast convolution for binaural rendering based on HRTF spectrum

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant