CN105336333B - Multi-channel sound signal coding method, coding/decoding method and device - Google Patents

Multi-channel sound signal coding method, decoding method and device

Info

Publication number
CN105336333B
CN105336333B CN201410395806.5A CN201410395806A CN 105336333 B
Authority
CN
China
Prior art keywords
sound signal
channel sound
frequency
signal
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410395806.5A
Other languages
Chinese (zh)
Other versions
CN105336333A (en)
Inventor
潘兴德 (Pan Xingde)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING TIANLAI CHUANYIN DIGITAL TECHNOLOGY Co Ltd
Original Assignee
BEIJING TIANLAI CHUANYIN DIGITAL TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING TIANLAI CHUANYIN DIGITAL TECHNOLOGY Co Ltd filed Critical BEIJING TIANLAI CHUANYIN DIGITAL TECHNOLOGY Co Ltd
Priority to CN201410395806.5A priority Critical patent/CN105336333B/en
Priority to PCT/CN2014/095396 priority patent/WO2016023323A1/en
Publication of CN105336333A publication Critical patent/CN105336333A/en
Application granted granted Critical
Publication of CN105336333B publication Critical patent/CN105336333B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/22 Mode decision, i.e. based on audio signal content versus external parameters

Abstract

The present invention relates to a multi-channel sound signal coding method, decoding method and corresponding devices. The coding method includes: mapping a first multi-channel sound signal to a first frequency-domain signal using a time-frequency transform, or to a first subband signal using subband filtering; dividing the first frequency-domain signal or the first subband signal into different time-frequency subbands; calculating, in each time-frequency subband, a first statistical property of the first multi-channel sound signal; estimating an optimized subspace mapping model according to the first statistical property; mapping the first multi-channel sound signal to a second multi-channel sound signal using the optimized subspace mapping model; and perceptually coding at least one group of the second multi-channel sound signal, grouped according to time, frequency and channel, together with the optimized subspace mapping model, to obtain an encoded multi-channel bitstream. By adaptively selecting the mapping model during coding, embodiments of the present invention can achieve higher coding efficiency and coding quality.

Description

Multi-channel sound signal coding method, coding/decoding method and device
Technical field
The present invention relates to the field of audio signal processing technology, and in particular to a multi-channel sound signal coding method, decoding method and devices.
Background technique
With the development of science and technology, a variety of coding techniques for sound signals have emerged. Here "sound" generally refers to digital audio perceivable by the human ear, including voice, music, natural sound and artificially synthesized sound. At present, many sound coding techniques have become industrial standards, are widely applied, and are part of daily life. Common sound coding techniques include AC-3 from Dolby Laboratories, DTS from Digital Theater Systems, MP3 and AAC from the Moving Picture Experts Group (MPEG), WMA from Microsoft, and ATRAC from Sony.
To reproduce stereophonic sound effects, multiple channels are now commonly used to play multi-channel sound signals to the user. Coding methods for multi-channel sound signals have evolved from waveform coding techniques such as sum/difference stereo (M/S Stereo) and intensity stereo (Intensity Stereo), represented by AC-3 and MP3, to the parametric stereo (Parametric Stereo) and parametric surround (Parametric Surround) techniques represented by MP3 Pro, ITU EAAC+, MPEG Surround and Dolby DD+. PS techniques (including Parametric Stereo and Parametric Surround) start from binaural psychoacoustics and make full use of psychoacoustic spatial characteristics such as the interaural time/phase difference (ITD/IPD), interaural intensity difference (IID) and interaural correlation (IC) to realize parametric coding of multi-channel sound signals.
A PS codec generally downmixes the multi-channel sound signal at the encoding side to generate one sum channel signal; the sum channel signal is waveform coded (or coded with hybrid waveform-parameter coding, as in EAAC+), and the ITD/IPD, IID and IC parameters between each channel and the sum channel are parametrically coded. At the decoding side, the multi-channel signal is restored from the sum channel signal according to these parameters. The multi-channel signal can also be grouped during coding, with the above PS coding/decoding method applied within each channel group; alternatively, a cascade structure can be used, coding the multiple channels with multi-stage PS.
Practice has shown that simple waveform coding (of the sum channel) combined with PS coding can achieve relatively high coding quality at lower bit rates; at higher bit rates, however, PS techniques cannot further improve signal quality and are unsuitable for high-fidelity applications. The reason is that a PS encoder codes only the sum channel signal and discards the residual channel signals, so the original signal cannot be fully restored at decoding. To make up for this deficiency of PS techniques, MPEG Surround adopts a method of residual information coding.
However, both traditional PS techniques and MPEG Surround rely excessively on binaural psychoacoustic characteristics and ignore the statistical properties of the multi-channel sound signal itself. For example, neither traditional PS techniques nor MPEG Surround exploits the statistically redundant information between channel pairs. Moreover, when MPEG Surround codes residual information, statistical redundancy still remains between the sum channel signal and the residual channel signals, so coding efficiency and coded signal quality cannot both be achieved.
Summary of the invention
The present invention provides a multi-channel sound signal coding method, decoding method and devices, in order to solve the problem that multi-channel sound signal coding methods of the prior art leave statistical redundancy and cannot simultaneously achieve coding efficiency and coded signal quality.
To achieve the above object, in a first aspect, the present invention provides a multi-channel sound signal coding method, the method including: A) mapping a first multi-channel sound signal to a first frequency-domain signal using a time-frequency transform, or to a first subband signal using subband filtering; B) dividing the first frequency-domain signal or the first subband signal into different time-frequency subbands; C) calculating, in each of the different time-frequency subbands, a first statistical property of the first multi-channel sound signal; D) estimating an optimized subspace mapping model according to the first statistical property; E) mapping the first multi-channel sound signal to a second multi-channel sound signal using the optimized subspace mapping model; F) perceptually coding at least one group of the second multi-channel sound signal, grouped according to time, frequency and channel, together with the optimized subspace mapping model, and multiplexing the result into an encoded multi-channel bitstream.
In a second aspect, the present invention provides a multi-channel sound signal coding device, the device including: a time-frequency mapping unit, configured to map a first multi-channel sound signal to a first frequency-domain signal using a time-frequency transform, or to a first subband signal using subband filtering, and to divide the first frequency-domain signal or the first subband signal into different time-frequency subbands; an adaptive subspace mapping unit, configured to calculate, in each of the different time-frequency subbands divided by the time-frequency mapping unit, a first statistical property of the first multi-channel sound signal, to estimate an optimized subspace mapping model according to the first statistical property, and to map the first multi-channel sound signal to a second multi-channel sound signal using the optimized subspace mapping model; and a perceptual coding unit, configured to perceptually code, according to time, frequency and channel, at least one group of the second multi-channel sound signal mapped by the adaptive subspace mapping unit together with the optimized subspace mapping model, and to multiplex the result into an encoded multi-channel bitstream.
In a third aspect, the present invention provides a multi-channel sound signal decoding method, the method including: A) decoding an encoded multi-channel bitstream to obtain at least one group of a second multi-channel sound signal and an optimized subspace mapping model; B) mapping the second multi-channel sound signal back to a first multi-channel sound signal using the optimized subspace mapping model; C) mapping the first multi-channel sound signal from the frequency domain to the time domain using an inverse time-frequency transform, or from the subband domain to the time domain using inverse subband filtering.
In a fourth aspect, the present invention provides a multi-channel sound signal decoding device, the device including: a perceptual decoding unit, configured to decode an encoded multi-channel bitstream to obtain at least one group of a second multi-channel sound signal and an optimized subspace mapping model; a subspace inverse mapping unit, configured to map the second multi-channel sound signal obtained by the perceptual decoding unit back to a first multi-channel sound signal using the optimized subspace mapping model obtained by the perceptual decoding unit; and a time-frequency mapping unit, configured to map the first multi-channel sound signal obtained by the subspace inverse mapping unit from the frequency domain to the time domain using an inverse time-frequency transform, or from the subband domain to the time domain using inverse subband filtering.
The multi-channel sound signal coding method of the embodiments of the present invention uses adaptive subspace mapping: the statistical properties of the multi-channel sound signal are first calculated so as to estimate an optimized subspace mapping model, the multi-channel sound signal is then mapped using this optimized subspace mapping model, and perceptual coding is finally applied. By adaptively selecting the mapping model during coding, the embodiments of the present invention can better estimate and exploit the inter-channel statistical properties of the signal, reduce inter-channel statistical redundancy to the greatest extent, and guarantee the quality of the coded signal while achieving higher coding efficiency.
Detailed description of the invention
Fig. 1 is a flow chart of the multi-channel sound signal coding method in one embodiment of the invention;
Fig. 2 is a flow chart of the multi-channel sound signal coding method in another embodiment of the present invention;
Fig. 3 is a flow chart of the multi-channel sound signal coding method in yet another embodiment of the present invention;
Fig. 4 is a schematic diagram of the subspace mapping relation in one embodiment of the invention;
Fig. 5 is a schematic diagram comparing characteristics of the PCA model and the ICA model in one embodiment of the invention;
Fig. 6 is a schematic diagram of the time-frequency subband division in one embodiment of the invention;
Fig. 7 is a flow chart of the multi-channel sound signal decoding method in one embodiment of the invention;
Fig. 8 is a structural schematic diagram of the multi-channel sound signal coding device in one embodiment of the invention;
Fig. 9 is a structural schematic diagram of the multi-channel sound signal decoding device in one embodiment of the invention.
Specific embodiment
Below, the technical solution of the present invention is described in further detail with reference to the drawings and embodiments.
The multi-channel sound signal coding method in the embodiments of the present invention differs from other methods of the prior art: it makes full use of both the statistical properties and the psychoacoustic characteristics of the multi-channel sound signal, guaranteeing the quality of the coded signal while obtaining high coding efficiency. By using the method of adaptive subspace mapping, the statistical redundancy between the multi-channel signals is eliminated to the greatest extent; the method creatively employs a variety of subspace mapping models and adaptively selects the mapping model during coding, so that the inter-channel statistical properties of the signal can be better estimated and exploited, inter-channel statistical redundancy is reduced to the greatest extent, and higher coding efficiency is realized.
Fig. 1 is a flow chart of the multi-channel sound signal coding method in one embodiment of the invention. The method includes:
Step 101: using a time-frequency transform, map the first multi-channel sound signal to a first frequency-domain signal, or, using subband filtering, map the first multi-channel sound signal to a first subband signal.
Here, the first multi-channel sound signal initially takes the form of a time-domain signal u(m, t); through the above mapping, a multi-channel frequency-domain signal or subband signal x(m, k) is obtained, where m is the channel index, t is the frame (or subframe) index, and k is the frequency or subband index.
In the embodiments of the present invention, the time-frequency transform may use commonly adopted transforms such as the modified discrete cosine transform (MDCT), the discrete cosine transform (DCT) or the Fourier transform (FFT); the subband filtering may use the commonly adopted quadrature mirror filter bank (QMF/PQMF/CQMF) or cosine-modulated filter bank (CMF/MLT) techniques; the time-frequency transform may also use multiresolution analysis techniques such as the wavelet transform. The time-frequency mapping of the embodiments of the present invention may adopt one of the above three mapping methods (as in AC-3 and AAC) or a combination of them (as in MP3 and Bell Labs PAC).
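As a concrete illustration of the time-frequency mapping in step 101, the following NumPy sketch applies an unwindowed, direct-matrix MDCT to each channel of one frame. It is a minimal sketch under assumed sizes (6 channels, 1024-sample frames); a real codec would add windowing and 50% frame overlap.

```python
import numpy as np

def mdct(frame):
    """Map one length-2N time-domain frame to N frequency coefficients.
    Unwindowed, direct-matrix MDCT: a sketch, not a production transform."""
    n2 = len(frame)
    n = n2 // 2
    k = np.arange(n)[:, None]           # frequency index, shape (N, 1)
    t = np.arange(n2)[None, :]          # time index within the frame, (1, 2N)
    basis = np.cos(np.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
    return basis @ frame                # N coefficients

# u(m, t): 6 channels, one 1024-sample frame each (assumed sizes)
u = np.random.default_rng(0).standard_normal((6, 1024))
x = np.stack([mdct(ch) for ch in u])    # x(m, k): 6 channels x 512 bins
```

A 1024-sample frame thus yields 512 frequency coefficients per channel, which are what the subsequent subband division operates on.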
Step 102: divide the first frequency-domain signal or the first subband signal into different time-frequency subbands.
Before step 101, the sound signal to be coded may first be divided into frames, after which the time-frequency transform or subband filtering is performed. If a larger frame length is used, one frame of data may be further decomposed into multiple subframes before the time-frequency transform or subband filtering. After the frequency-domain or subband signal is obtained, multiple frequency subbands can be formed in frequency order; alternatively, the frequency-domain signals obtained from multiple time-frequency transforms or subband filterings can form a two-dimensional time-frequency plane, in which time-frequency regions are divided. Further, projecting this plane into the time-frequency plane of each channel yields the time-frequency subbands x_i(t, k) to be coded, where i is the index of the time-frequency subband and t is the frame (or subframe) index. Assuming each time-frequency subband is a rectangular region, the signal range of time-frequency subband x_i(t, k) is t_{i-1} <= t < t_i, k_{i-1} <= k < k_i, where t_{i-1} and t_i are the start and end frame (or subframe) indices of the subband, and k_{i-1} and k_i are its start and end frequency or subband indices. If the total number of time-frequency subbands is N, then i <= N. For convenience, the region of a time-frequency subband can be denoted (t, k). Note that each time-frequency subband contains the projections of all channels onto that time-frequency region; when the projection of a particular channel in the region needs to be referred to specifically, x_i(t, k, m) can be used.
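The rectangular partition over [t_{i-1}, t_i) x [k_{i-1}, k_i) described above can be sketched as follows; the plane shape and the band edges are assumed values for illustration, not taken from the patent. For simplicity the sketch divides only along frequency and keeps all frames in each subband.

```python
import numpy as np

T, K, M = 8, 512, 6                     # frames, frequency bins, channels (assumed)
plane = np.random.default_rng(1).standard_normal((T, K, M))

# Assumed frequency band edges k_0 .. k_N; band i covers k_{i-1} <= k < k_i.
# Narrow bands at low frequencies, wider at high, as perceptual coders often do.
band_edges = [0, 32, 96, 224, 512]

# x_i(t, k, m): list of rectangular time-frequency subbands, all channels included.
subbands = [plane[:, lo:hi, :] for lo, hi in zip(band_edges, band_edges[1:])]
```

Each element of `subbands` is one region on which the statistics of step 103 are computed independently.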
Step 103: in each of the different time-frequency subbands, calculate the first statistical property of the first multi-channel sound signal.
Step 104: estimate the optimized subspace mapping model according to the first statistical property.
Specifically, one subspace mapping model may be selected in advance and its mapping coefficients adaptively adjusted according to the first statistical property to obtain the optimized model; alternatively, according to the first statistical property, the encoder may adaptively switch among multiple previously selected mapping models and take the selected model as the optimized subspace mapping model.
The first statistical property in the embodiments of the present invention may use the same statistic when assessing different models, for example first-order statistics (mean), second-order statistics (variance and correlation coefficient) or higher-order statistics (higher-order moments) and their variants, with second-order statistics the most common choice. Preferably, different statistics can be chosen for different mapping models to obtain better results: for example, negentropy when assessing an ICA model, and the covariance matrix, i.e. a second-order statistic, when assessing a PCA model.
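The two model-assessment statistics named here can be sketched as follows: the inter-channel covariance matrix (second-order, for a PCA model) and a log-cosh negentropy approximation (for an ICA model). The particular negentropy estimator is an assumption; the text names the statistic but no formula.

```python
import numpy as np

rng = np.random.default_rng(0)

def covariance(x):
    """Inter-channel covariance of a subband signal x (channels x samples):
    the second-order statistic suggested when assessing a PCA model."""
    xc = x - x.mean(axis=1, keepdims=True)
    return xc @ xc.T / x.shape[1]

def negentropy(y):
    """A common approximation J(y) ~ (E[G(y)] - E[G(v)])^2 with G(u) = log cosh u
    and v standard Gaussian; one possible negentropy estimator for an ICA model."""
    y = (y - y.mean()) / y.std()
    gauss = rng.standard_normal(200_000)          # Monte Carlo Gaussian reference
    return (np.log(np.cosh(y)).mean() - np.log(np.cosh(gauss)).mean()) ** 2

# Near-Gaussian signals score low negentropy; peaky (super-Gaussian) ones higher.
j_gauss = negentropy(rng.standard_normal(100_000))
j_laplace = negentropy(rng.laplace(size=100_000))
```

The Monte Carlo reference term avoids hard-coding the Gaussian constant E[log cosh v]; a production encoder would precompute it.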
Step 105: using the optimized subspace mapping model, map the first multi-channel sound signal to a second multi-channel sound signal.
Specifically, the statistical properties of the multi-channel sound signal x_i(t, k) can be calculated in the different time-frequency subbands and the optimized subspace mapping model W_i(t, k) estimated; using the estimated mapping model, the multi-channel signal is mapped to a new subspace, yielding a new group of multi-channel signals z_i(t, k).
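A minimal sketch of this mapping step, assuming PCA is the model selected for the subband: W_i is built from the eigenvectors of the inter-channel covariance, and z_i = W_i x_i concentrates the correlated energy into few channels. The two-channel test signal is an illustrative assumption.

```python
import numpy as np

def pca_mapping(x):
    """Estimate a mapping model W_i for one subband signal x (channels x samples),
    here illustrated with PCA, and map x to the new subspace z = W_i x."""
    xc = x - x.mean(axis=1, keepdims=True)
    cov = xc @ xc.T / x.shape[1]
    vals, vecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    w = vecs[:, ::-1].T                  # rows: strongest direction first
    return w, w @ xc

rng = np.random.default_rng(1)
s = rng.standard_normal(2000)            # one shared sound component
x = np.vstack([s + 0.05 * rng.standard_normal(2000),
               s + 0.05 * rng.standard_normal(2000)])
w, z = pca_mapping(x)
# Nearly all energy is concentrated in z[0]; z[1] carries only the small,
# cheap-to-code residual, and the mapped channels are decorrelated.
```

This illustrates why the mapping reduces inter-channel redundancy: after it, perceptual coding can spend almost all bits on the dominant mapped channel.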
Step 106: according to time, frequency and channel, perceptually code at least one group of the second multi-channel sound signal together with the optimized subspace mapping model, and multiplex the result into the encoded multi-channel bitstream.
Specifically, at least one group of the new multi-channel signals z_i(t, k) and the corresponding mapping models W_i(t, k) can be perceptually coded and multiplexed into the encoded multi-channel bitstream.
The above perceptual coding may specifically be classified perceptual coding.
From the above processing it can be seen that the multi-channel sound signal coding method of this embodiment of the present invention uses adaptive subspace mapping: the statistical properties of the multi-channel sound signal are first calculated so as to estimate the optimized subspace mapping model, the multi-channel sound signal is then mapped with this model, and perceptual coding follows. By adaptively selecting the mapping model during coding, this embodiment can better estimate and exploit the inter-channel statistical properties of the signal, reduce inter-channel statistical redundancy to the greatest extent, and guarantee the quality of the coded signal while achieving higher coding efficiency.
Considering that in a multi-channel sound signal the sound components of some channels may differ significantly from those of other channels, such channels can be grouped separately, so that the optimized mapping model is extracted more accurately by the above method. Therefore, when coding such multi-channel sound signals, a channel grouping step can be added to improve coding efficiency.
Fig. 2 is a flow chart of the multi-channel sound signal coding method in another embodiment of the present invention; in this embodiment, a channel grouping step is added after the time-frequency mapping of the multi-channel sound signal. The method includes:
Step 201: using a time-frequency transform, map the first multi-channel sound signal to a first frequency-domain signal, or, using subband filtering, map the first multi-channel sound signal to a first subband signal.
Step 202: divide the first frequency-domain signal or the first subband signal into different time-frequency subbands.
Here, the sound signal to be coded may first be divided into frames, after which the time-frequency transform or subband filtering is performed. If a larger frame length is used, one frame of data may be further decomposed into multiple subframes before the time-frequency transform or subband filtering. After the frequency-domain or subband signal is obtained, multiple frequency subbands can be formed in frequency order; alternatively, the frequency-domain signals obtained from multiple time-frequency transforms or subband filterings can form a two-dimensional time-frequency plane, in which time-frequency regions are divided to obtain the time-frequency subbands to be coded.
Step 203: in each of the different time-frequency subbands, calculate a second statistical property of the first multi-channel sound signal, and divide the first multi-channel sound signal into multiple grouped sound signals according to the second statistical property.
In this embodiment of the present invention, the statistical properties of the multi-channel sound signal x(m, k) can be calculated in the different time-frequency subbands; then, according to the statistical properties of the sound components of each channel, the multi-channel signal is divided into one or more channel groups, each group containing at least one channel signal. A group containing a single channel proceeds directly to perceptual coding; a group containing more than one channel undergoes the subsequent processing.
The second statistical property of the invention may use first-order statistics (mean), second-order statistics (variance and correlation coefficient) or higher-order statistics (higher-order moments) and their variants; second-order statistics, especially the correlation coefficient, are the most common choice. To save computation, the first statistical property can also be used as the grouping criterion, in which case the second statistical property and the first statistical property take the same value.
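One way the correlation-coefficient grouping could look in code; the greedy strategy and the 0.5 threshold are illustrative assumptions, not values from the patent.

```python
import numpy as np

def group_channels(x, threshold=0.5):
    """Greedy channel grouping sketch: channels whose correlation coefficient
    with a seed channel exceeds `threshold` join that channel's group."""
    corr = np.corrcoef(x)                       # pairwise correlation matrix
    groups, assigned = [], set()
    for i in range(x.shape[0]):
        if i in assigned:
            continue
        group = [i] + [j for j in range(i + 1, x.shape[0])
                       if j not in assigned and abs(corr[i, j]) > threshold]
        assigned.update(group)
        groups.append(group)
    return groups

rng = np.random.default_rng(2)
a, b = rng.standard_normal(5000), rng.standard_normal(5000)
channels = np.vstack([a + 0.1 * rng.standard_normal(5000),   # correlated pair 1
                      a + 0.1 * rng.standard_normal(5000),
                      b + 0.1 * rng.standard_normal(5000),   # correlated pair 2
                      b + 0.1 * rng.standard_normal(5000)])
# group_channels(channels) -> [[0, 1], [2, 3]]
```

Each resulting group then goes through subspace mapping independently, so a mapping model is never stretched across unrelated sound components.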
For each grouped sound signal divided in step 203, steps 204 to 207 are executed with that grouped sound signal taken as the first multi-channel sound signal.
Step 204: in each of the different time-frequency subbands, calculate the first statistical property of the first multi-channel sound signal.
Step 205: estimate the optimized subspace mapping model according to the first statistical property.
Step 206: using the optimized subspace mapping model, map the first multi-channel sound signal to a second multi-channel sound signal.
In this embodiment of the present invention, the optimized subspace mapping model W_i(t, k) can be estimated according to the statistical properties of the sound components of each channel; using the estimated mapping model, the multi-channel signal is mapped to a new subspace, yielding a new group of multi-channel signals z_i(t, k).
Step 207: according to time, frequency and channel, perceptually code at least one group of the second multi-channel sound signal together with the optimized subspace mapping model, and multiplex the result into the encoded multi-channel bitstream.
Here, at least one group of the new multi-channel signals z_i(t, k) and the corresponding mapping models W_i(t, k) can be perceptually coded, and all the perceptual coding information multiplexed to obtain the encoded multi-channel bitstream.
In addition, as an alternative scheme, especially at lower bit rates, the grouping can also be performed after the time-frequency mapping of step 101 and before the subband division of step 102. This brings an obvious benefit: less grouping information needs to be transmitted, which is more practical at lower bit rates, where the bits occupied by grouping information matter. In this case, after step 101 is executed, the second statistical property of the first multi-channel sound signal is first calculated; the first multi-channel sound signal is then divided into multiple grouped sound signals according to the second statistical property, and for each grouped sound signal, steps 102 to 106 are executed with that grouped sound signal taken as the first multi-channel sound signal.
Fig. 3 is a flow chart of the multi-channel sound signal coding method in yet another embodiment of the present invention. In this embodiment, the multi-channel sound signal is first grouped, after which time-frequency mapping and the subsequent processing are carried out for each grouped sound signal. The method includes:
Step 301: calculate a third statistical property of the first multi-channel sound signal, and divide the first multi-channel sound signal into multiple grouped sound signals according to the third statistical property.
Here, the statistical properties of the multi-channel sound signal u(m, t) can be calculated, and according to these statistical properties the multi-channel signal is divided into one or more channel groups, each group containing at least one channel signal.
The third statistical property of the invention may use first-order statistics (mean), second-order statistics (variance and correlation coefficient) or higher-order statistics (higher-order moments) and their variants; second-order statistics, especially the correlation coefficient, are the most common choice.
For each grouped sound signal, steps 302 to 307 are executed with that grouped sound signal taken as the first multi-channel sound signal.
Step 302: using a time-frequency transform, map the first multi-channel sound signal to a first frequency-domain signal, or, using subband filtering, map the first multi-channel sound signal to a first subband signal.
Step 303: divide the first frequency-domain signal or the first subband signal into different time-frequency subbands.
Here, the grouped multi-channel time-domain signal u(m, t) can be mapped, using a time-frequency transform or subband filtering, to a multi-channel frequency-domain signal or subband signal x(m, k), and the time-frequency mapped signal divided into different time-frequency subbands.
Step 304: in each of the different time-frequency subbands, calculate the first statistical property of the first multi-channel sound signal.
Step 305: estimate the optimized subspace mapping model according to the first statistical property.
The embodiment of the present invention uses adaptive subspace mapping to estimate the optimized subspace mapping model. This adaptive subspace mapping differs from existing multi-channel sound coding methods in that it innovatively uses subspace mapping (Subspace Mapping) methods: according to the statistical properties of the signal, the optimized subspace mapping model of the multiple channels is estimated, the model being an adaptive linear transformation matrix. The subspace mapping method may adopt multivariate statistical analysis methods developed in recent years, such as independent component analysis (Independent Components Analysis, ICA), principal component analysis (Principal Components Analysis, PCA), canonical correlation analysis (Canonical Correlation Analysis, CCA) and projection pursuit (Projection Pursuit).
In the prior art, PCA-based multi-channel coding is convenient for reducing the dimensionality of the multi-channel encoder, but it is not the best approach for reducing inter-channel statistical redundancy. The present invention therefore proposes a coding method that more effectively takes into account both the statistical properties and the psychoacoustic characteristics of the multi-channel sound signal; practice has shown that the method of the present invention obtains higher coding efficiency and quality than existing methods.
Step 306: using the optimized subspace mapping model, map the first multi-channel sound signal to a second multi-channel sound signal.
Here, the statistical properties of the multi-channel sound signal x_i(t, k) can be calculated in the different time-frequency subbands and the optimized subspace mapping model W_i(t, k) estimated; using the estimated mapping model, the multi-channel signal is mapped to a new subspace, yielding a new group of multi-channel signals z_i(t, k).
Step 307: according to time, frequency and channel, perceptually code at least one group of the second multi-channel sound signal together with the optimized subspace mapping model, and multiplex the result into the encoded multi-channel bitstream.
Here, at least one group of the new multi-channel signals z_i(t, k) and the corresponding mapping models W_i(t, k) can be perceptually coded, and all the perceptual coding information multiplexed to obtain the encoded multi-channel bitstream.
The perceptual coding in this embodiment of the present invention may use any of the following sound coding approaches:
Waveform coding: for example, the perceptual quantization and Huffman entropy coding used in MP3 and AAC, the exponent-mantissa coding used in AC-3, and the perceptual vector quantization coding used in Ogg Vorbis and TwinVQ;
Parametric coding: for example, the harmonic, individual sinusoid, and noise coding used in MPEG HILN, the harmonic vector excitation coding used in MPEG HVXC, and the code-excited and transform-coded-excitation (TCX) coding used in AMR WB+;
Waveform-parametric hybrid coding: for example, in MP3Pro, AAC+, and AMR WB+, waveform coding is used at low frequencies and bandwidth-extension parametric coding at high frequencies.
The adaptive subspace mapping in this embodiment of the present invention differs from any existing method. Its adaptivity is embodied in two ways: for a selected mapping model, the mapping coefficients of the model are adaptively adjusted according to the statistical properties between channels; and the method adaptively switches between different mapping models according to the inter-channel statistical properties, for example between an ICA mapping method and a PCA mapping method.
The adaptive subspace mapping strategy of the present invention is significant for achieving the object of the present invention, namely obtaining high coding efficiency while guaranteeing the quality of the encoded signal when encoding multi-channel signals.
Subspace mapping model can be described as follows:
1. Original subspace mapping relationship:
Let s = {s1, s2, …, sM} be the M-dimensional sound source vector,
and let x = {x1, x2, …, xM} be the measurement vector in the existing subspace, with
x = As        (1)
where A is the mapping matrix of the existing subspace.
2. New subspace mapping relationship:
Let z = {z1, z2, …, zM} be the measurement vector in the new subspace, with
z = Wx        (2)
Referring to the schematic diagram of the subspace mapping relationships shown in Fig. 4, W is the mapping matrix of the new subspace; s, x, and z are all vectors composed of zero-mean scalar random variables.
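As a minimal numerical illustration of formulas (1) and (2) — a sketch only, not part of the claimed method; the 2×2 mixing matrix and the Laplacian sources are assumed purely for demonstration:

```python
import numpy as np

# Hypothetical 2-channel example: two zero-mean sources s are mixed by an
# assumed existing-subspace matrix A into the observations x = A s (formula (1)).
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 1000))          # M = 2 sources, Laplacian-like
s -= s.mean(axis=1, keepdims=True)       # remove the mean, as required of s
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])               # existing subspace mapping matrix

x = A @ s                                # measurement vector in the old subspace

# The new subspace is reached through z = W x (formula (2)); with the ideal
# choice W = A^-1 the new measurements recover the sources exactly.
W = np.linalg.inv(A)
z = W @ x
assert np.allclose(z, s)
```

In practice A is unknown, so W must be estimated from the statistics of x alone, which is exactly what the ICA and PCA models below do.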
The adaptive subspace mapping of the present invention finds an optimized mapping matrix W such that the mapped new-subspace measurement vector z is optimal, thereby obtaining optimal coding efficiency. Different optimality criteria yield different optimized mapping matrices. This property suits the practical application of multi-channel coding well: first, the statistical properties of multi-channel signals are time-varying, and different signal components may follow a Laplacian distribution, a Gaussian distribution, or other distributions; second, different bit rates and coding modes impose different requirements on the properties of the mapping matrix (such as orthogonality and correlation).
Without loss of generality, the adaptive subspace mapping method of the present invention is explained below taking the independent component analysis (ICA) model and the principal component analysis (PCA) model as examples.
When the random variables in the sound source vector s are assumed to be mutually statistically independent, with at most one of them Gaussian-distributed, and the optimal solution for the mapped measurement vector z is the source vector s (or differs from s only by a proportionality coefficient), the subspace mapping model is equivalent to the independent component analysis (ICA) model. In this case
z = Wx = WAs
W⁻¹ = A        (3)
The mapping matrix W can be obtained by maximizing a measure of non-Gaussianity (such as the kurtosis index or the negentropy index). Typically, the FastICA algorithm can be used to realize fast ICA model mapping, described as follows:
From information theory: among all random variables of equal variance, the Gaussian variable has the maximum entropy, so entropy can be used to measure non-Gaussianity, and the negentropy index is a modified form of entropy. Negentropy is defined as:
Ng(y) = H(y_gauss) − H(y)        (4)
where y_gauss is a Gaussian random variable with the same variance as y, and H(y) is the differential entropy of the random variable:
H(y) = −∫ p_y(ξ) lg p_y(ξ) dξ        (5)
The stronger the non-Gaussianity of y, the smaller its differential entropy and the larger its negentropy Ng(y). In practical applications, negentropy is computed using the following formula:
Ng(y) = {E[g(y)] − E[g(y_gauss)]}²        (6)
where E[·] is the expectation operator and g(·) is a nonlinear function. Without loss of generality, nonlinear functions such as g1(y) = tanh(a1·y) (1 ≤ a1 ≤ 2), g2(y) = y·exp(−y²/2), or g3(y) = y³ can be used.
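The negentropy approximation of formula (6) can be sketched numerically. One hedge: in the FastICA literature the functions g1–g3 listed above are the derivatives of even contrast functions G (e.g. G(u) = log cosh(u) for g1 = tanh); the sketch below uses such an even contrast so that the estimate does not vanish for symmetric distributions. The sample sizes and test distributions are assumptions for illustration.

```python
import numpy as np

def negentropy(y, G=lambda u: np.log(np.cosh(u)), rng=np.random.default_rng(0)):
    """Ng(y) ~ {E[G(y)] - E[G(y_gauss)]}^2, cf. formula (6); y_gauss is a
    Gaussian sample with the same variance as the zero-mean signal y."""
    y = y - y.mean()
    y_gauss = rng.normal(scale=y.std(), size=y.size)
    return (G(y).mean() - G(y_gauss).mean()) ** 2

rng = np.random.default_rng(1)
n = 200_000
gauss = rng.normal(size=n)                           # Gaussian: negentropy near 0
laplace = rng.laplace(scale=1 / np.sqrt(2), size=n)  # unit-variance Laplacian

# A Laplacian (super-Gaussian) signal has clearly larger negentropy, which is
# why it is a useful contrast for separating sound-like sources.
assert negentropy(laplace) > negentropy(gauss)
```

This is the sense in which "most sound signals follow a Laplacian distribution" (see the ICA/PCA comparison below) favors a negentropy-maximizing mapping.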
The FastICA algorithm, also known as the fixed-point algorithm, was proposed by Hyvärinen et al. at the University of Helsinki, Finland. It finds a direction such that z = Wx of formula (2) has maximum non-Gaussianity (maximum negentropy). The basic computation steps are as follows:
1. Centre the measurement vector x so that its mean is 0;
2. Whiten the data, i.e. x → z;
3. Select the number m of components to estimate, and set the iteration count p ← 1;
4. Select a (random) initial weight vector Wp;
5. Let Wp = E{z·g(Wp^T z)} − E{g′(Wp^T z)}·Wp, where g is a nonlinear function;
6. Decorrelate Wp from the previously estimated vectors: Wp = Wp − Σ_{j=1}^{p−1} (Wp^T Wj)·Wj;
7. Let Wp = Wp / ||Wp||;
8. If Wp has not converged, return to step 5;
9. Let p = p + 1; if p ≤ m, return to step 4.
Finally, the mapping vector z and the mapping matrix W are obtained.
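The nine steps above can be sketched in NumPy as follows. This is a hedged illustration under assumed data, not the patented encoder: the symmetric whitening, the tanh nonlinearity, and the Gram-Schmidt-style deflation in step 6 follow the standard FastICA formulation.

```python
import numpy as np

def fastica(x, m, max_iter=200, tol=1e-10, seed=0):
    """One-unit FastICA with deflation, following steps 1-9 above."""
    g = np.tanh
    g_prime = lambda u: 1.0 - np.tanh(u) ** 2
    rng = np.random.default_rng(seed)
    x = x - x.mean(axis=1, keepdims=True)        # step 1: centre x
    d, e = np.linalg.eigh(np.cov(x))             # step 2: whiten x -> zw
    V = e @ np.diag(d ** -0.5) @ e.T
    zw = V @ x
    W = np.zeros((m, x.shape[0]))
    for p in range(m):                           # steps 3/9: estimate m rows
        w = rng.standard_normal(x.shape[0])      # step 4: random initial weight
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            wz = w @ zw                          # step 5: fixed-point update
            w_new = (zw * g(wz)).mean(axis=1) - g_prime(wz).mean() * w
            w_new -= W[:p].T @ (W[:p] @ w_new)   # step 6: deflate vs. found rows
            w_new /= np.linalg.norm(w_new)       # step 7: renormalise
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:                        # step 8: otherwise iterate again
                break
        W[p] = w
    return W @ zw, W @ V                         # mapping vector z, matrix W

rng = np.random.default_rng(7)
s = rng.laplace(size=(2, 5000))                  # super-Gaussian toy sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])
z, W = fastica(A @ s, m=2)
```

Up to the usual permutation and sign ambiguity of ICA, each row of z should correlate strongly with exactly one source.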
When the random variables in the sound source vector s are assumed to be mutually uncorrelated and Gaussian-distributed, and the optimality condition on z is that the subspace channel information is concentrated into the fewest channels, the subspace mapping model is equivalent to the principal component analysis (PCA) model. In this case the source vector need not be separated from the observed signal; the mapping matrix W can be obtained from the eigenvalues and eigenvectors of the covariance matrix of the measurement vector x. The PCA model is essentially the familiar Karhunen-Loève transform and can be solved by the singular value decomposition (SVD) method.
The basic computation steps of the PCA model are as follows:
Step 1: compute the covariance matrix C of the measurement vector x;
Step 2: compute the eigenvectors e1, e2, …, eM and eigenvalues λ1, λ2, …, λM of the covariance matrix, with the eigenvalues sorted in descending order;
Step 3: map the measurement vector x into the space of the eigenvectors to obtain the mapping vector z.
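Steps 1-3 of the PCA model can be sketched as follows; the strongly correlated two-channel test signal is an assumption for illustration:

```python
import numpy as np

def pca_mapping(x):
    x = x - x.mean(axis=1, keepdims=True)
    C = np.cov(x)                          # step 1: covariance matrix of x
    vals, vecs = np.linalg.eigh(C)         # step 2: eigenvalues / eigenvectors
    order = np.argsort(vals)[::-1]         # sort eigenvalues descending
    vals, W = vals[order], vecs[:, order].T
    z = W @ x                              # step 3: project onto eigenvectors
    return z, W, vals

rng = np.random.default_rng(0)
mix = np.array([[1.0, 0.9], [0.9, 1.0]])   # two highly correlated channels
x = mix @ rng.standard_normal((2, 10000))
z, W, vals = pca_mapping(x)
```

The mapped channels are decorrelated and the signal energy concentrates in the first channel, which is what makes the PCA model attractive for dimension reduction at low bit rates.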
The ICA model is well suited to blind separation and classification of signal components; it helps decompose the multi-channel signal into multiple statistically independent channels for coding, removing inter-channel statistical redundancy to the greatest extent. The mapping matrix of the PCA model has mutually orthogonal row vectors, so it can concentrate the multi-channel signal components into as few channels as possible, which helps reduce the dimensionality of the encoded signal at lower bit rates.
Fig. 5 is a schematic comparison of the characteristics of the PCA and ICA models. In terms of mapping efficiency, in most cases the multi-channel signal components do not satisfy an omnidirectional (isotropic) distribution, so the PCA model cannot achieve the highest mapping efficiency. The ICA model does not require orthogonality of the signals and, since most sound signals (including subband sound signals) follow a Laplacian distribution, the ICA model can often achieve very high mapping efficiency.
From the above analysis, the ICA and PCA models have different characteristics but great complementarity. In a specific implementation, the following selection can be made according to the encoder's parameter configuration:
First, ICA coding mode: ICA is used for all coding;
Second, PCA coding mode: PCA is used for all coding;
Third, ICA-PCA hybrid coding mode: an open-loop or closed-loop search strategy dynamically selects the ICA or PCA coding mode.
In the ICA-PCA hybrid coding mode, which mode to use can be decided according to the signal-to-noise ratio (SNR) or the mask-to-noise ratio (MNR) of the two coding modes at the given bit rate. SNR and MNR can be computed by conventional methods.
The perceptual coding of the present invention perceptually encodes at least one group of the new multi-channel signals and the corresponding mapping models. The signal components to be coded and the corresponding mapping model parameters can be selected according to the target bit rate of the current encoding and the perceptual importance of the new multi-channel signals.
Here, the multi-channel signal to be encoded is divided into multiple subbands along the three dimensions of time, frequency, and channel. Using a known psychoacoustic model (such as the Johnston model, or MPEG Model 1 and Model 2), the perceptual importance (weight) of each subband is computed separately, and the number of subbands to be coded and the quantization precision are determined. When coding the mapping model, the corresponding mapping matrix/vector can be coded, other variants of the model can be coded, or the statistical feature parameters from which the mapping matrix can be computed can be coded directly.
The present invention unifies the selection of the subspace mapping model, the computation and coding of the inter-channel mapping matrix parameters, and the perceptual coding of the subbands (i.e. time-frequency-channel) into a single rate-distortion (Rate-Distortion Theory) coding framework, and realizes highly efficient coding of multi-channel signals according to constraints such as bit rate, psychoacoustic masking effects, and binaural hearing effects.
Fig. 6 is a schematic diagram of the time-frequency subband division. According to the foregoing method, in the current coded frame the time-frequency-channel space is divided into multiple time-frequency subbands. Suppose that in time-frequency subband (t, k) the subspace mapping model is T(t, k), selected from K models T1, T2, …, TK (for example including an ICA model and a PCA model); the mapping matrix is W(t, k), estimated from the inter-channel statistical parameters (e.g. by the ICA or PCA method); the subband signal to be perceptually coded is x(t, k, m), i.e. the subband signal x(t, k) in channel m; the signal-to-mask ratio SMR(t, k, m) of that subband signal can be computed by the psychoacoustic model; the target bit count is B bits; and MNR(t, k, m) is used as the distortion criterion. The following coding strategy can then be used:
Given the subband signals z(t, k, m), SMR(t, k, m), and the target bit count B, select from the K mapping models the model that maximizes MNR(t, k, m), and encode the model index T(t, k), the mapping matrix W(t, k), and the new subband signals z(t, k, m).
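One way to read this strategy is as a closed-loop search: encode the subband under each candidate model, measure the resulting distortion, and keep the best. The sketch below is a hypothetical skeleton of that loop — `quantize` stands in for the perceptual quantizer, plain reconstruction SNR stands in for the MNR criterion (a real encoder would weight the noise by the masking threshold derived from SMR(t, k, m)), and the candidate matrices are toy values.

```python
import numpy as np

def quantize(z, step):
    """Stand-in for perceptual quantization: uniform scalar quantizer."""
    return np.round(z / step) * step

def closed_loop_select(x, models, step=0.05):
    """Try each candidate mapping matrix W, quantize z = W x, reconstruct,
    and keep the model with the highest reconstruction SNR (proxy for MNR)."""
    best = None
    for name, W in models.items():
        z_hat = quantize(W @ x, step)
        x_hat = np.linalg.inv(W) @ z_hat
        err = np.sum((x - x_hat) ** 2)
        snr_db = 10 * np.log10(np.sum(x ** 2) / max(err, 1e-12))
        if best is None or snr_db > best[2]:
            best = (name, W, snr_db)
    return best

rng = np.random.default_rng(0)
x = np.array([[1.0, 0.8], [0.8, 1.0]]) @ rng.laplace(size=(2, 4000))
models = {"identity": np.eye(2),
          "decorrelating": np.array([[0.5, 0.5], [0.5, -0.5]])}
name, W, snr_db = closed_loop_select(x, models)
```

The same skeleton accommodates an open-loop variant: instead of encoding under every model, the mode is predicted directly from inter-channel statistics before quantization.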
The adaptive subspace mapping of the present invention works together with perceptual coding to realize adaptive coding under different coding target conditions. For example, for multi-channel signals with a lower signal-to-noise ratio, the independent component analysis mapping method can not only encode the sound signal at high quality but can even eliminate noise; and when the encoding bit rate is low, the principal component analysis mapping method may be particularly suitable for coding complex sound signals.
The adaptive subspace mapping and perceptual coding method of the present invention can also provide scalable coding: the multi-channel sound signal is encoded only once, producing a single sound bitstream that supports transmission and decoding at multiple bit rates and quality levels, thereby supporting the different application demands of many types of users. When scalable coding is supported, the perceptual coding module can be further decomposed into the following steps:
Step 1: select the most important at least one group of signals and the corresponding mapping models and perform perceptual coding, with the bit rate of this partial bitstream not exceeding the base-layer rate constraint;
Step 2: select the second most important at least one group of signals and the corresponding mapping models and perform perceptual coding, with the bit rate of this partial bitstream not exceeding the first enhancement-layer rate constraint;
Step 3: select the third most important at least one group of signals and the corresponding mapping models and perform perceptual coding, with the bit rate of this partial bitstream not exceeding the second enhancement-layer rate constraint;
Step 4: continue in this way until lossless coding is achieved, obtaining N layers of bitstreams;
Step 5: multiplex all N layers of bitstreams into one compressed stream.
In scalable coding applications, a compressed stream re-assembled from the scalable bitstream according to the service request should at least contain the base-layer bitstream; at higher bit rates, the enhancement-layer bitstreams can additionally be multiplexed in order of importance.
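The layered construction in steps 1-5 can be sketched as a greedy assignment of perceptually ranked signal groups to rate-constrained layers. The function name, the importance weights, and the per-layer bit budgets below are hypothetical, since the description does not fix a particular allocation algorithm:

```python
def build_scalable_stream(groups, layer_budgets):
    """groups: (importance, size_in_bits) pairs; layer_budgets: max bits for
    the base layer and each enhancement layer. Groups are taken in order of
    perceptual importance and packed into successive layers (steps 1-4);
    the layers are then concatenated into one stream (step 5)."""
    ranked = sorted(groups, key=lambda g: g[0], reverse=True)
    layers, idx = [], 0
    for budget in layer_budgets:
        layer, used = [], 0
        while idx < len(ranked) and used + ranked[idx][1] <= budget:
            layer.append(ranked[idx])
            used += ranked[idx][1]
            idx += 1
        layers.append(layer)
    multiplexed = [g for layer in layers for g in layer]  # step 5
    return layers, multiplexed

groups = [(0.9, 40), (0.2, 20), (0.7, 30), (0.5, 50)]
layers, stream = build_scalable_stream(groups, layer_budgets=[64, 80, 80])
```

A receiver that keeps only the base layer still gets the most important group, which matches the re-assembly rule described above.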
Fig. 7 is a flowchart of the multi-channel sound signal decoding method in an embodiment of the present invention. The method comprises:
Step 701: decode the encoded multi-channel bitstream to obtain at least one group of the second multi-channel sound signal and the optimized subspace mapping model.
Step 702: using the optimized subspace mapping model, map the second multi-channel sound signal back to the first multi-channel sound signal.
Step 703: using an inverse time-frequency transform, map the first multi-channel sound signal from the frequency domain to the time domain; or, using an inverse subband filter, map the first multi-channel sound signal from the subband domain to the time domain.
When the first multi-channel sound signal consists of multiple grouped sound signals, before step 703 the method may further comprise: performing group recovery on the multiple grouped sound signals to obtain a third multi-channel sound signal, and executing step 703 with the third multi-channel sound signal as the first multi-channel sound signal.
In this embodiment of the present invention, when the first multi-channel sound signal consists in the time domain of multiple grouped sound signals, after step 703 the method may further comprise: performing group recovery on the multiple grouped sound signals to obtain a fourth multi-channel sound signal.
In addition, before step 701 the method may further comprise: demultiplexing the encoded multi-channel bitstream to obtain multiple layered bitstreams, and executing step 701 with each layered bitstream as an encoded multi-channel bitstream; after step 701 has been carried out on all layered bitstreams, steps 702 and 703 are then executed uniformly. Fig. 8 is a schematic structural diagram of the multi-channel sound signal coding apparatus in an embodiment of the present invention. The apparatus comprises:
a time-frequency mapping unit 801, configured to map the first multi-channel sound signal to a first frequency-domain signal using a time-frequency transform, or map the first multi-channel sound signal to a first subband signal using a subband filter; and to divide the first frequency-domain signal or the first subband signal into different time-frequency subbands;
an adaptive subspace mapping unit 802, configured to compute, in each of the different time-frequency subbands divided by the time-frequency mapping unit 801, the first statistical property of the first multi-channel sound signal; to estimate an optimized subspace mapping model according to the first statistical property; and to map the first multi-channel sound signal to a second multi-channel sound signal using the optimized subspace mapping model;
a perceptual coding unit 803, configured to perform, according to time, frequency, and channel, perceptual coding on at least one group of the second multi-channel sound signal mapped by the adaptive subspace mapping unit 802 and on the optimized subspace mapping model, and to multiplex the result into an encoded multi-channel bitstream.
Preferably, the apparatus further comprises:
a first channel grouping unit, configured to: before the adaptive subspace mapping unit 802 computes, in each of the different time-frequency subbands, the first statistical property of the first multi-channel sound signal, compute, in each of the different time-frequency subbands divided by the time-frequency mapping unit, a second statistical property of the first multi-channel sound signal; and divide the first multi-channel sound signal into multiple grouped sound signals according to the second statistical property;
wherein the adaptive subspace mapping unit 802 and the perceptual coding unit 803 are specifically configured to process each grouped sound signal divided by the first channel grouping unit as the first multi-channel sound signal.
Preferably, the apparatus further comprises:
a second channel grouping unit, configured to: before the time-frequency mapping unit 801 maps the first multi-channel sound signal to a first frequency-domain signal using a time-frequency transform, or maps the first multi-channel sound signal to a first subband signal using a subband filter, compute a third statistical property of the first multi-channel sound signal; and divide the first multi-channel sound signal into multiple grouped sound signals according to the third statistical property;
wherein the time-frequency mapping unit 801, the adaptive subspace mapping unit 802, and the perceptual coding unit 803 are specifically configured to process each grouped sound signal divided by the second channel grouping unit as the first multi-channel sound signal.
Preferably, the adaptive subspace mapping unit 802 is specifically configured to: compute, in each of the different time-frequency subbands divided by the time-frequency mapping unit 801, the first statistical property of the first multi-channel sound signal; for a selected optimized subspace mapping model, adaptively adjust the mapping coefficients of the optimized subspace mapping model according to the first statistical property; and map the first multi-channel sound signal to a second multi-channel sound signal using the optimized subspace mapping model.
Preferably, the adaptive subspace mapping unit 802 is specifically configured to: compute, in each of the different time-frequency subbands divided by the time-frequency mapping unit 801, the first statistical property of the first multi-channel sound signal; adaptively switch, according to the first statistical property, to one of multiple pre-selected different mapping models, taking that mapping model as the optimized subspace mapping model; and map the first multi-channel sound signal to a second multi-channel sound signal using the optimized subspace mapping model.
Preferably, the perceptual coding in the perceptual coding unit 803 is specifically layered (scalable) perceptual coding.
Fig. 9 is a schematic structural diagram of the multi-channel sound signal decoding apparatus in an embodiment of the present invention. The apparatus comprises:
a perceptual decoding unit 901, configured to decode the encoded multi-channel bitstream to obtain at least one group of the second multi-channel sound signal and the optimized subspace mapping model;
a subspace inverse mapping unit 902, configured to map the second multi-channel sound signal obtained by the perceptual decoding unit 901 back to the first multi-channel sound signal, using the optimized subspace mapping model obtained by the perceptual decoding unit 901;
a frequency-time mapping unit 903, configured to map the first multi-channel sound signal obtained by the subspace inverse mapping unit 902 from the frequency domain to the time domain using an inverse time-frequency transform, or from the subband domain to the time domain using an inverse subband filter.
Preferably, the first multi-channel sound signal obtained by the subspace inverse mapping unit 902 consists of multiple grouped sound signals, and the apparatus further comprises:
a first group recovery unit, configured to: before the frequency-time mapping unit 903 maps the first multi-channel sound signal obtained by the subspace inverse mapping unit 902 from the frequency domain to the time domain using an inverse time-frequency transform, or from the subband domain to the time domain using an inverse subband filter, perform group recovery on the multiple grouped sound signals to obtain a third multi-channel sound signal;
wherein the frequency-time mapping unit 903 is specifically configured to process the third multi-channel sound signal obtained by the first group recovery unit as the first multi-channel sound signal.
Preferably, the first multi-channel sound signal after mapping by the frequency-time mapping unit 903 consists in the time domain of multiple grouped sound signals, and the apparatus further comprises:
a second group recovery unit, configured to: after the frequency-time mapping unit 903 maps the first multi-channel sound signal obtained by the subspace inverse mapping unit 902 from the frequency domain to the time domain using an inverse time-frequency transform, or from the subband domain to the time domain using an inverse subband filter, perform group recovery on the multiple grouped sound signals to obtain a fourth multi-channel sound signal.
Preferably, the apparatus further comprises:
a demultiplexing unit, configured to: before the perceptual decoding unit 901 decodes the encoded multi-channel bitstream to obtain at least one group of the second multi-channel sound signal and the optimized subspace mapping model, demultiplex the encoded multi-channel bitstream to obtain multiple layered bitstreams;
wherein the perceptual decoding unit 901, the subspace inverse mapping unit 902, and the frequency-time mapping unit 903 are specifically configured to process each layered bitstream obtained by the demultiplexing unit as an encoded multi-channel bitstream.
A person skilled in the art should further appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or in software depends on the specific application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented in hardware, in a software module executed by a processor, or in a combination of the two. A software module can reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium well known in the technical field.
The specific embodiments described above further detail the objects, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit the protection scope of the present invention; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (17)

1. A multi-channel sound signal coding method, characterized in that the method comprises:
A) using a time-frequency transform, mapping a first multi-channel sound signal to a first frequency-domain signal, or using a subband filter, mapping the first multi-channel sound signal to a first subband signal;
B) dividing the first frequency-domain signal or the first subband signal into different time-frequency subbands;
C) in each of the different time-frequency subbands, computing a first statistical property of the first multi-channel sound signal;
D) according to the first statistical property, adaptively switching to one of multiple different mapping models, taking that mapping model as an optimized subspace mapping model; and adaptively adjusting the mapping coefficients of the optimized subspace mapping model according to the first statistical property;
E) using the optimized subspace mapping model, mapping the first multi-channel sound signal to a second multi-channel sound signal;
F) according to time, frequency, and channel, performing perceptual coding on at least one group of the second multi-channel sound signal and on the optimized subspace mapping model, and multiplexing the result into an encoded multi-channel bitstream.
2. The method of claim 1, characterized in that, before computing the first statistical property of the first multi-channel sound signal in each of the different time-frequency subbands, the method further comprises:
in each of the different time-frequency subbands, computing a second statistical property of the first multi-channel sound signal; and according to the second statistical property, dividing the first multi-channel sound signal into multiple grouped sound signals;
and, for each grouped sound signal, executing steps C) to F) with that grouped sound signal as the first multi-channel sound signal.
3. The method of claim 1, characterized in that, before dividing the first frequency-domain signal or the first subband signal into different time-frequency subbands, the method further comprises:
computing a second statistical property of the first multi-channel sound signal; and according to the second statistical property, dividing the first multi-channel sound signal into multiple grouped sound signals;
and, for each grouped sound signal, executing steps B) to F) with that grouped sound signal as the first multi-channel sound signal.
4. The method of claim 1, characterized in that, before using a time-frequency transform to map the first multi-channel sound signal to a first frequency-domain signal, or using a subband filter to map the first multi-channel sound signal to a first subband signal, the method further comprises:
computing a third statistical property of the first multi-channel sound signal; and according to the third statistical property, dividing the first multi-channel sound signal into multiple grouped sound signals;
and, for each grouped sound signal, executing steps A) to F) with that grouped sound signal as the first multi-channel sound signal.
5. The method of claim 1, characterized in that the perceptual coding is specifically layered (scalable) perceptual coding.
6. A multi-channel sound signal coding apparatus, characterized in that the apparatus comprises:
a time-frequency mapping unit, configured to map a first multi-channel sound signal to a first frequency-domain signal using a time-frequency transform, or map the first multi-channel sound signal to a first subband signal using a subband filter; and to divide the first frequency-domain signal or the first subband signal into different time-frequency subbands;
an adaptive subspace mapping unit, configured to compute, in each of the different time-frequency subbands divided by the time-frequency mapping unit, a first statistical property of the first multi-channel sound signal;
to adaptively switch, according to the first statistical property, to one of multiple different mapping models, taking that mapping model as an optimized subspace mapping model; and to adaptively adjust the mapping coefficients of the optimized subspace mapping model according to the first statistical property;
and to map the first multi-channel sound signal to a second multi-channel sound signal using the optimized subspace mapping model;
a perceptual coding unit, configured to perform, according to time, frequency, and channel, perceptual coding on at least one group of the second multi-channel sound signal mapped by the adaptive subspace mapping unit and on the optimized subspace mapping model, and to multiplex the result into an encoded multi-channel bitstream.
7. The apparatus of claim 6, characterized by further comprising:
a first channel grouping unit, configured to: before the adaptive subspace mapping unit computes, in each of the different time-frequency subbands, the first statistical property of the first multi-channel sound signal, compute, in each of the different time-frequency subbands divided by the time-frequency mapping unit, a second statistical property of the first multi-channel sound signal; and divide the first multi-channel sound signal into multiple grouped sound signals according to the second statistical property;
wherein the adaptive subspace mapping unit and the perceptual coding unit are specifically configured to process each grouped sound signal divided by the first channel grouping unit as the first multi-channel sound signal.
8. The apparatus of claim 6, characterized by further comprising:
a second channel grouping unit, configured to: before the time-frequency mapping unit maps the first multi-channel sound signal to a first frequency-domain signal using a time-frequency transform, or maps the first multi-channel sound signal to a first subband signal using a subband filter, compute a third statistical property of the first multi-channel sound signal; and divide the first multi-channel sound signal into multiple grouped sound signals according to the third statistical property;
wherein the time-frequency mapping unit, the adaptive subspace mapping unit, and the perceptual coding unit are specifically configured to process each grouped sound signal divided by the second channel grouping unit as the first multi-channel sound signal.
9. The apparatus of claim 6, characterized in that the perceptual coding in the perceptual coding unit is specifically layered (scalable) perceptual coding.
10. A multi-channel sound signal decoding method, characterized in that the method comprises:
A) decoding an encoded multi-channel bitstream to obtain at least one group of a second multi-channel sound signal and an optimized subspace mapping model;
B) mapping the second multi-channel sound signal back to a first multi-channel sound signal using the optimized subspace mapping model;
C) mapping the first multi-channel sound signal from the frequency domain to the time domain using an inverse time-frequency transform, or mapping the first multi-channel sound signal from the subband domain to the time domain using an inverse subband filter;
wherein the optimized subspace mapping model is obtained by adaptively switching, according to a first statistical property, among multiple different mapping models to one of the mapping models, which serves as the optimized subspace mapping model; and the mapping coefficients of the optimized subspace mapping model are adaptively adjusted according to the first statistical property;
the first statistical property is obtained by: mapping the first multi-channel sound signal to a first frequency-domain signal using a time-frequency transform, or mapping the first multi-channel sound signal to a first subband signal using a subband filter;
dividing the first frequency-domain signal or the first subband signal into different time-frequency subbands; and
calculating, in each of the different time-frequency subbands, the first statistical property of the first multi-channel sound signal.
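Decoding steps A through C in claim 10 can be sketched as a short pipeline. The concrete choices below are illustrative assumptions, not taken from the patent: the subspace mapping model is a fixed 2x2 sum/difference matrix, its inverse mapping is a matrix inverse, and the inverse time-frequency transform is an inverse FFT:

```python
import numpy as np

# Assumed mapping model: an orthonormal sum/difference (mid/side) matrix.
M = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2.0)

def decode(second_signal, model=M):
    """Steps B and C of the decoding method, under the assumptions above."""
    # Step B: map the second multi-channel signal back to the first one.
    first_freq = np.linalg.inv(model) @ second_signal
    # Step C: inverse time-frequency transform, frequency domain -> time domain.
    return np.fft.irfft(first_freq, axis=1)

# Round-trip check against a matching toy encoder (FFT, then the mapping).
rng = np.random.default_rng(1)
time_sig = rng.standard_normal((2, 256))
encoded = M @ np.fft.rfft(time_sig, axis=1)
decoded = decode(encoded)
print(np.allclose(decoded, time_sig))   # the round trip reconstructs the signal
```

Step A (perceptual decoding of the bitstream into `encoded` and the model) is elided here; in the patent the model and its coefficients are additionally switched and adjusted per time-frequency subband according to the first statistical property.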
11. The method of claim 10, characterized in that the first multi-channel sound signal comprises multiple grouped sound signals; and before mapping the first multi-channel sound signal from the frequency domain to the time domain using the inverse time-frequency transform, or mapping the first multi-channel sound signal from the subband domain to the time domain using the inverse subband filter, the method further comprises:
performing grouping recovery on the multiple grouped sound signals to obtain a third multi-channel sound signal;
executing step C) with the third multi-channel sound signal as the first multi-channel sound signal.
12. The method of claim 10, characterized in that the first multi-channel sound signal comprises multiple grouped sound signals in the time domain; and after mapping the first multi-channel sound signal from the frequency domain to the time domain using the inverse time-frequency transform, or mapping the first multi-channel sound signal from the subband domain to the time domain using the inverse subband filter, the method further comprises:
performing grouping recovery on the multiple grouped sound signals to obtain a fourth multi-channel sound signal.
13. The method of claim 10, characterized in that before decoding the encoded multi-channel bitstream to obtain at least one group of the second multi-channel sound signal and the optimized subspace mapping model, the method further comprises:
demultiplexing the encoded multi-channel bitstream to obtain multiple layered bitstreams;
executing step A) with each layered bitstream as the encoded multi-channel bitstream;
after step A) has been performed on all the layered bitstreams, uniformly executing step B) and step C).
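The control flow of claim 13 is: demultiplex, run step A per layer, then run steps B and C once over the combined result. A minimal sketch follows; the length-prefixed packet format and the stand-in step functions are illustrative assumptions, not taken from the patent:

```python
import struct

def demultiplex(multiplexed: bytes):
    """Split a stream of big-endian length-prefixed layers into bitstreams."""
    layers, offset = [], 0
    while offset < len(multiplexed):
        (length,) = struct.unpack_from(">I", multiplexed, offset)
        offset += 4
        layers.append(multiplexed[offset:offset + length])
        offset += length
    return layers

def decode_layered(multiplexed: bytes, step_a, step_bc):
    # Step A runs once per layered bitstream ...
    decoded_layers = [step_a(layer) for layer in demultiplex(multiplexed)]
    # ... then steps B and C run uniformly, once, over all decoded layers.
    return step_bc(decoded_layers)

# Toy usage: two layers; step A "decodes" bytes to text, B/C joins the layers.
stream = b"".join(struct.pack(">I", len(p)) + p for p in (b"base", b"enh"))
print(decode_layered(stream, lambda b: b.decode(), " + ".join))  # base + enh
```

This mirrors the hierarchical (layered) perceptual coding of claim 9: a base layer plus enhancement layers are decoded independently, while the inverse subspace mapping and the inverse time-frequency transform are applied only after all layers are available.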
14. A multi-channel sound signal decoding device, characterized in that the device comprises:
a perceptual decoding unit, configured to decode an encoded multi-channel bitstream to obtain at least one group of a second multi-channel sound signal and an optimized subspace mapping model;
a subspace inverse mapping unit, configured to map the second multi-channel sound signal obtained by the perceptual decoding unit back to a first multi-channel sound signal using the optimized subspace mapping model obtained by the perceptual decoding unit;
a frequency-time mapping unit, configured to map the first multi-channel sound signal obtained by the subspace inverse mapping unit from the frequency domain to the time domain using an inverse time-frequency transform, or to map the first multi-channel sound signal from the subband domain to the time domain using an inverse subband filter;
wherein the optimized subspace mapping model is obtained by adaptively switching, according to a first statistical property, among multiple different mapping models to one of the mapping models, which serves as the optimized subspace mapping model; and the mapping coefficients of the optimized subspace mapping model are adaptively adjusted according to the first statistical property;
the first statistical property is obtained by: mapping the first multi-channel sound signal to a first frequency-domain signal using a time-frequency transform, or mapping the first multi-channel sound signal to a first subband signal using a subband filter;
dividing the first frequency-domain signal or the first subband signal into different time-frequency subbands; and
calculating, in each of the different time-frequency subbands, the first statistical property of the first multi-channel sound signal.
15. The device of claim 14, characterized in that the first multi-channel sound signal obtained by the subspace inverse mapping unit comprises multiple grouped sound signals, and the device further comprises:
a first grouping recovery unit, configured to: before the frequency-time mapping unit maps the first multi-channel sound signal obtained by the subspace inverse mapping unit from the frequency domain to the time domain using the inverse time-frequency transform, or maps the first multi-channel sound signal from the subband domain to the time domain using the inverse subband filter, perform grouping recovery on the multiple grouped sound signals to obtain a third multi-channel sound signal;
the frequency-time mapping unit is specifically configured to process the third multi-channel sound signal obtained by the first grouping recovery unit as the first multi-channel sound signal.
16. The device of claim 14, characterized in that the first multi-channel sound signal in the time domain, obtained after mapping by the frequency-time mapping unit, comprises multiple grouped sound signals, and the device further comprises:
a second grouping recovery unit, configured to: after the frequency-time mapping unit maps the first multi-channel sound signal obtained by the subspace inverse mapping unit from the frequency domain to the time domain using the inverse time-frequency transform, or maps the first multi-channel sound signal from the subband domain to the time domain using the inverse subband filter, perform grouping recovery on the multiple grouped sound signals to obtain a fourth multi-channel sound signal.
17. The device of claim 14, characterized in that the device further comprises:
a demultiplexing unit, configured to: before the perceptual decoding unit decodes the encoded multi-channel bitstream to obtain at least one group of the second multi-channel sound signal and the optimized subspace mapping model, demultiplex the encoded multi-channel bitstream to obtain multiple layered bitstreams;
the perceptual decoding unit, the subspace inverse mapping unit and the frequency-time mapping unit are specifically configured to process each layered bitstream obtained by the demultiplexing unit as the encoded multi-channel bitstream.
CN201410395806.5A 2014-08-12 2014-08-12 Multi-channel sound signal coding method, coding/decoding method and device Active CN105336333B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410395806.5A CN105336333B (en) 2014-08-12 2014-08-12 Multi-channel sound signal coding method, coding/decoding method and device
PCT/CN2014/095396 WO2016023323A1 (en) 2014-08-12 2014-12-29 Multichannel acoustic signal encoding method, decoding method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410395806.5A CN105336333B (en) 2014-08-12 2014-08-12 Multi-channel sound signal coding method, coding/decoding method and device

Publications (2)

Publication Number Publication Date
CN105336333A CN105336333A (en) 2016-02-17
CN105336333B true CN105336333B (en) 2019-07-05

Family

ID=55286819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410395806.5A Active CN105336333B (en) 2014-08-12 2014-08-12 Multi-channel sound signal coding method, coding/decoding method and device

Country Status (2)

Country Link
CN (1) CN105336333B (en)
WO (1) WO2016023323A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108461086B (en) * 2016-12-13 2020-05-15 北京唱吧科技股份有限公司 Real-time audio switching method and device
CN108206022B (en) * 2016-12-16 2020-12-18 南京青衿信息科技有限公司 Codec for transmitting three-dimensional acoustic signals by using AES/EBU channel and coding and decoding method thereof
CN115132214A (en) 2018-06-29 2022-09-30 华为技术有限公司 Coding method, decoding method, coding device and decoding device for stereo signal
TWI692719B (en) * 2019-03-21 2020-05-01 瑞昱半導體股份有限公司 Audio processing method and audio processing system
CN111599375B (en) * 2020-04-26 2023-03-21 云知声智能科技股份有限公司 Whitening method and device for multi-channel voice in voice interaction
CN111682881B (en) * 2020-06-17 2021-12-24 北京润科通用技术有限公司 Communication reconnaissance simulation method and system suitable for multi-user signals
CN113873420B (en) * 2021-09-28 2023-06-23 联想(北京)有限公司 Audio data processing method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1647158A (en) * 2002-04-10 2005-07-27 皇家飞利浦电子股份有限公司 Coding of stereo signals
CN101133441A (en) * 2005-02-14 2008-02-27 弗劳恩霍夫应用研究促进协会 Parametric joint-coding of audio sources
CN101401152A (en) * 2006-03-15 2009-04-01 法国电信公司 Device and method for encoding by principal component analysis a multichannel audio signal
CN101490744A (en) * 2006-11-24 2009-07-22 Lg电子株式会社 Method and apparatus for encoding and decoding an audio signal
CN101667425A (en) * 2009-09-22 2010-03-10 山东大学 Method for carrying out blind source separation on convolutionary aliasing voice signals
CN102682779A (en) * 2012-06-06 2012-09-19 武汉大学 Double-channel encoding and decoding method for 3D audio frequency and codec
CN102918588A (en) * 2010-03-29 2013-02-06 弗兰霍菲尔运输应用研究公司 A spatial audio processor and a method for providing spatial parameters based on an acoustic input signal

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7240001B2 (en) * 2001-12-14 2007-07-03 Microsoft Corporation Quality improvement techniques in an audio encoder
JP2007003702A (en) * 2005-06-22 2007-01-11 Ntt Docomo Inc Noise eliminator, communication terminal, and noise eliminating method
CN103366749B (en) * 2012-03-28 2016-01-27 北京天籁传音数字技术有限公司 A kind of sound codec devices and methods therefor
CN103366751B (en) * 2012-03-28 2015-10-14 北京天籁传音数字技术有限公司 A kind of sound codec devices and methods therefor
US9396732B2 (en) * 2012-10-18 2016-07-19 Google Inc. Hierarchical deccorelation of multichannel audio
CN103077709B (en) * 2012-12-28 2015-09-09 中国科学院声学研究所 A kind of Language Identification based on total distinctive subspace mapping and device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1647158A (en) * 2002-04-10 2005-07-27 皇家飞利浦电子股份有限公司 Coding of stereo signals
CN101133441A (en) * 2005-02-14 2008-02-27 弗劳恩霍夫应用研究促进协会 Parametric joint-coding of audio sources
CN101401152A (en) * 2006-03-15 2009-04-01 法国电信公司 Device and method for encoding by principal component analysis a multichannel audio signal
CN101490744A (en) * 2006-11-24 2009-07-22 Lg电子株式会社 Method and apparatus for encoding and decoding an audio signal
CN101667425A (en) * 2009-09-22 2010-03-10 山东大学 Method for carrying out blind source separation on convolutionary aliasing voice signals
CN102918588A (en) * 2010-03-29 2013-02-06 弗兰霍菲尔运输应用研究公司 A spatial audio processor and a method for providing spatial parameters based on an acoustic input signal
CN102682779A (en) * 2012-06-06 2012-09-19 武汉大学 Double-channel encoding and decoding method for 3D audio frequency and codec

Also Published As

Publication number Publication date
CN105336333A (en) 2016-02-17
WO2016023323A1 (en) 2016-02-18

Similar Documents

Publication Publication Date Title
CN105336333B (en) Multi-channel sound signal coding method, coding/decoding method and device
US11735192B2 (en) Audio encoder, audio decoder and related methods using two-channel processing within an intelligent gap filling framework
KR102219752B1 (en) Apparatus and method for estimating time difference between channels
ES2771200T3 (en) Postprocessor, preprocessor, audio encoder, audio decoder and related methods to improve transient processing
CN103903626B (en) Sound encoding device, audio decoding apparatus, voice coding method and tone decoding method
CN110047496B (en) Stereo audio encoder and decoder
TWI714046B (en) Apparatus, method or computer program for estimating an inter-channel time difference
Chen et al. Spatial parameters for audio coding: MDCT domain analysis and synthesis
WO2017206794A1 (en) Method and device for extracting inter-channel phase difference parameter
Gorlow et al. Informed separation of spatial images of stereo music recordings using second-order statistics
CN105336334B (en) Multi-channel sound signal coding method, decoding method and device
CN103489450A (en) Wireless audio compression and decompression method based on time domain aliasing elimination and equipment thereof
Gorlow et al. Informed separation of spatial images of stereo music recordings using low-order statistics

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant