CN102682779B

CN102682779B - Two-channel encoding and decoding method and codec for 3D audio

Info

Publication number: CN102682779B
Application number: CN2012101839630A
Authority: CN
Inventors: 胡瑞敏; 董石; 郑翔; 涂卫平; 杨玉红; 王晓晨; 高戈; 刘梦颖
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2012-06-06
Filing date: 2012-06-06
Publication date: 2013-07-24
Anticipated expiration: 2032-06-06
Also published as: CN102682779A

Abstract

The present invention provides a two-channel encoding and decoding method for 3D audio and a corresponding codec. Building on two-channel techniques for 3D audio, and in line with the auditory characteristics of the human ear, the invention allocates more coding energy to the principal component of the coded signal and encodes different audio signals with different coding methods. The method can reduce codec noise, give the reconstructed signal a higher signal-to-noise ratio, and better simulate 3D audio signals.

Description

Two-channel codec method and codec for 3D audio

Technical field

The invention relates to the technical field of audio compression, and in particular to a two-channel encoding and decoding method for 3D audio and a corresponding codec.

Background

With the rapid development of information technology in the new century, audio compression technology has come into wide use. 3D audio formats such as 5.1 channels, 7.1 channels, and layouts with even more channels for sound rendering are increasingly popular, and multi-channel audio provides a more immersive and realistic listening experience. As the number of audio channels grows, however, the bit rate produced by encoding grows linearly with it, which demands more storage space for recordings and more real-time transmission bandwidth. Many efficient coding techniques have emerged in response, such as downmixing and parametric stereo coding, along with stereo audio codecs built on them, such as PS, EAAC+, MPEG-Surround, and PCA-based stereo audio codecs. With multiple sound sources in multiple directions, however, these traditional audio codecs do not deliver satisfactory subjective or objective sound quality.

Summary of the invention

To further improve audio codec quality, reduce codec noise, and enhance subjective and objective sound quality, the present invention proposes a two-channel encoding and decoding method for 3D audio and a corresponding codec.

To solve the above technical problems, the present invention adopts the following technical solutions:

I. A two-channel encoding method for 3D audio, comprising the steps of:

S1.1. Performing a time-frequency transform on each channel of the input two-channel signal, converting the time-domain two-channel signal into a frequency-domain two-channel signal;

S1.2. Dividing each channel of the frequency-domain two-channel signal into subbands to obtain two-channel subband signals;

S1.3. Encoding the two-channel subband signals one by one with both the parametric coding method based on frequency-domain principal components and the parametric coding method based on polar-coordinate principal components, to obtain the coding noise energy that each two-channel subband signal produces under each of the two methods;

The coding noise energy obtained by encoding the two-channel subband signal with the parametric coding method based on polar-coordinate principal components is:

$\varepsilon_{2.k} = \sum_{j=1}^{n} \left[ \rho_k(j) - \frac{1}{n} \sum_{j=1}^{n} \rho_k(j) \right]^2$

where ε_2.k is the coding noise energy of the k-th two-channel subband signal, $\rho_k(j) = \sqrt{L_k^2(j) + R_k^2(j)}$ is the signal amplitude at the j-th frequency bin of the k-th two-channel subband signal, L_k(j) and R_k(j) are the signals at the j-th frequency bin of the k-th left-channel and right-channel subband signals respectively, and n is the number of frequency bins in the k-th two-channel subband signal;

S1.4. For each two-channel subband signal, selecting the parametric coding method with the smaller coding noise energy to further encode that subband signal; if the two noise energies are equal, the parametric coding method based on frequency-domain principal components is selected. If the method based on frequency-domain principal components is used, the encoded principal component sequence, the direction angle, and the noise energy ratio of the two-channel subband signal are output; if the method based on polar-coordinate principal components is used, the encoded principal component sequence, the rotation radius, and the noise energy ratio of the two-channel subband signal are output;

The encoded principal component sequence obtained with the parametric coding method based on polar-coordinate principal components is:

PC_k = {PC_k(j) | j = 1, 2, ..., n}

where PC_k is the principal component sequence of the k-th two-channel subband signal; PC_k(j) is the principal component at the j-th frequency bin of the k-th two-channel subband signal, defined in terms of the direction angle of that frequency bin; L_k(j) and R_k(j) are the signals at the j-th frequency bin of the k-th left-channel and right-channel subband signals respectively; and n is the number of frequency bins in the k-th subband;

The rotation radius obtained with the parametric coding method based on polar-coordinate principal components is:

$\bar{\rho}_k = \frac{\sum_{j=1}^{n} \sqrt{L_k^2(j) + R_k^2(j)}}{n}$

where $\bar{\rho}_k$ is the rotation radius of the k-th two-channel subband signal, L_k(j) and R_k(j) are the signals at the j-th frequency bin of the k-th left-channel and right-channel subband signals respectively, and n is the number of frequency bins in the k-th two-channel subband signal;

The noise energy ratio obtained with the parametric coding method based on polar-coordinate principal components is:

$PAR = \frac{\pi^2}{48 \sum_{j=1}^{n} \left[ \rho_k(j) - \frac{1}{n} \sum_{j=1}^{n} \rho_k(j) \right]^2}$

where $\rho_k(j) = \sqrt{L_k^2(j) + R_k^2(j)}$ is the signal amplitude at the j-th frequency bin of the k-th two-channel subband signal, L_k(j) and R_k(j) are the signals at the j-th frequency bin of the k-th left-channel and right-channel subband signals respectively, and n is the number of frequency bins in the k-th two-channel subband signal;
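As a rough illustration of the two side parameters defined above, the following Python sketch computes the rotation radius and the noise energy ratio for one subband. It assumes real-valued bin values and takes the amplitude as the square root of L_k(j)² + R_k(j)²; the function names are hypothetical and are not part of the patent.

```python
import math

def amplitudes(L_k, R_k):
    """Per-bin amplitude rho_k(j), assuming rho = sqrt(L^2 + R^2)."""
    return [math.hypot(l, r) for l, r in zip(L_k, R_k)]

def rotation_radius(L_k, R_k):
    """Mean amplitude over the n bins of the subband."""
    rho = amplitudes(L_k, R_k)
    return sum(rho) / len(rho)

def noise_energy_ratio(L_k, R_k):
    """PAR = pi^2 / (48 * sum of squared deviations of rho from its mean)."""
    rho = amplitudes(L_k, R_k)
    mean_rho = sum(rho) / len(rho)
    spread = sum((p - mean_rho) ** 2 for p in rho)
    if spread == 0.0:
        # All bins share one amplitude, so the ratio is unbounded. Returning
        # infinity is an assumption; the source does not cover this case.
        return math.inf
    return math.pi ** 2 / (48.0 * spread)

# Toy subband with two bins whose amplitudes are 5 and 10.
L_k = [3.0, 6.0]
R_k = [4.0, 8.0]
print(rotation_radius(L_k, R_k))     # 7.5
print(noise_energy_ratio(L_k, R_k))  # pi^2 / 600, about 0.01645
```

A real encoder would presumably quantize both values before writing them to the bitstream; that step is not described in this section and is left out here.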

S1.5. Downmixing the encoded principal component sequences to obtain a downmix signal;

S1.6. Encoding the downmix signal with a core encoder to obtain an encoded bitstream, and writing the direction angle or rotation radius, together with the noise energy ratio, into the encoded bitstream.

II. A two-channel encoder for 3D audio, comprising:

a time-frequency transform module, configured to perform a time-frequency transform on each channel of the input two-channel signal, converting the time-domain two-channel signal into a frequency-domain two-channel signal;

a subband division module, configured to divide each channel of the frequency-domain two-channel signal into subbands to obtain two-channel subband signals;

a coding noise energy calculation module, configured to encode the two-channel subband signals one by one with both the parametric coding method based on frequency-domain principal components and the parametric coding method based on polar-coordinate principal components, to obtain the coding noise energy that each two-channel subband signal produces under each of the two methods; the coding noise energy obtained with the method based on polar-coordinate principal components is:

$\varepsilon_{2.k} = \sum_{j=1}^{n} \left[ \rho_k(j) - \frac{1}{n} \sum_{j=1}^{n} \rho_k(j) \right]^2$

where ε_2.k is the coding noise energy of the k-th two-channel subband signal, $\rho_k(j) = \sqrt{L_k^2(j) + R_k^2(j)}$ is the signal amplitude at the j-th frequency bin of the k-th two-channel subband signal, L_k(j) and R_k(j) are the signals at the j-th frequency bin of the k-th left-channel and right-channel subband signals respectively, and n is the number of frequency bins in the k-th two-channel subband signal;

a parametric coding module, configured to select, for each two-channel subband signal, the parametric coding method with the smaller coding noise energy to further encode that subband signal; if the two noise energies are equal, the parametric coding method based on frequency-domain principal components is selected. If the method based on frequency-domain principal components is used, the encoded principal component sequence, the direction angle, and the noise energy ratio of the two-channel subband signal are output; if the method based on polar-coordinate principal components is used, the encoded principal component sequence, the rotation radius, and the noise energy ratio of the two-channel subband signal are output. The encoded principal component sequence is:

PC_k = {PC_k(j) | j = 1, 2, ..., n}

where PC_k(j) is the principal component at the j-th frequency bin of the k-th two-channel subband signal, defined in terms of the direction angle of that frequency bin;

a downmix module, configured to downmix the encoded principal component sequences to obtain a downmix signal;

a core encoder, configured to encode the downmix signal to obtain an encoded bitstream, and to write the direction angle or rotation radius, together with the noise energy ratio, into the encoded bitstream.

III. A two-channel decoding method for 3D audio, comprising the steps of:

S2.1. Decoding the encoded bitstream with a core decoder to obtain a decoded signal;

S2.2. Dividing the decoded signal into subbands to obtain decoded subband signals;

S2.3. Decoding the decoded subband signals with the parametric decoding method that corresponds to the parametric coding method used for encoding, in combination with the direction angle or rotation radius and the noise energy ratio carried in the encoded bitstream, to obtain reconstructed frequency-domain subband signals;

S2.4. Merging the reconstructed frequency-domain subband signals to obtain a reconstructed frequency-domain signal;

S2.5. Performing an inverse time-frequency transform on the reconstructed frequency-domain signal, converting it into a time-domain signal and recovering the reconstructed audio signal.

The parametric decoding method is either the parametric decoding method based on frequency-domain principal components or the parametric decoding method based on polar-coordinate principal components.

Decoding the decoded subband signals with the parametric decoding method based on frequency-domain principal components specifically comprises: generating, according to the noise energy ratio in the encoded bitstream, white noise with the same energy as the original signal, and restoring the decoded subband signals in combination with the principal component sequence and the direction angle in the encoded bitstream, to obtain the reconstructed frequency-domain subband signals.

Decoding the decoded subband signals with the parametric decoding method based on polar-coordinate principal components specifically comprises: generating, according to the noise energy ratio in the encoded bitstream, white noise with the same energy as the original signal, and restoring the decoded subband signals in combination with the principal component sequence and the rotation radius in the encoded bitstream, to obtain the reconstructed frequency-domain subband signals.
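Both decoding variants above rely on white noise whose energy is set from the transmitted noise energy ratio. The sketch below shows one way such noise could be produced in Python. How the target energy follows from the ratio is not spelled out in this section, so the relation used in `ambient_energy` (ratio = principal energy / ambient energy) is an assumption, and all names are hypothetical.

```python
import math
import random

def white_noise_with_energy(n, target_energy, seed=0):
    """Return n Gaussian samples rescaled so their summed squares equal target_energy."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    energy = sum(x * x for x in noise)
    scale = math.sqrt(target_energy / energy)
    return [x * scale for x in noise]

def ambient_energy(principal_energy, par):
    """Assumed relation (not confirmed by the source): par = principal / ambient."""
    return principal_energy / par

# Hypothetical decoded principal component bins for one subband.
decoded_pc = [0.8, -0.3, 0.5, 0.1]
pc_energy = sum(x * x for x in decoded_pc)
ambient = white_noise_with_energy(len(decoded_pc), ambient_energy(pc_energy, par=4.0))
```

The fixed seed only makes the sketch reproducible; a decoder would normally draw fresh noise for each frame.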

IV. A two-channel decoder for 3D audio, comprising:

a core decoder, configured to decode the encoded bitstream to obtain a decoded signal;

a subband division module, configured to divide the decoded signal into subbands to obtain decoded subband signals;

a parametric decoding module, configured to decode the decoded subband signals with the parametric decoding method that corresponds to the parametric coding method used for encoding, in combination with the direction angle or rotation radius and the noise energy ratio carried in the encoded bitstream, to obtain reconstructed frequency-domain subband signals;

a subband merging module, configured to merge the reconstructed frequency-domain subband signals to obtain a reconstructed frequency-domain signal;

an inverse time-frequency transform module, configured to perform an inverse time-frequency transform on the reconstructed frequency-domain signal, converting it into a time-domain signal and recovering the reconstructed audio signal.

The parametric decoding module further comprises a parametric decoding module based on frequency-domain principal components and a parametric decoding module based on polar-coordinate principal components.

The parametric decoding module based on frequency-domain principal components is configured to generate, according to the noise energy ratio in the encoded bitstream, white noise with the same energy as the original signal, and to restore the decoded subband signals in combination with the principal component sequence and the direction angle in the encoded bitstream, to obtain the reconstructed frequency-domain subband signals.

The parametric decoding module based on polar-coordinate principal components is configured to generate, according to the noise energy ratio in the encoded bitstream, white noise with the same energy as the original signal, and to restore the decoded subband signals in combination with the principal component sequence and the rotation radius in the encoded bitstream, to obtain the reconstructed frequency-domain subband signals.

Building on two-channel techniques for 3D audio, and in line with the auditory characteristics of the human ear, the present invention allocates more coding energy to the principal component of the coded signal and encodes different audio signals with different coding methods, thereby providing a two-channel encoding and decoding method for 3D audio and a corresponding codec. The method can reduce codec noise, give the reconstructed signal a higher signal-to-noise ratio, and better simulate 3D audio signals.

Description of drawings

Fig. 1 is a flowchart of the encoding method of the present invention;

Fig. 2 is a flowchart of the decoding method of the present invention;

Fig. 3 is a flowchart of subband division in the encoding method of the present invention;

Fig. 4 is a flowchart of coding method selection in the encoding method of the present invention;

Fig. 5 is a schematic diagram of the parametric coding method based on polar-coordinate principal components of the present invention;

Fig. 6 is a flowchart of decoding method selection in the decoding method of the present invention;

Fig. 7 is a flowchart of parametric decoding in the decoding method of the present invention.

Detailed description

The present invention proposes a two-channel encoding method for 3D audio and a corresponding two-channel decoding method. In a specific implementation, those skilled in the art can realize automatic audio encoding and decoding in computer software according to the technical solution provided. Because codec software methods are often hardened into codec devices in practice, the present invention also provides a corresponding two-channel encoder and decoder for 3D audio.

Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings, to make its technical solutions and beneficial effects clearer.

To analyze a spatial audio signal with the parametric coding method based on frequency-domain principal components, the coding scheme uses the minimum mean square error (MMSE) criterion to merge the two channels into one, and only this one channel is encoded by the core encoder. At decoding time, the signal is reconstructed from the direction angle and the noise energy ratio (PAR) between the principal and secondary components, where the ambient part is simulated by white noise with energy similar to that of the original signal. For 3D multi-channel signals, however, some subbands produced by subband division are merged from small uniform subbands and therefore contain many subbands whose left/right channel energy ratios differ. Since such subbands are better modeled as several sound sources in different directions, it is not reasonable to transmit the downmixed channel with only one direction angle and one PAR, as the method based on frequency-domain principal components does. To address this, the present invention proposes a parametric coding method based on polar coordinates: the principal and secondary components are parametrically coded in polar coordinates, and the signal is reconstructed from the rotation radius and the PAR, which simulates 3D audio signals better and yields a higher signal-to-noise ratio.

The two-channel encoding method for 3D audio of the present invention, whose flowchart is shown in Fig. 1, comprises the following steps:

Step 1.1: perform a time-frequency transform on each channel of the input two-channel signal, converting the time-domain two-channel signal into a frequency-domain two-channel signal.

The two-channel signal consists of a left-channel signal l and a right-channel signal r. In this embodiment, a fast Fourier transform (FFT) converts the time-domain left-channel signal l and right-channel signal r into the frequency-domain left-channel signal L and right-channel signal R respectively.
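Step 1.1 can be sketched in a few lines of Python. For readability the example uses a naive O(n²) discrete Fourier transform from the standard library instead of a true FFT; both produce the same spectrum. Frame length, windowing, and overlap are not specified in this section, so the sketch transforms one unwindowed frame per channel, and the function names are hypothetical.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform of one frame of samples."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]

def transform_stereo(left, right):
    """Transform the time-domain channels l and r into the spectra L and R."""
    return dft(left), dft(right)

# Toy frames: an impulse on the left, a constant on the right.
L, R = transform_stereo([1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0])
```

The impulse yields a flat spectrum and the constant frame puts all of its energy in bin 0, which is a quick sanity check on the transform.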

Step 1.2: divide the frequency-domain left-channel signal L and right-channel signal R into subbands to obtain left-channel and right-channel subband signals. Fig. 3 is a flowchart of one specific implementation of this step.

This step is implemented as follows:

A division method based on the equivalent rectangular bandwidth (ERB) splits each of the frequency-domain signals L and R into 64 subbands. Then, according to the auditory characteristics of the human ear and the requirements of the encoder, the subbands of L and R are merged, further subdivided, or both, to obtain the final left-channel and right-channel subband signals.

Because the human ear is more sensitive to low-frequency sound and perceives high-frequency sound less well, the 64 subbands of L and R can be processed further: the low-frequency subbands can be subdivided, the high-frequency subbands can be merged, or both. In this embodiment, 3 low-frequency subbands of the 64 are subdivided into 16 subbands and the remaining 61 high-frequency subbands are merged into 4 subbands, giving 20 subband signals in total; all subsequent operations are performed on these 20 subband signals. The boundary between low and high frequencies is chosen by the implementer as needed.
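The regrouping described above amounts to slicing the spectrum at a list of band edges that are narrow at low frequencies and wide at high frequencies. The Python sketch below shows that slicing on a toy spectrum. The edge values are purely hypothetical, since the section does not give the actual ERB boundaries or say how the 3 low subbands are split into 16.

```python
def split_into_subbands(spectrum, edges):
    """Slice a list of bins into subbands; edges[i] and edges[i + 1] bound subband i."""
    return [spectrum[lo:hi] for lo, hi in zip(edges, edges[1:])]

# Stand-in for 16 frequency bins (bin indices used as values).
spectrum = list(range(16))

# Hypothetical edges: one bin per subband at the low end, wider bands higher up.
edges = [0, 1, 2, 3, 4, 6, 8, 12, 16]

subbands = split_into_subbands(spectrum, edges)
```

In the embodiment the same idea would yield 16 narrow low-frequency subbands plus 4 wide high-frequency subbands, 20 in total, applied identically to L and R.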

Step 1.3: encode the left-channel and right-channel subband signals obtained in step 1.2 with both the parametric coding method based on frequency-domain principal components (PCA) and the parametric coding method based on polar-coordinate principal components (PC-PCA), and compute the coding noise energy of each of the two methods.

This step is implemented as follows:

1) Encode the left-channel and right-channel subband signals with the PCA method to obtain the coding noise energy it produces.

Suppose step 1.2 yields N left-channel subband signals and N right-channel subband signals, and denote the k-th of each as L_k and R_k, k = 1, 2, ..., N. Suppose further that L_k and R_k each contain n frequency bins. L_k and R_k can then be regarded as sequences of n frequency-bin signals, L_k = {L_k(j) | j = 1, 2, ..., n} and R_k = {R_k(j) | j = 1, 2, ..., n}, where L_k(j) and R_k(j) are the signals at the j-th frequency bin of L_k and R_k respectively. This step obtains, for each pair of subband signals L_k and R_k in turn, the coding noise energy produced by the PCA method.

Taking the subband signals L_k and R_k as an example, the coding noise energy of the PCA method is obtained as follows:

a) Compute the covariance matrix R_k of the sequences L_k and R_k (the matrix R_k is distinct from the right-channel subband signal of the same name):

$R_k = \begin{bmatrix} r_{ll} & r_{lr} \\ r_{rl} & r_{rr} \end{bmatrix} \qquad (1)$

where

r_ll = cov[L_k, L_k], r_lr = r_rl = cov[L_k, R_k], r_rr = cov[R_k, R_k];

b) Compute the eigenvalues λ₁ and λ₂ of the covariance matrix R_k:

$\lambda_1 = \frac{1}{2} \left[ r_{ll} + r_{rr} + \sqrt{(r_{ll} - r_{rr})^2 + (2 r_{lr})^2} \right] \qquad (2)$

$\lambda_2 = \frac{1}{2} \left[ r_{ll} + r_{rr} - \sqrt{(r_{ll} - r_{rr})^2 + (2 r_{lr})^2} \right] \qquad (3)$

c) From the eigenvalues λ₁ and λ₂, obtain the principal component energy E_p and the secondary component energy E_s of the PCA method:

E_p = max(λ₁, λ₂) (4)

E_s = min(λ₁, λ₂) (5)

The coding noise energy produced by the PCA method is then ε₁ = E_s = min(λ₁, λ₂).
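Steps a) to c) can be sketched in Python as follows. The sketch uses the standard closed-form eigenvalues of a symmetric 2×2 matrix, in which the smaller eigenvalue takes the minus sign in front of the square root. It assumes real-valued bin values and a population covariance (division by n); the source states neither, and the function names are hypothetical.

```python
import math

def covariance(x, y):
    """Population covariance of two equal-length real sequences (assumed 1/n scaling)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

def pca_energies(L_k, R_k):
    """Return (E_p, E_s): the larger and smaller eigenvalue of the 2x2 covariance matrix."""
    r_ll = covariance(L_k, L_k)
    r_lr = covariance(L_k, R_k)
    r_rr = covariance(R_k, R_k)
    root = math.sqrt((r_ll - r_rr) ** 2 + (2.0 * r_lr) ** 2)
    lam1 = 0.5 * (r_ll + r_rr + root)
    lam2 = 0.5 * (r_ll + r_rr - root)
    return max(lam1, lam2), min(lam1, lam2)

# Perfectly correlated channels: all energy lies along one direction,
# so the secondary component (the coding noise energy eps1) vanishes.
E_p, E_s = pca_energies([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
eps1 = E_s
```

With the two channels exactly proportional, the covariance matrix has rank one, which is why the coding noise energy comes out as zero up to rounding.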

2) Encode the left-channel and right-channel subband signals with the PC-PCA method to obtain the coding noise energy it produces.

The PC-PCA method was newly devised on the basis of the PCA method. The two share the same coding principle but work in different coordinates: the PCA method uses a rectangular coordinate system, whereas the PC-PCA method uses a polar coordinate system.

Suppose step 1.2 yields N left-channel subband signals and N right-channel subband signals, and denote the k-th of each as L_k and R_k, k = 1, 2, ..., N. Suppose further that L_k and R_k each contain n frequency bins. L_k and R_k can then be regarded as sequences of n frequency-bin signals, L_k = {L_k(j) | j = 1, 2, ..., n} and R_k = {R_k(j) | j = 1, 2, ..., n}, where L_k(j) and R_k(j) are the signals at the j-th frequency bin of L_k and R_k respectively. This step obtains, for each pair of subband signals L_k and R_k in turn, the coding noise energy produced by the PC-PCA method.

Taking the subband signals L_k and R_k as an example, this step is described further below:

a) To perform principal component parametric coding in the polar coordinate system, the signals L_k(j) and R_k(j) at each frequency bin of the subband signals L_k and R_k are mapped one by one into the polar coordinate system, forming two new random variables: the amplitude ρ_k(j) and the direction angle, as shown in Fig. 5. Here j = 1, 2, ..., n; L_k(j) and R_k(j) are the signals at the j-th frequency bin of L_k and R_k; ρ_k(j) is the amplitude of the signal at the j-th frequency bin of L_k and R_k; and the direction angle is that of the j-th frequency bin of L_k and R_k.

The amplitudes at all frequency bins of L_k and R_k form the sequence ρ_k, and the corresponding direction angles form the direction-angle sequence:

ρ_k = {ρ_k(j) | j = 1, 2, ..., n} (7)

b) Compute the covariance matrix R_k formed by the ρ_k sequence and the direction-angle sequence (Equation (9));

c) Compute the eigenvalues λ₁ and λ₂ of the covariance matrix R_k of Equation (9), and from them obtain the principal component energy and the secondary component energy E_ρ of the PC-PCA method:

$E_\rho = \lambda_1 = \sum_{j=1}^{n} \left[ \rho_k(j) - \frac{\sum_{j=1}^{n} \rho_k(j)}{n} \right]^2 \qquad (11)$

The coding noise energy of the PC-PCA method is then ε₂ = E_ρ.
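The PC-PCA coding noise energy is the summed squared deviation of the per-bin amplitudes from their mean, which a short Python sketch makes concrete. It assumes real-valued bin values and takes the amplitude as the square root of L_k(j)² + R_k(j)², consistent with the rotation radius definition; the function name is hypothetical.

```python
import math

def pc_pca_noise_energy(L_k, R_k):
    """eps2: sum over bins of (rho_k(j) - mean rho) squared."""
    rho = [math.hypot(l, r) for l, r in zip(L_k, R_k)]
    mean_rho = sum(rho) / len(rho)
    return sum((p - mean_rho) ** 2 for p in rho)

# Two bins with the same amplitude (5 and 5) but different directions:
# the polar-coordinate noise energy is zero.
eps2_same_radius = pc_pca_noise_energy([3.0, 0.0], [4.0, 5.0])

# Two bins with amplitudes 5 and 10: deviations of 2.5 each, so eps2 = 12.5.
eps2_spread = pc_pca_noise_energy([3.0, 6.0], [4.0, 8.0])
```

The first case illustrates why this measure suits subbands holding sources in several directions: bins that differ only in direction angle contribute no noise energy.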

分别采用上述基于频域主成分和基于极坐标主成分的参数编码方法逐一对N个子带信号Lk和Rk求解编码噪音能量，最终得到N组编码噪声能量。The above parametric coding methods based on frequency-domain principal components and polar coordinate principal components are used to solve the coding noise energy for the N subband signals Lk and Rk one by one, and finally N groups of coding noise energies are obtained.

Step 1.4: Select the better parametric coding method according to the coding noise energies produced by the two methods above, and use the selected method to further encode the left- and right-channel subband signals (L_k and R_k).

The selection of the optimal parametric coding method in this step is implemented as follows:

Select the parametric coding method with the smaller coding noise energy and output the mode value corresponding to that method; then use the selected method to further encode the left- and right-channel signals obtained in step 1.2.

Let ε₁ and ε₂ denote the coding noise energies produced by encoding the subband signals L_k and R_k with the parametric coding methods based on frequency-domain principal components and on polar-coordinate principal components, respectively. The implementation of this step is again illustrated with the subband signals L_k and R_k:

1) If ε₁ ≤ ε₂, output mode = 0. In this case the subband signals L_k and R_k are further encoded with the parametric coding method based on frequency-domain principal components:

The direction angle θ_k of the subband signals L_k and R_k is obtained from the covariance matrix R_k of formula (1).

Encoding the subband signals L_k and R_k with the parametric coding method based on frequency-domain principal components yields the principal component sequence PC_k and the secondary component sequence A_k, where PC_k = {PC_k(j) | j = 1, 2, ..., n}, A_k = {A_k(j) | j = 1, 2, ..., n}, PC_k(j) is the principal component of the j-th frequency point in L_k and R_k, and A_k(j) is the secondary component of the j-th frequency point in L_k and R_k:

$\begin{pmatrix} \cos\theta_{k} & \sin\theta_{k} \\ -\sin\theta_{k} & \cos\theta_{k} \end{pmatrix} \begin{pmatrix} L_{k}(j) \\ R_{k}(j) \end{pmatrix} = \begin{pmatrix} PC_{k}(j) \\ A_{k}(j) \end{pmatrix} \qquad (12)$

Here L_k(j) and R_k(j) are the signals of the j-th frequency point in the subband signals L_k and R_k respectively, θ_k is the direction angle of the subband signals L_k and R_k, k = 1, 2, ..., N, and j = 1, 2, ..., n.

All subbands are encoded one by one in this way, and for each subband the principal component sequence PC_k, the direction angle θ_k, and the noise energy ratio PAR (the ratio of E_p to E_s) are output.
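The rotation of formula (12) can be sketched as follows. This is a minimal illustration rather than the patented implementation; the function name and sample values are hypothetical, and θ_k is passed in directly instead of being derived from the covariance matrix.

```python
import math


def rotate_subband(left, right, theta):
    """Apply formula (12) to every frequency point of one subband."""
    c, s = math.cos(theta), math.sin(theta)
    # PC_k(j) =  cos(theta) * L_k(j) + sin(theta) * R_k(j)
    pc = [c * l + s * r for l, r in zip(left, right)]
    # A_k(j)  = -sin(theta) * L_k(j) + cos(theta) * R_k(j)
    a = [-s * l + c * r for l, r in zip(left, right)]
    return pc, a


# Hypothetical subband whose two channels are identical: with theta = pi/4
# all of the energy falls into the principal component.
left = [1.0, 2.0, -3.0]
right = [1.0, 2.0, -3.0]
pc, a = rotate_subband(left, right, math.pi / 4)
```

Because the rotation matrix is orthogonal, the total energy of PC_k and A_k equals the total energy of L_k and R_k.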

2) If ε₁ > ε₂, output mode = 1. In this case the subband signals L_k and R_k are further encoded with the parametric coding method based on polar-coordinate principal components:

Encoding the subband signals L_k and R_k with the parametric coding method based on polar-coordinate principal components yields the principal component sequence PC_k and the secondary component sequence A_k, where PC_k = {PC_k(j) | j = 1, 2, ..., n}, A_k = {A_k(j) | j = 1, 2, ..., n}, PC_k(j) is the principal component of the j-th frequency point in L_k and R_k, and A_k(j) is the secondary component of the j-th frequency point in L_k and R_k.

Here L_k(j) and R_k(j) are the signals of the j-th frequency point in the subband signals L_k and R_k respectively; the direction angle of the j-th frequency point in L_k and R_k takes the value given by formula (6); k = 1, 2, ..., N; and j = 1, 2, ..., n.

The rotation radius $\bar{\rho}_{k}$ of the subband signals L_k and R_k is then computed. The rotation radius is the mean signal amplitude over the frequency points of L_k and R_k, that is:

$\bar{\rho}_{k} = \frac{\sum_{j=1}^{n} \sqrt{L_{k}^{2}(j) + R_{k}^{2}(j)}}{n} \qquad (14)$

All subbands are encoded one by one in this way, and for each subband the principal component sequence PC_k, the rotation radius $\bar{\rho}_{k}$, and PAR (the ratio of E_ρ and the principal-component energy) are output.

Steps 1.3 and 1.4 both operate on individual subband signals. For each subband signal, the coding noise energy ε₁ of the parametric coding method based on frequency-domain principal components and the coding noise energy ε₂ of the parametric coding method based on polar-coordinate principal components are computed; ε₁ and ε₂ are compared once per subband, and the parametric coding method with the smaller coding noise energy is selected to further encode that subband. The process of steps 1.3 and 1.4 is shown in Figure 3.
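The per-subband decision described above reduces to a single comparison, sketched here. The function names and the sample noise energies are hypothetical; ties select mode 0, as stated for the case ε₁ ≤ ε₂.

```python
def select_mode(eps1, eps2):
    """Return 0 (frequency-domain principal components) if eps1 <= eps2, else 1 (polar)."""
    return 0 if eps1 <= eps2 else 1


def select_modes(noise_pairs):
    """Apply the comparison independently to each subband's (eps1, eps2) pair."""
    return [select_mode(e1, e2) for e1, e2 in noise_pairs]


# Hypothetical noise energies for N = 3 subbands.
pairs = [(0.2, 0.5), (0.7, 0.3), (0.4, 0.4)]
modes = select_modes(pairs)  # [0, 1, 0]
```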

Step 1.5: Downmix all principal component sequences PC_k (k = 1, 2, ..., N) produced in step 1.4 to obtain the downmix signal m.

Step 1.6: Pass the downmix signal m obtained in step 1.5 to the core encoder to obtain the encoded code stream. If the parametric coding method based on polar-coordinate principal components was used, write the rotation radius $\bar{\rho}_{k}$, PAR, and the mode value into the encoded code stream; if the parametric coding method based on frequency-domain principal components was used, write the direction angle θ_k, PAR, and the mode value into the encoded code stream.
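The side information of step 1.6 can be pictured as one small record per subband. The sketch below is purely illustrative: the dictionary layout, field names, and function name are hypothetical, and the actual code stream syntax is not specified in this passage.

```python
def side_info(mode, par, theta=None, radius=None):
    """Collect the per-subband parameters that accompany the core-coded downmix."""
    if mode == 0:
        # Frequency-domain principal components: direction angle, PAR, mode.
        if theta is None:
            raise ValueError("mode 0 requires the direction angle theta_k")
        return {"mode": 0, "theta": theta, "par": par}
    if mode == 1:
        # Polar-coordinate principal components: rotation radius, PAR, mode.
        if radius is None:
            raise ValueError("mode 1 requires the rotation radius")
        return {"mode": 1, "radius": radius, "par": par}
    raise ValueError("mode must be 0 or 1")


# Hypothetical values for two subbands coded with different modes.
records = [side_info(0, 4.0, theta=0.5), side_info(1, 2.0, radius=1.5)]
```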

The present invention also provides a two-channel decoding method for 3D audio, whose flow chart is shown in Figure 2. The method comprises the following steps:

Step 2.1: Decode the encoded code stream produced at the encoding end to obtain the decoded signal m.

In a specific implementation, the encoded code stream is fed to the core decoder, which decodes it to produce the decoded signal m.

Step 2.2: Divide the decoded signal m obtained in step 2.1 into subbands to obtain the decoded subband signals.

In a specific implementation, the decoded signal m output by the core decoder is divided into the subband sequence P(N), where N is the number of subbands and equals the value of N used in the encoding method.
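A minimal sketch of the subband division of step 2.2 follows. It assumes contiguous subbands of near-equal width, which is only an illustration; the decoder has to use the same band boundaries as the encoder, and those are not given in this passage. The function name is hypothetical.

```python
def split_subbands(spectrum, n_subbands):
    """Split a list of frequency-domain coefficients into n_subbands contiguous parts."""
    n = len(spectrum)
    # Assumption: uniform boundaries; a real codec would reuse the encoder's band table.
    bounds = [i * n // n_subbands for i in range(n_subbands + 1)]
    return [spectrum[bounds[i]:bounds[i + 1]] for i in range(n_subbands)]


# Hypothetical decoded signal with 10 coefficients split into N = 3 subbands.
subbands = split_subbands(list(range(10)), 3)
```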

Step 2.3: Select the decoding mode according to the mode value in the encoded code stream, and decode using the direction angle or rotation radius together with the noise energy ratio in the encoded code stream to obtain the reconstructed frequency-domain subband signals, as shown in Figures 6 and 7.

This step is implemented as follows:

1) If mode = 0, select the parametric decoding method based on frequency-domain principal components:

According to the noise energy ratio PAR in the encoded code stream, white noise with the same energy as the original signal is generated. In combination with the principal component sequence and the direction angle in the encoded code stream, the parametric decoding method based on frequency-domain principal components restores the subband sequence P(N) obtained in step 2.2, yielding the decoded subband signals, that is, the reconstructed frequency-domain subband signals $\hat{R}_{1}, \hat{R}_{2}, \ldots, \hat{R}_{N}$ and $\hat{L}_{1}, \hat{L}_{2}, \ldots, \hat{L}_{N}$.

2) If mode = 1, select the parametric decoding method based on polar-coordinate principal components:

According to the noise energy ratio PAR in the encoded code stream, white noise with the same energy as the original signal is generated. In combination with the principal component sequence and the rotation radius in the encoded code stream, the parametric decoding method based on polar-coordinate principal components restores the subband sequence P(N) obtained in step 2.2, yielding the decoded subband signals, that is, the reconstructed frequency-domain subband signals $\hat{R}_{1}, \hat{R}_{2}, \ldots, \hat{R}_{N}$ and $\hat{L}_{1}, \hat{L}_{2}, \ldots, \hat{L}_{N}$.
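For the mode = 0 branch, one possible reading of the reconstruction is sketched below. It rests on several assumptions that the text does not confirm: PAR is taken as the ratio of principal to secondary energy (E_p to E_s), the white noise stands in for the discarded secondary component A_k, and the decoder inverts formula (12) by applying the transposed rotation. The function name and sample values are hypothetical.

```python
import math
import random


def decode_mode0(pc, theta, par, rng=None):
    """Rebuild (L_k, R_k) from PC_k, theta_k and PAR using a noise substitute for A_k."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    e_p = sum(x * x for x in pc)
    noise = [rng.gauss(0.0, 1.0) for _ in pc]
    e_n = sum(x * x for x in noise)
    # Scale the noise so that energy(PC_k) / energy(A_hat) equals PAR.
    scale = math.sqrt((e_p / par) / e_n) if e_n > 0.0 else 0.0
    a_hat = [scale * x for x in noise]
    c, s = math.cos(theta), math.sin(theta)
    # Transpose of the rotation in formula (12).
    left = [c * p - s * a for p, a in zip(pc, a_hat)]
    right = [s * p + c * a for p, a in zip(pc, a_hat)]
    return left, right


# Hypothetical subband: principal energy 9 and PAR 9 give noise energy 1.
left_hat, right_hat = decode_mode0([1.0, 2.0, 2.0], 0.3, 9.0)
total_energy = sum(l * l + r * r for l, r in zip(left_hat, right_hat))
```

Since the inverse rotation preserves energy, the reconstructed pair carries the principal energy plus the noise energy implied by PAR.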

Step 2.4: Merge the reconstructed frequency-domain subband signals obtained in step 2.3 to obtain the reconstructed frequency-domain signals of the left and right channels.

Step 2.5: Apply the inverse time-frequency transform to the reconstructed frequency-domain channel signals obtained in step 2.4 to recover the reconstructed time-domain signals of the left and right channels. In a specific implementation this can be done with existing techniques such as the FFT (fast Fourier transform), which are not described further here.
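To make the inverse transform of step 2.5 concrete, the sketch below uses a direct O(n²) discrete Fourier transform pair in place of an FFT. This is a deliberate simplification for illustration; the function names and test signal are hypothetical, and a practical codec would use an FFT or whichever transform the encoder applied.

```python
import cmath


def dft(signal):
    """Forward DFT: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft(spectrum):
    """Inverse DFT: x[t] = (1/n) * sum_k X[k] * exp(+2*pi*i*k*t/n)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


# Round trip on a hypothetical 4-sample frame.
frame = [1.0, 0.0, -1.0, 0.5]
recovered = inverse_dft(dft(frame))
```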

The specific embodiments described herein merely illustrate the spirit of the invention. Those skilled in the art to which the invention belongs may modify or supplement the described embodiments in various ways, or substitute similar approaches, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims

1. A two-channel encoding method for 3D audio, characterized by comprising the steps of:

S1.1. Performing a time-frequency transform on each of the input two-channel signals, converting the time-domain two-channel signals into frequency-domain two-channel signals;

S1.2. Dividing the frequency-domain two-channel signals into subbands respectively to obtain two-channel subband signals;

S1.3. Encoding the two-channel subband signals one by one with both the parametric coding method based on frequency-domain principal components and the parametric coding method based on polar-coordinate principal components, to obtain the coding noise energy produced for each two-channel subband signal under each of the two coding methods;

For the coding noise energy obtained by encoding the two-channel subband signals with the parametric coding method based on polar-coordinate principal components, ε_{2,k} is the coding noise energy of the k-th two-channel subband signal, ρ_k(j) is the signal amplitude of the j-th frequency point in the k-th two-channel subband signal, L_k(j) and R_k(j) are the signals of the j-th frequency point in the k-th left-channel and right-channel subband signals respectively, and n is the number of frequency points in the k-th two-channel subband signal;

S1.4. For each two-channel subband signal, selecting the parametric coding method corresponding to the smaller coding noise energy to further encode the two-channel subband signal, and, if the noise energies are equal, selecting the parametric coding method based on frequency-domain principal components; if the parametric coding method based on frequency-domain principal components is used, outputting the encoded principal component sequence, the direction angle, and the noise energy ratio of the two-channel subband signal; if the parametric coding method based on polar-coordinate principal components is used, outputting the encoded principal component sequence, the rotation radius, and the noise energy ratio of the two-channel subband signal;

The encoded principal component sequence obtained with the parametric coding method based on polar-coordinate principal components is:

PC_k = {PC_k(j) | j = 1, 2, ..., n}

where PC_k is the principal component sequence of the k-th two-channel subband signal and PC_k(j) is the principal component of the j-th frequency point in the k-th two-channel subband signal;

The rotation radius obtained with the parametric coding method based on polar-coordinate principal components is:

$\bar{\rho}_{k} = \frac{\sum_{j=1}^{n} \sqrt{L_{k}^{2}(j) + R_{k}^{2}(j)}}{n}$

The noise energy ratio obtained with the parametric coding method based on polar-coordinate principal components is:

$PAR = \frac{\pi^{2}}{48 \sum_{j=1}^{n} \left[ \rho_{k}(j) - \frac{1}{n} \sum_{j=1}^{n} \rho_{k}(j) \right]^{2}}$

where ρ_k(j) is the signal amplitude of the j-th frequency point in the k-th two-channel subband signal;

S1.5. Downmixing the coded principal component sequence to obtain a downmix signal;

S1.6. Encode the downmix signal by using the core encoder to obtain an encoded code stream, and write the direction angle or rotation radius and noise energy ratio into the encoded code stream.

2. A two-channel encoder for 3D audio, characterized in that it comprises:

A time-frequency transform module, used to perform a time-frequency transform on each of the input two-channel signals, converting the time-domain two-channel signals into frequency-domain two-channel signals;

A subband division module, used to divide the frequency-domain two-channel signals into subbands respectively to obtain two-channel subband signals;

A coding noise energy calculation module, used to encode the two-channel subband signals one by one with both the parametric coding method based on frequency-domain principal components and the parametric coding method based on polar-coordinate principal components, to obtain the coding noise energy produced for each two-channel subband signal under each of the two coding methods, including the coding noise energy obtained with the parametric coding method based on polar-coordinate principal components;

A parametric coding module, used, for each two-channel subband signal, to select the parametric coding method corresponding to the smaller coding noise energy to further encode the two-channel subband signal, and, if the noise energies are equal, to select the parametric coding method based on frequency-domain principal components; if the parametric coding method based on frequency-domain principal components is used, the encoded principal component sequence, the direction angle, and the noise energy ratio of the two-channel subband signal are output; if the parametric coding method based on polar-coordinate principal components is used, the encoded principal component sequence, the rotation radius, and the noise energy ratio of the two-channel subband signal are output;

PC_k = {PC_k(j) | j = 1, 2, ..., n}

$\bar{\rho}_{k} = \frac{\sum_{j=1}^{n} \sqrt{L_{k}^{2}(j) + R_{k}^{2}(j)}}{n}$

$PAR = \frac{\pi^{2}}{48 \sum_{j=1}^{n} \left[ \rho_{k}(j) - \frac{1}{n} \sum_{j=1}^{n} \rho_{k}(j) \right]^{2}}$

A downmix module, used to downmix the coded principal component sequence to obtain a downmix signal;

The core encoder is used to encode the downmix signal to obtain a coded code stream, and write the direction angle or rotation radius and noise energy ratio into the coded code stream.

3. A two-channel decoding method for 3D audio, characterized in that it comprises the steps of:

S2.1. Using a core decoder to decode the encoded code stream obtained with the encoding method according to claim 1, to obtain a decoded signal;

S2.2. Dividing the decoded signal into subbands to obtain decoded subband signals;

S2.3. Decoding the decoded subband signals with the parametric decoding method corresponding to the parametric coding method used in encoding, in combination with the direction angle or rotation radius and the noise energy ratio in the encoded code stream, to obtain reconstructed frequency-domain subband signals;

S2.4. Merging the reconstructed frequency-domain sub-band signals to obtain a reconstructed frequency-domain signal;

S2.5. Perform time-frequency inverse transformation on the frequency domain signal, convert the frequency domain signal into a time domain signal, and recover the reconstructed audio signal.

4. The two-channel decoding method for 3D audio according to claim 3, characterized in that:

The parametric decoding method in step S2.3 is a parametric decoding method based on frequency-domain principal components or a parametric decoding method based on polar-coordinate principal components.

5. The two-channel decoding method for 3D audio according to claim 4, characterized in that:

Decoding the decoded subband signals with the parametric decoding method based on frequency-domain principal components to obtain reconstructed frequency-domain subband signals is specifically:

According to the noise energy ratio in the encoded code stream, white noise with the same energy as the original signal is generated and, in combination with the principal component sequence and the direction angle in the encoded code stream, the decoded subband signals are restored to obtain the reconstructed frequency-domain subband signals.

6. The two-channel decoding method for 3D audio according to claim 4, characterized in that:

Decoding the decoded subband signals with the parametric decoding method based on polar-coordinate principal components to obtain reconstructed frequency-domain subband signals is specifically:

According to the noise energy ratio in the encoded code stream, white noise with the same energy as the original signal is generated and, in combination with the principal component sequence and the rotation radius in the encoded code stream, the decoded subband signals are restored to obtain the reconstructed frequency-domain subband signals.

7. A two-channel decoder for 3D audio, characterized in that it comprises:

A core decoder, used to decode the encoded code stream obtained by using the encoding method as claimed in claim 1, to obtain a decoded signal;

A sub-band division module is used to divide the decoded signal into sub-bands to obtain decoded sub-band signals;

A parametric decoding module, used to decode the decoded subband signals with the parametric decoding method corresponding to the parametric coding method used in encoding, in combination with the direction angle or rotation radius and the noise energy ratio in the encoded code stream, to obtain reconstructed frequency-domain subband signals;

A subband combining module, used to combine the reconstructed frequency domain subband signals to obtain a reconstructed frequency domain signal;

The time-frequency inverse transform module is used to perform time-frequency inverse transform on the frequency domain signal, convert the frequency domain signal into a time domain signal, and restore the reconstructed audio signal.

8. The two-channel decoder for 3D audio according to claim 7, characterized in that:

The parametric decoding module further includes a parametric decoding module based on frequency-domain principal components and a parametric decoding module based on polar-coordinate principal components.

9. The two-channel decoder for 3D audio according to claim 8, characterized in that:

The parametric decoding module based on frequency-domain principal components is used to generate white noise with the same energy as the original signal according to the noise energy ratio in the encoded code stream and, in combination with the principal component sequence and the direction angle in the encoded code stream, to restore the decoded subband signals to obtain reconstructed frequency-domain subband signals.

10. The two-channel decoder for 3D audio according to claim 8, characterized in that:

The parametric decoding module based on polar-coordinate principal components is used to generate white noise with the same energy as the original signal according to the noise energy ratio in the encoded code stream and, in combination with the principal component sequence and the rotation radius in the encoded code stream, to restore the decoded subband signals to obtain reconstructed frequency-domain subband signals.