CN103971692A

CN103971692A - Audio processing method, device and system

Info

Publication number: CN103971692A
Application number: CN201310031782.0A
Authority: CN
Inventors: 杨磊; 王立众; 洪準晟
Original assignee: Beijing Samsung Telecommunications Technology Research Co Ltd; Samsung Electronics Co Ltd
Current assignee: Beijing Samsung Telecommunications Technology Research Co Ltd; Samsung Electronics Co Ltd
Priority date: 2013-01-28
Filing date: 2013-01-28
Publication date: 2014-08-06

Abstract

The invention discloses an audio processing method, device and system. The method includes: obtaining the average value of a left channel signal and a right channel signal, and coding the resulting down-mixed signal to obtain a main code stream; performing a modulated lapped transform (MLT) on the left channel signal and the right channel signal respectively to obtain a left channel MLT coefficient sequence and a right channel MLT coefficient sequence, and dividing each of the two sequences into a low-frequency sequence and a high-frequency sequence; obtaining a low-frequency sideband signal according to the difference between the low-frequency sequence of the left channel MLT coefficient sequence and the low-frequency sequence of the right channel MLT coefficient sequence, and performing quantization coding on the amplitude and position of the low-frequency sideband signal to obtain a low-frequency code stream; obtaining an intensity stereo coefficient according to the high-frequency sequence of the left channel MLT coefficient sequence and the high-frequency sequence of the right channel MLT coefficient sequence, and performing quantization coding on the amplitude of the intensity stereo coefficient to obtain a high-frequency code stream; and mixing the main code stream with an auxiliary code stream composed of the low-frequency code stream and the high-frequency code stream, and outputting the mixed code stream.

Description

Audio processing method, device and system

Technical Field

The present application relates to the field of audio processing technologies, and in particular, to an audio processing method, apparatus and system.

Background

In stereo audio technology, an encoding end (specifically, a stereo encoder) may encode an analog stereo signal according to a stereo encoding standard to obtain a digital code stream, and a decoding end (specifically, a stereo decoder) may decode the digital code stream according to the corresponding stereo decoding standard to restore the analog stereo signal. Compared with mono sound, stereo sound improves sound quality, enhances the sense of presence, and more faithfully reproduces the direction and spatial distribution of each sound source in the actual sound field, so stereo audio technology is widely applied to various audio and video communication services, for example, video conferencing. Stereo audio technology enables users in a video conference to communicate better and more naturally, and improves conference efficiency. In addition, the low complexity and small bandwidth occupation of stereo audio technology allow users to access a video conference from various portable devices, such as mobile phones and tablet computers, so that users can communicate through the video conference at any time and any place, which greatly improves working efficiency and gives the technology a wide market prospect.

At present, the existing stereo coding/decoding standards (or algorithms) mainly include AMR-WB+ (Extended Adaptive Multi-Rate Wideband) and HE-AAC v2 (High-Efficiency Advanced Audio Coding version 2). However, both standards introduce a large delay when coding in stereo mode: the delay of AMR-WB+ can reach 108-325 ms, and the delay of HE-AAC v2 can even reach 386-513 ms. Therefore, when these standards are applied to an audio/video communication service with a high real-time requirement, for example, a video conference, their high delay cannot meet the real-time requirement of the service and may seriously impair voice interaction in the service.

Disclosure of Invention

The application provides an audio processing method, device and system, which aim to solve the problem of the large delay of existing stereo coding/decoding standards such as AMR-WB+ and HE-AAC v2.

The technical scheme of the application is as follows:

in one aspect, an audio processing method is provided, including:

obtaining an average value of a left channel signal and a right channel signal of an input stereo signal to be coded to obtain a down-mixed signal, and coding the down-mixed signal to obtain a main code stream;

performing a modulated lapped transform (MLT) on the left channel signal and the right channel signal respectively to obtain a left channel MLT coefficient sequence and a right channel MLT coefficient sequence, and respectively dividing the left channel MLT coefficient sequence and the right channel MLT coefficient sequence into a low-frequency sequence and a high-frequency sequence, wherein the frequency of the low-frequency sequence is less than or equal to a preset first frequency value, and the frequency of the high-frequency sequence is greater than the first frequency value;

acquiring a low-frequency sideband signal according to the difference between the low-frequency sequence of the left channel MLT coefficient sequence and the low-frequency sequence of the right channel MLT coefficient sequence, and carrying out quantization coding on the amplitude and the position of the low-frequency sideband signal to obtain a low-frequency code stream;

obtaining an intensity stereo coefficient according to a high-frequency sequence of the left channel MLT coefficient sequence and a high-frequency sequence of the right channel MLT coefficient sequence, and carrying out quantization coding on the amplitude of the intensity stereo coefficient to obtain a high-frequency code stream, wherein the intensity stereo coefficient is used for representing the root mean square of the ratio of the energy of the left channel signal to the energy of the stereo signal and the root mean square of the ratio of the energy of the right channel signal to the energy of the stereo signal;

and mixing the main code stream and the auxiliary code stream to obtain a stereo code stream, and outputting the stereo code stream, wherein the auxiliary code stream consists of a low-frequency code stream and a high-frequency code stream.

In another aspect, an audio processing method is further provided, including:

separating an input stereo code stream to be decoded to obtain a main code stream and an auxiliary code stream, and decoding the separated main code stream to obtain a decoded down-mix signal, wherein the stereo code stream to be decoded is the output stereo code stream;

decoding and inverse quantizing the separated auxiliary code stream to obtain a sideband low-frequency coefficient and the amplitude of an intensity stereo coefficient, and performing an inverse modulated lapped transform (IMLT) on the sideband low-frequency coefficient to obtain a decoded low-frequency sideband signal;

acquiring a decoded signal of a left channel low frequency band and a decoded signal of a right channel low frequency band according to the decoded low frequency sideband signal and the low frequency part of the decoded down-mix signal;

acquiring a decoded signal of a left channel high frequency band and a decoded signal of a right channel high frequency band according to the amplitude of the intensity stereo coefficient and the high frequency part of the decoded downmix signal;

and acquiring the sum of the decoded signal of the left channel low frequency band and the decoded signal of the left channel high frequency band to obtain a decoded left channel signal, acquiring the sum of the decoded signal of the right channel low frequency band and the decoded signal of the right channel high frequency band to obtain a decoded right channel signal, and outputting the decoded left channel signal and the decoded right channel signal.

In another aspect, an audio processing apparatus is provided, including:

the down-mixing module is used for obtaining the average value of a left channel signal and a right channel signal of an input stereo signal to be coded to obtain a down-mixed signal;

the main coding module is used for coding the down-mixing signal input by the down-mixing module to obtain a main code stream;

the modulated lapped transform (MLT) module is used for respectively performing MLT on the left channel signal and the right channel signal to obtain a left channel MLT coefficient sequence and a right channel MLT coefficient sequence, and respectively dividing the left channel MLT coefficient sequence and the right channel MLT coefficient sequence into a low-frequency sequence and a high-frequency sequence, wherein the frequency of the low-frequency sequence is less than or equal to a preset first frequency value, and the frequency of the high-frequency sequence is greater than the first frequency value;

the first acquisition module is used for acquiring a low-frequency sideband signal according to the difference between the low-frequency sequence of the left channel MLT coefficient sequence and the low-frequency sequence of the right channel MLT coefficient sequence input by the MLT module; and is further used for acquiring an intensity stereo coefficient according to the high-frequency sequence of the left channel MLT coefficient sequence and the high-frequency sequence of the right channel MLT coefficient sequence, wherein the intensity stereo coefficient is used for representing the root mean square of the ratio of the energy of the left channel signal to the energy of the stereo signal and the root mean square of the ratio of the energy of the right channel signal to the energy of the stereo signal;

the quantization coding module is used for performing quantization coding on the amplitude and the position of the low-frequency sideband signal input by the first acquisition module to obtain a low-frequency code stream; and is further used for performing quantization coding on the amplitude of the intensity stereo coefficient input by the first acquisition module to obtain a high-frequency code stream;

and the mixing module is used for mixing the main code stream input by the main coding module and the auxiliary code stream input by the quantization coding module to obtain a stereo code stream and outputting the stereo code stream.

In another aspect, an audio processing apparatus is provided, including:

the separation module is used for separating the input stereo code stream to be decoded to obtain a main code stream and an auxiliary code stream, wherein the stereo code stream to be decoded is the output stereo code stream;

the main decoding module is used for decoding the main code stream input by the separation module to obtain a decoded down-mixing signal;

the inverse quantization decoding module is used for decoding and inverse quantizing the auxiliary code stream input by the separation module to obtain the sideband low-frequency coefficient and the amplitude of the intensity stereo coefficient;

the inverse modulated lapped transform (IMLT) module is used for performing IMLT on the sideband low-frequency coefficient input by the inverse quantization decoding module to obtain a decoded low-frequency sideband signal;

a second obtaining module, configured to obtain a decoded signal of a left channel low frequency band and a decoded signal of a right channel low frequency band according to the decoded low-frequency sideband signal input by the IMLT module and the low-frequency part of the decoded down-mixed signal input by the main decoding module; further configured to obtain a decoded signal of a left channel high frequency band and a decoded signal of a right channel high frequency band according to the amplitude of the intensity stereo coefficient input by the inverse quantization decoding module and the high-frequency part of the decoded down-mixed signal input by the main decoding module; and further configured to obtain a sum of the decoded signal of the left channel low frequency band and the decoded signal of the left channel high frequency band to obtain a decoded left channel signal, obtain a sum of the decoded signal of the right channel low frequency band and the decoded signal of the right channel high frequency band to obtain a decoded right channel signal, and output the decoded left channel signal and the decoded right channel signal.

In yet another aspect, an audio processing system is provided, including the two audio processing apparatuses described above.

In the technical scheme of the application, based on transform-domain coding and intensity stereo theory, MLT is performed on a left channel signal and a right channel signal respectively to obtain a left channel MLT coefficient sequence and a right channel MLT coefficient sequence, then each of the two MLT coefficient sequences is divided into a high-frequency sequence and a low-frequency sequence, and the high-frequency sequence is further divided into a plurality of high-frequency subbands, so that the left channel signal and the right channel signal are each divided into a plurality of subbands (the low-frequency sequence can also be regarded as one subband). An MLT-domain transform coding method is used for the low-frequency sequence, and an intensity stereo method is used for the high-frequency subbands. Thus, the method of the present application has better performance than AMR-WB+ and HE-AAC v2, while its delay is lower than that of AMR-WB+ and HE-AAC v2.

Drawings

Fig. 1 is a flowchart of a stereo encoding method in an audio processing method according to a first embodiment of the present application;

fig. 2 is a flowchart of a stereo decoding method in an audio processing method according to a first embodiment of the present application;

fig. 3 is a schematic structural diagram of an audio processing apparatus that can be used as a stereo encoder according to a second embodiment of the present application;

fig. 4 is a schematic structural diagram of an audio processing apparatus that can be used as a stereo decoder according to a second embodiment of the present application.

Detailed Description

In order to solve the problem in the prior art that the delay of stereo coding/decoding standards such as AMR-WB+ and HE-AAC v2 is large, the following embodiments of the present application provide an audio processing method, an audio processing apparatus to which the method can be applied, and an audio processing system.

Example one

The audio processing method of the present embodiment includes two parts, one part being an encoding process of a stereo signal and the other part being a corresponding decoding process. These two sections will be described in detail below.

One, encoding process of the stereo signal

In the audio processing method of the present embodiment, the encoding process (also referred to as a stereo encoding method) for the stereo signal may be performed by an encoding end, as shown in fig. 1, and the encoding process includes the following steps:

step S102, obtaining an average value of a left channel signal and a right channel signal of an input stereo signal to be coded to obtain a down-mixed signal, and coding the down-mixed signal to obtain a main code stream;

in practical implementation, the process of obtaining the average value of the left channel signal and the right channel signal of the stereo signal may also be referred to as down-mixing the stereo signal. The stereo signal may be down-mixed according to the following equation (1):

x_{Mixed} = \frac{x_{L} + x_{R}}{2} \qquad (1)

where x_{Mixed} represents the down-mixed signal, x_{L} represents the left channel signal, and x_{R} represents the right channel signal.

After the stereo signal is down-mixed, the obtained down-mixed signal may be encoded using any mono encoder, such as an AMR-WB encoder or a SILK encoder (an ultra-wideband audio encoder proposed by Skype), and the encoded code stream may be referred to as the main code stream.
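As an illustration of equation (1), the following minimal Python sketch averages the two channels sample by sample. The function name `downmix` and the sample values are hypothetical and only serve the example; the mono encoder that would follow is not shown.

```python
def downmix(left, right):
    """Equation (1): per-sample average of the left and right channels."""
    if len(left) != len(right):
        raise ValueError("left and right channels must have the same length")
    return [(l + r) / 2.0 for l, r in zip(left, right)]


# Illustrative samples only (not taken from the application).
x_l = [0.5, -0.25, 1.0]
x_r = [0.25, 0.25, -1.0]
x_mixed = downmix(x_l, x_r)
print(x_mixed)
```

The resulting list would then be passed to whichever mono encoder is chosen to produce the main code stream.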

Step S104, respectively carrying out Modulated Lapped Transform (MLT) on the left channel signal and the right channel signal to obtain a left channel MLT coefficient sequence and a right channel MLT coefficient sequence, and respectively dividing the left channel MLT coefficient sequence and the right channel MLT coefficient sequence into a low frequency sequence and a high frequency sequence, wherein the frequency of the low frequency sequence is less than or equal to a preset first frequency value, and the frequency of the high frequency sequence is greater than the first frequency value;

in step S104, MLT is performed on the left channel signal to obtain a plurality of MLT coefficients; the sequence composed of these MLT coefficients may be referred to as the left channel MLT coefficient sequence and is denoted X_L. Likewise, MLT is performed on the right channel signal to obtain a plurality of MLT coefficients; the sequence composed of these MLT coefficients may be referred to as the right channel MLT coefficient sequence and is denoted X_R. Then, taking the preset first frequency value as a boundary, the left channel MLT coefficient sequence X_L is divided into two parts, a low-frequency sequence and a high-frequency sequence, and the right channel MLT coefficient sequence X_R is divided into a low-frequency sequence and a high-frequency sequence in the same way. The frequencies of the low-frequency sequences of X_L and X_R lie in the range (0, first frequency value], and the frequencies of the high-frequency sequences of X_L and X_R lie in the range (first frequency value, +∞).

In practical implementation, the first frequency value can be set according to practical situations, for example, a preferred value is 2 kHz.
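The transform and the band split of step S104 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the MLT is written in its common textbook form (a sine-windowed transform producing M coefficients from 2M samples), the centre frequency of coefficient k is assumed to be (k + 0.5) * fs / (2M), and the sampling rate of 16 kHz in the example is hypothetical. The function names are also hypothetical.

```python
import math


def mlt(frame):
    """Forward MLT of one frame of 2*M samples; returns M coefficients.

    Assumed textbook form: sine window and sqrt(2/M) scaling.
    """
    m = len(frame) // 2
    scale = math.sqrt(2.0 / m)
    coeffs = []
    for k in range(m):
        acc = 0.0
        for n in range(2 * m):
            window = math.sin((n + 0.5) * math.pi / (2 * m))
            phase = (n + (m + 1) / 2.0) * (k + 0.5) * math.pi / m
            acc += frame[n] * window * math.cos(phase)
        coeffs.append(scale * acc)
    return coeffs


def split_low_high(coeffs, sample_rate_hz, first_freq_hz):
    """Split an MLT coefficient sequence at the first frequency value.

    Assumption: coefficient k is centred at (k + 0.5) * fs / (2 * M).
    """
    m = len(coeffs)
    bin_width = sample_rate_hz / (2.0 * m)
    n_low = sum(1 for k in range(m) if (k + 0.5) * bin_width <= first_freq_hz)
    return coeffs[:n_low], coeffs[n_low:]


# 32 illustrative samples -> 16 coefficients, split at 2 kHz for fs = 16 kHz.
frame = [math.sin(2 * math.pi * 3 * n / 32) for n in range(32)]
x_lo, x_hi = split_low_high(mlt(frame), 16000, 2000)
print(len(x_lo), len(x_hi))
```

With these assumed parameters each coefficient covers 500 Hz, so the coefficients centred at 250, 750, 1250 and 1750 Hz form the low-frequency sequence and the rest form the high-frequency sequence.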

Step S106, acquiring a low-frequency sideband signal according to the difference between the low-frequency sequence of the left channel MLT coefficient sequence and the low-frequency sequence of the right channel MLT coefficient sequence, and carrying out quantization coding on the amplitude and the position of the low-frequency sideband signal to obtain a low-frequency code stream;

in step S106, the method for obtaining a low-frequency sideband signal according to the difference between the low-frequency sequence of the left channel MLT coefficient sequence and the low-frequency sequence of the right channel MLT coefficient sequence may include the following steps 1-2:

step 1: calculate the difference X_{S}^{Lo} between the low-frequency sequence X_{L}^{Lo} of the left channel MLT coefficient sequence and the low-frequency sequence X_{R}^{Lo} of the right channel MLT coefficient sequence according to the following equation (2):

X_{S}^{Lo} = \frac{X_{L}^{Lo} - X_{R}^{Lo}}{2} \qquad (2)

For example, if

X_{L}^{Lo} = \{X_{L}^{Lo}(j)\}, j = 1, \ldots, M1,

X_{R}^{Lo} = \{X_{R}^{Lo}(j)\}, j = 1, \ldots, M1,

then:

X_{S}^{Lo} = \left\{\frac{X_{L}^{Lo}(j) - X_{R}^{Lo}(j)}{2}\right\}, j = 1, \ldots, M1.

Clearly, X_{S}^{Lo} is also a sequence.

Step 2: take a preset number of MLT coefficients with the largest values from X_{S}^{Lo}, and use these MLT coefficients as the low-frequency sideband signal.

Since the MLT has good energy compaction, the M^{Lo} MLT coefficients with the largest values (where M^{Lo} denotes the preset number) can be taken from X_{S}^{Lo}, and these M^{Lo} MLT coefficients approximately represent the low-frequency sideband signal.
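Steps 1-2 above can be sketched in Python as follows. The sketch assumes that "largest value" means largest absolute value, which the text does not state explicitly, and it uses 0-based positions whereas the description indexes coefficients from 1. The function name `low_freq_sideband` and the sample data are hypothetical; quantization coding of the returned amplitudes and positions is not shown.

```python
def low_freq_sideband(x_l_lo, x_r_lo, keep):
    """Equation (2), then selection of the `keep` largest coefficients.

    Returns (positions, amplitudes), both ordered by position.
    Assumption: coefficients are ranked by absolute value.
    """
    side = [(l - r) / 2.0 for l, r in zip(x_l_lo, x_r_lo)]
    order = sorted(range(len(side)), key=lambda j: abs(side[j]), reverse=True)
    positions = sorted(order[:keep])
    return positions, [side[j] for j in positions]


# Illustrative low-frequency MLT coefficients (not from the application).
x_l_lo = [4.0, 1.0, -3.0, 0.5]
x_r_lo = [0.0, 0.5, 3.0, 0.5]
positions, amplitudes = low_freq_sideband(x_l_lo, x_r_lo, keep=2)
print(positions, amplitudes)
```

Returning positions together with amplitudes mirrors the description, in which both the amplitude and the position of the low-frequency sideband signal are quantization coded.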

Step S108, obtaining an intensity stereo coefficient according to the high-frequency sequence of the left channel MLT coefficient sequence and the high-frequency sequence of the right channel MLT coefficient sequence, and carrying out quantization coding on the amplitude of the intensity stereo coefficient to obtain a high-frequency code stream, wherein the intensity stereo coefficient is used for representing the root mean square of the ratio of the energy of the left channel signal to the energy of the stereo signal and the root mean square of the ratio of the energy of the right channel signal to the energy of the stereo signal;

in this step S108, the method for obtaining intensity stereo coefficients according to the high frequency sequence of the left channel MLT coefficient sequence and the high frequency sequence of the right channel MLT coefficient sequence may include the following steps 1-2:

step 1: dividing a high-frequency sequence of a left channel MLT coefficient sequence and a high-frequency sequence of a right channel MLT coefficient sequence into H high-frequency sub-bands according to a preset dividing mode, wherein H is a natural number larger than 1, and each high-frequency sub-band comprises at least one MLT coefficient;

specifically, the high-frequency sequence of the left channel MLT coefficient sequence is divided into H high-frequency subbands according to a preset dividing mode, and the high-frequency sequence of the right channel MLT coefficient sequence is divided into H high-frequency subbands at the same time, where the number of MLT coefficients included in the H high-frequency subbands may be the same or different, and the application does not limit the number. Obviously, each high frequency subband is also an MLT coefficient sequence.

In practical implementation, the preset dividing manner may be set according to the audio quality, for example by defining the frequency range of each high-frequency subband so as to divide the high-frequency sequence into H high-frequency subbands. It may be defined that the index of the first MLT coefficient (i.e., the index of the starting MLT coefficient) of the i-th of the H high-frequency subbands is denoted P_i, and the index of the last MLT coefficient (i.e., the index of the ending MLT coefficient) is denoted Q_i, where i = 1, ..., H. The index of an MLT coefficient may specifically be the position of that MLT coefficient in its sequence; for example, if a sequence contains 32 coefficients, the indexes of the 32 coefficients may be 1, 2, 3, ..., 32, respectively.

Step 2: calculate the intensity stereo coefficients sf_{L}^{i} and sf_{R}^{i} according to equations (3) and (4).

Here, X_L(k) represents the MLT coefficient with index k in the left channel MLT coefficient sequence, X_R(k) represents the MLT coefficient with index k in the right channel MLT coefficient sequence, P_i represents the index of the first MLT coefficient of the i-th high-frequency subband (i.e., the i-th of the H high-frequency subbands into which the high-frequency sequence of the left channel MLT coefficient sequence or the high-frequency sequence of the right channel MLT coefficient sequence is divided), Q_i represents the index of the last MLT coefficient of the i-th high-frequency subband, and k and i are variables.

Then, the high-frequency code stream obtained by performing quantization coding on the amplitudes of the intensity stereo coefficients should include: a first code stream obtained by quantization coding of the amplitude of sf_{L}^{i}, and a second code stream obtained by quantization coding of the amplitude of sf_{R}^{i}.
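The per-subband computation can be sketched as follows. Because equations (3) and (4) are not reproduced in this text, the sketch relies on an assumed reading of the verbal definition: each coefficient is the square root of the ratio between the energy of one channel in the subband and the energy of the down-mixed coefficients (X_L(k) + X_R(k)) / 2 in the same subband. That choice of denominator, the function name, and the 0-based inclusive index pairs standing in for (P_i, Q_i) are all assumptions for illustration.

```python
import math


def intensity_stereo_coeffs(x_l, x_r, bands):
    """Per-subband scale factors under an assumed definition.

    bands: list of inclusive 0-based (p_i, q_i) index pairs.
    Assumption: the reference energy is that of (X_L + X_R) / 2.
    """
    sf_l, sf_r = [], []
    for p, q in bands:
        e_l = sum(x_l[k] ** 2 for k in range(p, q + 1))
        e_r = sum(x_r[k] ** 2 for k in range(p, q + 1))
        e_m = sum(((x_l[k] + x_r[k]) / 2.0) ** 2 for k in range(p, q + 1))
        if e_m == 0.0:
            # Silent reference band: fall back to zero gains.
            sf_l.append(0.0)
            sf_r.append(0.0)
        else:
            sf_l.append(math.sqrt(e_l / e_m))
            sf_r.append(math.sqrt(e_r / e_m))
    return sf_l, sf_r


# Illustrative high-frequency coefficients split into H = 2 subbands.
x_l_hi = [3.0, 0.0, 2.0, 2.0]
x_r_hi = [1.0, 0.0, 2.0, 2.0]
sf_l, sf_r = intensity_stereo_coeffs(x_l_hi, x_r_hi, [(0, 1), (2, 3)])
print(sf_l, sf_r)
```

Under this assumed definition, multiplying the down-mixed subband by the two coefficients restores the per-channel subband energy, which is consistent with how the decoder applies the coefficients to the high-frequency part of the decoded down-mixed signal.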

And step S110, mixing the main code stream and the auxiliary code stream to obtain a stereo code stream, and outputting the stereo code stream, wherein the auxiliary code stream consists of a low-frequency code stream and a high-frequency code stream.

In an actual implementation process, there may be a plurality of ways to mix the main code stream and the auxiliary code stream, for example, arranging the auxiliary code stream behind the main code stream, and the like, which is not limited in the present application.

The stereo code stream is a final code stream obtained by coding an input stereo signal.

In the stereo encoding method of this embodiment, based on transform-domain coding and intensity stereo theory, MLT is performed on a left channel signal and a right channel signal respectively to obtain a left channel MLT coefficient sequence and a right channel MLT coefficient sequence, then each of the two MLT coefficient sequences is divided into a high-frequency sequence and a low-frequency sequence, and the high-frequency sequence is further divided into a plurality of high-frequency subbands, so that the left channel signal and the right channel signal are each divided into a plurality of subbands (the low-frequency sequence can also be regarded as one subband). The MLT-domain transform coding method is used for the low-frequency sequence, and the intensity stereo method is used for the high-frequency subbands. Therefore, the stereo encoding method of this embodiment has better performance than AMR-WB+ and HE-AAC v2, while its delay is lower than that of AMR-WB+ and HE-AAC v2.

Two, corresponding decoding process

In the audio processing method of this embodiment, a process of decoding a stereo stream (also referred to as a stereo decoding method) may be executed by a decoding end, as shown in fig. 2, where the decoding process includes the following steps:

step S202, separating the input stereo code stream to be decoded to obtain a main code stream and an auxiliary code stream, and decoding the separated main code stream to obtain a decoded down-mix signal, wherein the stereo code stream to be decoded is the stereo code stream output in the step S110;

as can be seen from the stereo encoding process, the auxiliary code stream includes the high-frequency code stream and the low-frequency code stream, and the high-frequency code stream further includes the first code stream and the second code stream.

In practical implementation, any monaural decoder corresponding to the monaural encoder used in step S102 may be used to decode the main code stream.

Step S204, decoding and inverse quantizing the separated auxiliary code stream to obtain the sideband low-frequency coefficients and the amplitudes of the intensity stereo coefficients, and performing IMLT (Inverse Modulated Lapped Transform) on the sideband low-frequency coefficients to obtain a decoded low-frequency sideband signal x_{S}^{Lo,Dec};

Specifically, the low-frequency code stream in the auxiliary code stream is decoded and inverse quantized to obtain the sideband low-frequency coefficients, and the high-frequency code stream in the auxiliary code stream is decoded and inverse quantized to obtain the amplitudes sf_{L}^{i,Dec} and sf_{R}^{i,Dec} of the intensity stereo coefficients. More specifically, the first code stream in the high-frequency code stream is decoded and inverse quantized to obtain sf_{L}^{i,Dec}, and the second code stream is decoded and inverse quantized to obtain sf_{R}^{i,Dec}, i = 1, ..., H.
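The IMLT applied to the sideband low-frequency coefficients can be sketched as follows. This is an illustration under assumptions, not the implementation of the application: it uses the common textbook form of the inverse transform (sine window, sqrt(2/M) scaling, consecutive frames overlapped by M samples), and the function names `imlt` and `overlap_add` are hypothetical.

```python
import math


def imlt(coeffs):
    """Inverse MLT of M coefficients; returns 2*M windowed samples.

    Assumed textbook form: sine window and sqrt(2/M) scaling.
    """
    m = len(coeffs)
    scale = math.sqrt(2.0 / m)
    out = []
    for n in range(2 * m):
        window = math.sin((n + 0.5) * math.pi / (2 * m))
        acc = 0.0
        for k in range(m):
            phase = (n + (m + 1) / 2.0) * (k + 0.5) * math.pi / m
            acc += coeffs[k] * math.cos(phase)
        out.append(scale * window * acc)
    return out


def overlap_add(frames):
    """Overlap-add consecutive 2*M-sample frames with a hop of M samples."""
    m = len(frames[0]) // 2
    out = [0.0] * (m * (len(frames) + 1))
    for i, frame in enumerate(frames):
        for n, value in enumerate(frame):
            out[i * m + n] += value
    return out


# Two consecutive frames of sparse, illustrative sideband coefficients.
frames = [imlt([0.0, 1.5, 0.0, 0.0]), imlt([0.0, 0.0, -0.5, 0.0])]
x_s_lo_dec = overlap_add(frames)
print(len(x_s_lo_dec))
```

In this form only samples covered by two overlapping frames are fully reconstructed; the first and last M samples of the output carry the contribution of a single frame.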

Step S206, obtaining a decoded signal of a left channel low frequency band and a decoded signal of a right channel low frequency band according to the decoded low frequency sideband signal and the low frequency part of the decoded down-mix signal;

in this step S206, the method of acquiring the decoded signal of the left channel low frequency band and the decoded signal of the right channel low frequency band according to the decoded low-frequency sideband signal x_{S}^{Lo,Dec} and the low-frequency part of the decoded down-mixed signal may include the following steps 1-2:

step 1: obtain the low-frequency part x_{Mixed}^{Lo,Dec} of the decoded down-mixed signal; for example, either of the following two methods can be used:

the first method: perform a Fast Fourier Transform (FFT) on the decoded down-mixed signal to obtain an FFT coefficient sequence, and divide the FFT coefficient sequence into a low-frequency sequence and a high-frequency sequence, wherein the frequency of the low-frequency sequence is less than or equal to a preset second frequency value, and the frequency of the high-frequency sequence is greater than the second frequency value; perform an IFFT (Inverse Fast Fourier Transform) on the low-frequency sequence of the FFT coefficient sequence to obtain the low-frequency part x_{Mixed}^{Lo,Dec} of the decoded down-mixed signal.

The FFT coefficient sequence is the sequence composed of the FFT coefficients obtained by performing FFT on the decoded down-mixed signal. Taking the preset second frequency value as a boundary, the FFT coefficient sequence is divided into two parts, a low-frequency sequence and a high-frequency sequence, where the frequency of the low-frequency sequence is in the range (0, second frequency value] and the frequency of the high-frequency sequence is in the range (second frequency value, +∞).

The second method: filter the decoded down-mixed signal using a low-pass filter to obtain the low-frequency part x_{Mixed}^{Lo,Dec} of the decoded down-mixed signal, where the passband of the low-pass filter is (0, second frequency value].
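The first method can be sketched as follows. To stay self-contained, the sketch uses a naive O(N^2) discrete Fourier transform in place of an FFT (the two produce the same coefficients), and it keeps the DC coefficient in the low-frequency sequence, which is an assumption since the stated range is open at 0. The function names, the sampling rate and the test signal are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive DFT standing in for an FFT (same coefficients, slower)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spec):
    """Naive inverse DFT returning the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def low_band(signal, sample_rate_hz, second_freq_hz):
    """Keep coefficients at or below the second frequency value."""
    n = len(signal)
    kept = []
    for k, value in enumerate(dft(signal)):
        # Coefficient k and its mirror n - k share the same frequency.
        freq = min(k, n - k) * sample_rate_hz / n
        kept.append(value if freq <= second_freq_hz else 0.0)
    return idft_real(kept)


# Illustrative decoded down-mix: a 1 kHz tone plus a 3 kHz tone at 16 kHz.
x_mixed_dec = [math.cos(2 * math.pi * t / 16) + 0.5 * math.cos(2 * math.pi * 3 * t / 16)
               for t in range(16)]
x_mixed_lo_dec = low_band(x_mixed_dec, 16000, 2000)
print([round(v, 6) for v in x_mixed_lo_dec[:4]])
```

With a second frequency value of 2 kHz, only the 1 kHz component of this example signal survives, which is the behaviour a low-pass filter in the second method would approximate.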

Step 2: calculate the decoded signal x_{L}^{Lo,Dec} of the left channel low frequency band and the decoded signal x_{R}^{Lo,Dec} of the right channel low frequency band according to the following equations (5) and (6):

x_{L}^{Lo,Dec} = x_{Mixed}^{Lo,Dec} + x_{S}^{Lo,Dec} \qquad (5)

x_{R}^{Lo,Dec} = x_{Mixed}^{Lo,Dec} - x_{S}^{Lo,Dec} \qquad (6)

where x_{S}^{Lo,Dec} represents the decoded low-frequency sideband signal, and x_{Mixed}^{Lo,Dec} represents the low-frequency part of the decoded down-mixed signal.
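Equations (5) and (6) invert the sum/difference structure of equations (1) and (2): if M = (L + R) / 2 and S = (L - R) / 2, then M + S = L and M - S = R. The following minimal Python sketch checks this on illustrative data, assuming an idealized case with no quantization and no discarded coefficients; the function name and sample values are hypothetical.

```python
def reconstruct_low_band(mixed_lo, side_lo):
    """Equations (5) and (6): left = mixed + side, right = mixed - side."""
    left = [m + s for m, s in zip(mixed_lo, side_lo)]
    right = [m - s for m, s in zip(mixed_lo, side_lo)]
    return left, right


# Idealized round trip on illustrative low-band samples.
left_in = [1.0, 0.5, -2.0]
right_in = [0.0, 0.5, 1.0]
mixed = [(l + r) / 2.0 for l, r in zip(left_in, right_in)]   # as in equation (1)
side = [(l - r) / 2.0 for l, r in zip(left_in, right_in)]    # as in equation (2)
left_out, right_out = reconstruct_low_band(mixed, side)
print(left_out, right_out)
```

In a real codec the outputs only approximate the inputs, because the sideband signal is represented by a limited number of quantized coefficients and the down-mixed signal passes through a lossy mono codec.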

Step S208, acquiring a decoded signal of the left channel high frequency band and a decoded signal of the right channel high frequency band according to the high-frequency code stream in the separated auxiliary code stream and the high-frequency part of the decoded down-mixed signal;

in this step S208, the method of obtaining the decoded signal of the left channel highband and the decoded signal of the right channel highband according to the magnitude of the intensity stereo coefficient and the high frequency part of the decoded downmix signal may include the following steps 1-2:

step 1: splitting a high frequency part of a decoded downmix signal into H sub-partsFor example, the following two ways can be used for division:

the first method is as follows: dividing the high-frequency sequence of the FFT coefficient sequence into H lower mixed sub-bands according to a preset dividing mode, wherein each lower mixed sub-band comprises at least one FFT coefficient, and performing IFFT on the H lower mixed sub-bands respectively to obtaini＝1,...,H；Is obtained by IFFT on the ith downmix subband,i.e., the ith sub-part in the high frequency part of the decoded downmix signal;

the preset dividing manner in this embodiment should be the same as the preset dividing manner in step 1 in step S108.

Second, the decoded downmix signal is filtered using an H-bandpass filter to obtain H sub-parts of the high frequency part of the downmix signal

The frequency range of the H band-pass filter should coincide with the frequency range of each sub-band defined in the predetermined division scheme in the first scheme.

Step 2: calculating the decoded signal of the left channel high frequency band x_{L}^{hi, Dec} and the decoded signal of the right channel high frequency band x_{R}^{hi, Dec} according to the following equation (7):

\begin{cases} x_{L,i}^{hi, Dec} = sf_{L}^{i, Dec} * x_{Mixed,i}^{hi, Dec}, & i = 1, ..., H \\ x_{R,i}^{hi, Dec} = sf_{R}^{i, Dec} * x_{Mixed,i}^{hi, Dec}, & i = 1, ..., H \\ x_{L}^{hi, Dec} = \sum_{i=1}^{H} x_{L,i}^{hi, Dec} \\ x_{R}^{hi, Dec} = \sum_{i=1}^{H} x_{R,i}^{hi, Dec} \end{cases} \quad (7)

where sf_{L}^{i, Dec} and sf_{R}^{i, Dec} represent the amplitudes of the intensity stereo coefficients.

Thus, using x_{Mixed,i}^{hi, Dec} obtained in step 1 and sf_{L}^{i, Dec} and sf_{R}^{i, Dec} obtained in step S204, the decoded signal of the left channel high frequency band x_{L}^{hi, Dec} and the decoded signal of the right channel high frequency band x_{R}^{hi, Dec} are obtained according to equation (7) above.
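The per-sub-band scaling and summation of equation (7) can be sketched as follows. The function name `decode_high_band` and the numeric values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def decode_high_band(sub_parts, sf_left, sf_right):
    """Scale each downmix sub-part by its coefficient and sum over sub-bands."""
    sub_parts = np.asarray(sub_parts, dtype=float)   # shape (H, n)
    sf_left = np.asarray(sf_left, dtype=float)       # shape (H,)
    sf_right = np.asarray(sf_right, dtype=float)     # shape (H,)
    if not (len(sub_parts) == len(sf_left) == len(sf_right)):
        raise ValueError("one coefficient per sub-band is required")
    x_left_hi = (sf_left[:, None] * sub_parts).sum(axis=0)
    x_right_hi = (sf_right[:, None] * sub_parts).sum(axis=0)
    return x_left_hi, x_right_hi

# Two sub-bands (H = 2) of two samples each, with made-up coefficients.
x_left_hi, x_right_hi = decode_high_band(
    [[1.0, 2.0], [3.0, 4.0]], sf_left=[0.5, 2.0], sf_right=[1.0, 0.0]
)
```

Both channels reuse the same downmix sub-parts and differ only in the scale factors, which is the defining property of intensity stereo.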

Step S210: summing the decoded signal of the left channel low frequency band x_{L}^{Lo, Dec} and the decoded signal of the left channel high frequency band x_{L}^{hi, Dec} to obtain the decoded left channel signal (also called the decoded signal of the left channel); summing the decoded signal of the right channel low frequency band x_{R}^{Lo, Dec} and the decoded signal of the right channel high frequency band x_{R}^{hi, Dec} to obtain the decoded right channel signal (also called the decoded signal of the right channel); and outputting the decoded left channel signal and the decoded right channel signal.

Experiments show that, with the method of this embodiment, the stereo encoding and decoding delay is reduced to 30 ms, and that, for speech signals with a bandwidth of 8 kHz and with AMR-WB used to encode the downmix signal, the MOS (Mean Opinion Score) of both the left channel and the right channel is higher than with AMR-WB+.

Example two

Corresponding to the first embodiment, the present application provides two audio processing apparatuses: one to which the stereo encoding method described above can be applied, which may specifically be a stereo encoder, and one to which the stereo decoding method described above can be applied, which may specifically be a stereo decoder.

As shown in fig. 3, an audio processing apparatus that can be used as a stereo encoder includes the following modules: a down-mix module 10, a main encoding module 20, an MLT module 30, a first acquisition module 40, a quantization encoding module 50, and a mix module 60, wherein:

a down-mixing module 10, configured to obtain an average value of a left channel signal and a right channel signal of an input stereo signal to be encoded to obtain a down-mixed signal;

a main encoding module 20, configured to encode the downmix signal input by the downmix module 10 to obtain a main code stream;

an MLT module 30, configured to perform MLT on the left channel signal and the right channel signal respectively to obtain a left channel MLT coefficient sequence and a right channel MLT coefficient sequence, and divide the left channel MLT coefficient sequence and the right channel MLT coefficient sequence into a low frequency sequence and a high frequency sequence, where a frequency of the low frequency sequence is less than or equal to a preset first frequency value, and a frequency of the high frequency sequence is greater than the first frequency value;

a first obtaining module 40, configured to obtain a low-frequency sideband signal according to a difference between the low-frequency sequence of the left channel MLT coefficient sequence and the low-frequency sequence of the right channel MLT coefficient sequence input by the MLT module 30; and further configured to obtain an intensity stereo coefficient according to the high-frequency sequence of the left channel MLT coefficient sequence and the high-frequency sequence of the right channel MLT coefficient sequence, wherein the intensity stereo coefficient is used for representing the root mean square of the ratio of the energy of the left channel signal to the energy of the stereo signal and the root mean square of the ratio of the energy of the right channel signal to the energy of the stereo signal;

a quantization coding module 50, configured to perform quantization coding on the amplitude and the position of the low-frequency sideband signal input by the first obtaining module 40 to obtain a low-frequency code stream; and further configured to perform quantization coding on the amplitude of the intensity stereo coefficient input by the first obtaining module 40 to obtain a high-frequency code stream, and to output an auxiliary code stream consisting of the low-frequency code stream and the high-frequency code stream;

and a mixing module 60, configured to mix the main code stream input by the main encoding module 20 and the auxiliary code stream input by the quantization coding module 50 to obtain a stereo code stream, and to output the stereo code stream.
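Two of the simpler encoder-side operations listed above, the averaging downmix and the split of a coefficient sequence at the first frequency value, can be sketched as follows. The MLT itself is not implemented here. The helper names, the sample rate, and the mapping from coefficient index to frequency are assumptions: coefficient k of an n-coefficient frame is taken to be centred at (k + 0.5) * sample_rate / (2 * n).

```python
import numpy as np

def downmix(left, right):
    """Downmix as the average of the left and right channel signals."""
    return (np.asarray(left, dtype=float) + np.asarray(right, dtype=float)) / 2

def split_by_frequency(coeffs, sample_rate, first_frequency):
    """Split one frame of transform coefficients into low and high sequences.

    Assumes the n coefficients uniformly cover 0 .. sample_rate / 2.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    n = coeffs.size
    centres = (np.arange(n) + 0.5) * sample_rate / (2 * n)
    is_low = centres <= first_frequency   # low: at or below the first frequency value
    return coeffs[is_low], coeffs[~is_low]

# Made-up numbers: 8 coefficients at 16 kHz sampling, split at 4 kHz.
mixed = downmix([1.0, 3.0], [3.0, 5.0])
low, high = split_by_frequency(np.arange(8), sample_rate=16000, first_frequency=4000)
```

With these numbers the coefficient centres fall at 500 Hz, 1500 Hz, and so on, so the first four coefficients form the low-frequency sequence and the remaining four form the high-frequency sequence.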

In order to obtain the low-frequency sideband signal according to the difference between the low-frequency sequence of the left channel MLT coefficient sequence and the low-frequency sequence of the right channel MLT coefficient sequence input by the MLT module 30, the first obtaining module 40 further comprises the following units:

a first calculating unit, configured to calculate, according to formula (2) above, the difference between the low-frequency sequence of the left channel MLT coefficient sequence and the low-frequency sequence of the right channel MLT coefficient sequence;

an extraction unit, configured to take, from the difference calculated by the first calculating unit, a preset number of MLT coefficients with the largest values, and to use the preset number of MLT coefficients with the largest values as the low-frequency sideband signal.
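The extraction step can be sketched as a top-N selection that keeps both the amplitude and the position of each selected coefficient, since both are quantization-coded afterwards. This is only one possible reading: "largest values" is interpreted here as largest absolute values, and the function name `extract_sideband` is hypothetical.

```python
import numpy as np

def extract_sideband(diff_lo, count):
    """Pick the `count` coefficients of largest magnitude from the difference.

    Returns their positions (ascending) and their signed amplitudes.
    """
    diff_lo = np.asarray(diff_lo, dtype=float)
    count = min(count, diff_lo.size)
    # Sort by magnitude, take the largest `count`, then restore position order.
    by_magnitude = np.argsort(np.abs(diff_lo), kind="stable")[::-1][:count]
    positions = np.sort(by_magnitude)
    amplitudes = diff_lo[positions]
    return positions, amplitudes

# Made-up difference sequence with a preset number of 2.
positions, amplitudes = extract_sideband([0.1, -0.9, 0.3, 0.8, -0.2], count=2)
```

Sending only a fixed number of coefficients bounds the size of the low-frequency code stream at the cost of discarding the smaller difference terms.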

In order to obtain the intensity stereo coefficients according to the high frequency sequence of the left channel MLT coefficient sequence and the high frequency sequence of the right channel MLT coefficient sequence, the first obtaining module 40 further includes the following units:

the high-frequency sub-band dividing unit is used for dividing a high-frequency sequence of the left channel MLT coefficient sequence and a high-frequency sequence of the right channel MLT coefficient sequence into H high-frequency sub-bands according to a preset dividing mode, wherein H is a natural number larger than 1, and each high-frequency sub-band comprises at least one MLT coefficient;

a second calculating unit, configured to calculate the two intensity stereo coefficients (one for the left channel and one for the right channel) according to formulas (3) and (4) above.

In this case, when performing quantization coding on the amplitude of the intensity stereo coefficients input by the first obtaining module 40 to obtain the high-frequency code stream, the quantization coding module 50 quantization-codes the amplitude of each of the two intensity stereo coefficients calculated by the second calculating unit separately, obtaining a first code stream and a second code stream, wherein the first code stream and the second code stream form the high-frequency code stream.
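A per-sub-band intensity stereo coefficient can be sketched as the square root of an energy ratio. Formulas (3) and (4) are not reproduced in this passage, so the code below is an assumption rather than the method of the application: "the energy of the stereo signal" is read as the energy of the averaged downmix in the same sub-band, so that scaling the downmix by the coefficient restores the channel's sub-band energy. The names `intensity_stereo_coefficients`, `band_edges`, and `eps` are hypothetical.

```python
import numpy as np

def intensity_stereo_coefficients(x_left_hi, x_right_hi, band_edges, eps=1e-12):
    """Return one left and one right coefficient per high-frequency sub-band.

    band_edges lists (first_index, last_index + 1) for each sub-band.
    eps guards against division by zero in silent sub-bands.
    """
    x_left_hi = np.asarray(x_left_hi, dtype=float)
    x_right_hi = np.asarray(x_right_hi, dtype=float)
    x_mixed_hi = (x_left_hi + x_right_hi) / 2   # assumed reference signal
    sf_left, sf_right = [], []
    for start, end in band_edges:
        e_left = np.sum(x_left_hi[start:end] ** 2)
        e_right = np.sum(x_right_hi[start:end] ** 2)
        e_mixed = np.sum(x_mixed_hi[start:end] ** 2)
        sf_left.append(np.sqrt(e_left / (e_mixed + eps)))
        sf_right.append(np.sqrt(e_right / (e_mixed + eps)))
    return np.array(sf_left), np.array(sf_right)

# Made-up coefficients split into two sub-bands of two coefficients each.
sf_l, sf_r = intensity_stereo_coefficients(
    [2.0, 2.0, 0.0, 0.0], [2.0, 2.0, 4.0, 4.0], band_edges=[(0, 2), (2, 4)]
)
```

In the second sub-band the left channel is silent, so its coefficient is zero while the right coefficient is 2, reflecting that the averaged downmix carries half the right channel's amplitude there.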

As shown in fig. 4, an audio processing apparatus that can be used as a stereo decoder includes the following modules: a separation module 101, a main decoding module 102, an inverse quantization decoding module 103, an IMLT module 104, and a second obtaining module 105, wherein:

a separation module 101, configured to separate an input stereo code stream to be decoded to obtain a main code stream and an auxiliary code stream, where the stereo code stream to be decoded is a stereo code stream output by the audio processing apparatus usable as a stereo encoder;

a main decoding module 102, configured to decode the main code stream input by the separation module 101 to obtain a decoded downmix signal;

an inverse quantization decoding module 103, configured to decode and inverse quantize the auxiliary code stream input by the separation module 101 to obtain the sideband low-frequency coefficients and the amplitudes of the intensity stereo coefficients;

an IMLT module 104, configured to perform IMLT on the sideband low-frequency coefficient input by the inverse quantization decoding module 103 to obtain a decoded low-frequency sideband signal;

a second obtaining module 105, configured to obtain a decoded signal of a left channel low frequency band and a decoded signal of a right channel low frequency band according to the decoded low-frequency sideband signal input by the IMLT module 104 and the low frequency part of the decoded downmix signal input by the main decoding module 102; further configured to obtain a decoded signal of a left channel high frequency band and a decoded signal of a right channel high frequency band according to the amplitude of the intensity stereo coefficient input by the inverse quantization decoding module 103 and the high frequency part of the decoded downmix signal input by the main decoding module 102; and further configured to sum the decoded signal of the left channel low frequency band and the decoded signal of the left channel high frequency band to obtain a decoded left channel signal, to sum the decoded signal of the right channel low frequency band and the decoded signal of the right channel high frequency band to obtain a decoded right channel signal, and to output the decoded left channel signal and the decoded right channel signal.

In order to obtain the decoded signal of the left channel lowband and the decoded signal of the right channel lowband from the decoded low-frequency sideband signal input by the IMLT module 104 and the low-frequency part of the decoded downmix signal input by the main decoding module 102, the second obtaining module 105 further comprises the following units:

an FFT unit, configured to perform FFT on the decoded downmix signal input by the main decoding module 102 to obtain an FFT coefficient sequence, and divide the FFT coefficient sequence into a low frequency sequence and a high frequency sequence, where a frequency of the low frequency sequence is less than or equal to a preset second frequency value, and a frequency of the high frequency sequence is greater than the second frequency value;

an IFFT unit, configured to perform IFFT on the low-frequency sequence of the FFT coefficient sequence input by the FFT unit to obtain a low-frequency portion of the decoded downmix signal;

a third calculating unit, configured to calculate the decoded signal of the left channel low frequency band x_{L}^{Lo, Dec} and the decoded signal of the right channel low frequency band x_{R}^{Lo, Dec} according to equations (5) and (6) above.

In order to obtain the decoded signal of the left channel highband and the decoded signal of the right channel highband according to the magnitude of the intensity stereo coefficient input from the inverse quantization decoding module 103 and the high frequency part of the decoded downmix signal input from the main decoding module 102, the second obtaining module 105 may further include the following units:

the down-mixing sub-band dividing unit is used for dividing the high-frequency sequence of the FFT coefficient sequence input by the FFT unit into H down-mixing sub-bands according to a preset dividing mode, and each down-mixing sub-band comprises at least one FFT coefficient;

the IFFT unit, further configured to perform IFFT on each of the H downmix sub-bands input by the downmix sub-band dividing unit to obtain x_{Mixed,i}^{hi, Dec}, i = 1, ..., H, where x_{Mixed,i}^{hi, Dec} is obtained by performing IFFT on the i-th downmix sub-band;

a fourth calculating unit, configured to calculate the decoded signal of the left channel high frequency band x_{L}^{hi, Dec} and the decoded signal of the right channel high frequency band x_{R}^{hi, Dec} according to equation (7) above.

In addition, an embodiment of the present application further provides an audio processing system, where the system includes: the two audio processing devices described above, i.e., an audio processing device that can function as a stereo encoder and an audio processing device that can function as a stereo decoder. In an actual implementation process, the system may specifically be a stereo codec, or may be a system composed of a stereo encoder and a stereo decoder, which is not limited in this application.

In summary, the above embodiments of the present application can achieve the following technical effects:

Based on transform-domain coding and intensity stereo theory, MLT (modulated lapped transform) is performed on the left channel signal and the right channel signal respectively to obtain a left channel MLT coefficient sequence and a right channel MLT coefficient sequence. Each of the two MLT coefficient sequences is then divided into a high-frequency sequence and a low-frequency sequence, and the high-frequency sequence is further divided into a plurality of high-frequency sub-bands, so that the left channel signal and the right channel signal are each divided into a plurality of sub-bands (the low-frequency sequence can also be regarded as one sub-band). The low-frequency sequence is encoded using MLT-domain transform coding, and the high-frequency sub-bands are encoded using intensity stereo. Thus, the method of the present application has better performance than AMR-WB+ and HE-AAC v2, while its delay is lower than that of AMR-WB+ and HE-AAC v2.

The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims

1. An audio processing method, comprising:

performing modulated lapped transform (MLT) on the left channel signal and the right channel signal respectively to obtain a left channel MLT coefficient sequence and a right channel MLT coefficient sequence, and dividing the left channel MLT coefficient sequence and the right channel MLT coefficient sequence each into a low frequency sequence and a high frequency sequence, wherein the frequency of the low frequency sequence is less than or equal to a preset first frequency value, and the frequency of the high frequency sequence is greater than the first frequency value;

obtaining an intensity stereo coefficient according to the high-frequency sequence of the left channel MLT coefficient sequence and the high-frequency sequence of the right channel MLT coefficient sequence, and performing quantization coding on the amplitude of the intensity stereo coefficient to obtain a high-frequency code stream, wherein the intensity stereo coefficient is used for representing the root mean square of the ratio of the energy of the left channel signal to the energy of the stereo signal and the root mean square of the ratio of the energy of the right channel signal to the energy of the stereo signal;

and mixing the main code stream and the auxiliary code stream to obtain a stereo code stream, and outputting the stereo code stream, wherein the auxiliary code stream consists of the low-frequency code stream and the high-frequency code stream.

2. The method of claim 1, wherein the method of obtaining the low frequency sideband signal from the difference between the low frequency sequence of the left channel MLT coefficient sequence and the low frequency sequence of the right channel MLT coefficient sequence comprises:

calculating, according to the following formula, the difference between the low frequency sequence of the left channel MLT coefficient sequence and the low frequency sequence of the right channel MLT coefficient sequence;

taking, from the calculated difference, a preset number of MLT coefficients with the largest values, and using the preset number of MLT coefficients with the largest values as the low-frequency sideband signal.

3. The method according to claim 1 or 2, wherein the method for obtaining intensity stereo coefficients from the high frequency sequence of the left channel MLT coefficient sequence and the high frequency sequence of the right channel MLT coefficient sequence comprises:

dividing the high-frequency sequence of the left channel MLT coefficient sequence and the high-frequency sequence of the right channel MLT coefficient sequence into H high-frequency sub-bands according to a preset dividing mode, wherein H is a natural number larger than 1, and each high-frequency sub-band comprises at least one MLT coefficient;

calculating the intensity stereo coefficients of the left channel and of the right channel according to the following formula;

wherein X_L(k) represents the MLT coefficient with index k in the left channel MLT coefficient sequence, X_R(k) represents the MLT coefficient with index k in the right channel MLT coefficient sequence, the two band-boundary indices in the formula represent, respectively, the index of the first MLT coefficient and the index of the last MLT coefficient in the i-th high-frequency sub-band obtained by the division, and k and i are variables.

4. An audio processing method, comprising:

separating an input stereo code stream to be decoded to obtain a main code stream and an auxiliary code stream, and decoding the separated main code stream to obtain a decoded down-mix signal, wherein the stereo code stream to be decoded is the stereo code stream according to any one of claims 1 to 3;

decoding and inverse quantizing the separated auxiliary code stream to obtain a sideband low-frequency coefficient and the amplitude of an intensity stereo coefficient, and performing inverse modulated lapped transform (IMLT) on the sideband low-frequency coefficient to obtain a decoded low-frequency sideband signal;

acquiring a decoded signal of a left channel low frequency band and a decoded signal of a right channel low frequency band according to the decoded low frequency sideband signal and the low frequency part of the decoded downmix signal;

obtaining a decoded signal of a left channel high frequency band and a decoded signal of a right channel high frequency band according to the amplitude of the intensity stereo coefficient and the high frequency part of the decoded downmix signal;

5. The method of claim 4, wherein the method of obtaining the left channel lowband decoded signal and the right channel lowband decoded signal from the decoded low frequency sideband signal and the low frequency portion of the decoded downmix signal comprises:

performing Fast Fourier Transform (FFT) on the decoded down-mixed signal to obtain an FFT coefficient sequence, and dividing the FFT coefficient sequence into a low-frequency sequence and a high-frequency sequence, wherein the frequency of the low-frequency sequence is less than or equal to a preset second frequency value, and the frequency of the high-frequency sequence is greater than the second frequency value;

performing Inverse Fast Fourier Transform (IFFT) on the low-frequency sequence of the FFT coefficient sequence to obtain a low-frequency part of the decoded down-mix signal;

calculating the decoded signal of the left channel low frequency band x_{L}^{Lo, Dec} and the decoded signal of the right channel low frequency band x_{R}^{Lo, Dec} according to the following formulas:

x_{L}^{Lo, Dec} = x_{Mixed}^{Lo, Dec} + x_{S}^{Lo, Dec},

x_{R}^{Lo, Dec} = x_{Mixed}^{Lo, Dec} - x_{S}^{Lo, Dec},

wherein x_{S}^{Lo, Dec} represents the decoded low-frequency sideband signal, and x_{Mixed}^{Lo, Dec} represents the low frequency part of the decoded downmix signal.

6. The method of claim 5, wherein the obtaining the left channel highband decoded signal and the right channel highband decoded signal according to the magnitude of the intensity stereo coefficient and the high frequency portion of the decoded downmix signal comprises:

dividing the high-frequency sequence of the FFT coefficient sequence into H downmix sub-bands according to a preset dividing manner, wherein each downmix sub-band comprises at least one FFT coefficient, and performing IFFT on each of the H downmix sub-bands to obtain x_{Mixed,i}^{hi, Dec}, i = 1, ..., H, wherein x_{Mixed,i}^{hi, Dec} is obtained by performing IFFT on the i-th downmix sub-band;

calculating the decoded signal of the left channel high frequency band x_{L}^{hi, Dec} and the decoded signal of the right channel high frequency band x_{R}^{hi, Dec} according to the following formula:

\begin{cases} x_{L,i}^{hi, Dec} = sf_{L}^{i, Dec} * x_{Mixed,i}^{hi, Dec}, & i = 1, ..., H \\ x_{R,i}^{hi, Dec} = sf_{R}^{i, Dec} * x_{Mixed,i}^{hi, Dec}, & i = 1, ..., H \\ x_{L}^{hi, Dec} = \sum_{i=1}^{H} x_{L,i}^{hi, Dec} \\ x_{R}^{hi, Dec} = \sum_{i=1}^{H} x_{R,i}^{hi, Dec} \end{cases},

wherein sf_{L}^{i, Dec} and sf_{R}^{i, Dec} represent the amplitudes of the intensity stereo coefficients.

7. An audio processing apparatus, comprising:

a modulated lapped transform (MLT) module, configured to perform MLT on the left channel signal and the right channel signal respectively to obtain a left channel MLT coefficient sequence and a right channel MLT coefficient sequence, and to divide the left channel MLT coefficient sequence and the right channel MLT coefficient sequence each into a low frequency sequence and a high frequency sequence, where a frequency of the low frequency sequence is less than or equal to a preset first frequency value, and a frequency of the high frequency sequence is greater than the first frequency value;

a first obtaining module, configured to obtain a low-frequency sideband signal according to a difference between the low-frequency sequence of the left channel MLT coefficient sequence and the low-frequency sequence of the right channel MLT coefficient sequence input by the MLT module; and further configured to obtain an intensity stereo coefficient according to the high-frequency sequence of the left channel MLT coefficient sequence and the high-frequency sequence of the right channel MLT coefficient sequence, where the intensity stereo coefficient is used to represent a root mean square of a ratio of the energy of the left channel signal to the energy of the stereo signal and a root mean square of a ratio of the energy of the right channel signal to the energy of the stereo signal;

8. The apparatus of claim 7, wherein the first obtaining module comprises:

a first calculating unit, configured to calculate, according to the following formula, the difference between the low frequency sequence of the left channel MLT coefficient sequence and the low frequency sequence of the right channel MLT coefficient sequence;

9. The apparatus of claim 8, wherein the first obtaining module further comprises:

the high-frequency sub-band dividing unit is used for dividing the high-frequency sequence of the left channel MLT coefficient sequence and the high-frequency sequence of the right channel MLT coefficient sequence into H high-frequency sub-bands according to a preset dividing mode, wherein H is a natural number larger than 1, and each high-frequency sub-band comprises at least one MLT coefficient;

a second calculating unit, configured to calculate the intensity stereo coefficients of the left channel and of the right channel according to the following formula;

wherein X_L(k) represents the MLT coefficient with index k in the left channel MLT coefficient sequence, X_R(k) represents the MLT coefficient with index k in the right channel MLT coefficient sequence, the two band-boundary indices in the formula represent, respectively, the index of the first MLT coefficient and the index of the last MLT coefficient in the i-th high-frequency sub-band obtained when the high-frequency sub-band dividing unit divides the high-frequency sequence of the left channel MLT coefficient sequence or the high-frequency sequence of the right channel MLT coefficient sequence, and k and i are variables.

10. An audio processing apparatus, comprising:

a separation module, configured to separate an input stereo code stream to be decoded to obtain a main code stream and an auxiliary code stream, where the stereo code stream to be decoded is the stereo code stream according to any one of claims 7 to 9;

an inverse modulated lapped transform (IMLT) module, configured to perform IMLT on the sideband low-frequency coefficient input by the inverse quantization decoding module to obtain a decoded low-frequency sideband signal;

11. The apparatus of claim 10, wherein the second obtaining module comprises:

the Fast Fourier Transform (FFT) unit is used for carrying out FFT on the decoded down-mixing signal input by the main decoding module to obtain an FFT coefficient sequence, and dividing the FFT coefficient sequence into a low-frequency sequence and a high-frequency sequence, wherein the frequency of the low-frequency sequence is less than or equal to a preset second frequency value, and the frequency of the high-frequency sequence is greater than the second frequency value;

an inverse fast fourier transform IFFT unit configured to perform IFFT on the low-frequency sequence of the FFT coefficient sequence input by the FFT unit to obtain a low-frequency portion of the decoded downmix signal;

a third calculating unit, configured to calculate the decoded signal of the left channel low frequency band x_{L}^{Lo, Dec} and the decoded signal of the right channel low frequency band x_{R}^{Lo, Dec} according to the following formulas:

x_{L}^{Lo, Dec} = x_{Mixed}^{Lo, Dec} + x_{S}^{Lo, Dec},

x_{R}^{Lo, Dec} = x_{Mixed}^{Lo, Dec} - x_{S}^{Lo, Dec},

12. The apparatus of claim 11, wherein the second obtaining module further comprises:

the IFFT unit, further configured to perform IFFT on each of the H downmix sub-bands input by the downmix sub-band dividing unit to obtain x_{Mixed,i}^{hi, Dec}, i = 1, ..., H, wherein x_{Mixed,i}^{hi, Dec} is obtained by performing IFFT on the i-th downmix sub-band;

a fourth calculating unit, configured to calculate the decoded signal of the left channel high frequency band x_{L}^{hi, Dec} and the decoded signal of the right channel high frequency band x_{R}^{hi, Dec} according to the following formula,

wherein sf_{L}^{i, Dec} and sf_{R}^{i, Dec} represent the amplitudes of the intensity stereo coefficients.

13. An audio processing system, comprising: the audio processing apparatus according to any one of claims 7 to 9 and the audio processing apparatus according to any one of claims 10 to 12.