CN1677493A

CN1677493A - Intensified audio-frequency coding-decoding device and method

Info

Publication number: CN1677493A
Application number: CNA2004100463349A
Authority: CN
Inventors: 潘兴德; 安德斯·叶瑞特; 朱晓明; 麦可·舒克; 任为民; 王磊; 豪格·何瑞施; 邓昊; 佛里德理克·海恩
Original assignee: BEIJING FUGUO DIGITAL TECHN Co Ltd; Coding Technology Ltd; GONGYU DIGITAL TECHNOLOGY Co Ltd BEIJNG
Current assignee: BEIJING FUGUO DIGITAL TECHN Co Ltd; Coding Technology Ltd; GONGYU DIGITAL TECHNOLOGY Co Ltd BEIJNG
Priority date: 2004-04-01
Filing date: 2004-06-03
Publication date: 2005-10-05

Abstract

An enhanced audio encoding device is consisted of a signal type analyzing module, a psychoacoustical analyzing module, a time-frequency mapping module, a quantization and entropy encoding module, a frequency-domain linear prediction and vector quantization module, and a bit-stream multiplexing module, in which the signal type analyzing module is used to carry out a signal type analysis for the input audio signals, and outputs to the psychoacoustical analyzing module, the time-frequency mapping module and the bit-stream multiplexing module; the frequency-domain linear prediction and vector quantization module carries out a linear prediction and a multilevel vector quantization to the frequency-domain coefficient, which outputs a residual sequence to the quantization and entropy encoding module for processing, and at the same time outputs the edge information to the bit-stream multiplexing module. The invention is applicable to the Hi-Fi compression encoding of audio signals with the configuration of multiple sampling rates and sound channels, and it supports audio signals with the sampling range of 8kHz to 192kHz. Meanwhile, it supports all possible sound channel configurations and supports audio encoding/decoding with a wide range of target code rate.

Description

Enhanced audio coding and decoding device and method

Technical Field

The invention relates to the technical field of audio coding and decoding, in particular to an enhanced audio coding and decoding device and method based on a perception model.

Background

To obtain high fidelity digital audio signals, the digital audio signals are subjected to audio coding or audio compression for storage and transmission. The purpose of encoding an audio signal is to achieve a transparent representation of the audio signal with as few bits as possible, e.g. there is little difference between the original input audio signal and the encoded output audio signal.

In the early eighties of the twentieth century, the advent of CDs embodied the advantages of digitally representing audio signals, such as high fidelity, large dynamic range, and strong robustness. However, these advantages are all at the expense of very high data rates. For example, the sampling rate required for the digitization of stereo signals of CD quality is 44.1kHz and each sample needs to be uniformly quantized with 15 bits, so that the uncompressed data rate reaches 1.41Mb/s, which high data rate brings great inconvenience to the transmission and storage of data, especially in the case of multimedia applications and wireless transmission applications, and is limited by bandwidth and cost. In order to maintain high quality audio signals, new networks and wireless multimedia digital audio systems are therefore required that must reduce the rate of data without compromising the quality of the audio. In view of the above problems, various audio compression techniques have been proposed to generate high fidelity audio signals with a very high compression ratio, such as MPEG-1/-2/-4 technique of ISO/IEC, AC-2/AC-3 technique of Dolby, ATRAC/MiniDisc/SDDS technique of Sony, and PAC/EPAC/MPAC technique of lucent. The following description will specifically select MPEG-2 AAC technology and Dubi's AC-3 technology.

The MPEG-1 technology and the MPEG-2 BC technology are high-quality coding technologies mainly used for mono and stereo audio signals, and with the increasing demand for multi-channel audio coding that achieves higher coding quality at a lower code rate, the MPEG-2 BC coding technology cannot achieve high-quality coding of five channels at a code rate lower than 540kbps because the MPEG-2 BC coding technology emphasizes backward compatibility with the MPEG-1 technology. In response to this deficiency, the MPEG-2 AAC technique has been proposed, which can achieve higher quality encoding of a five-channel signal at a rate of 320 kbps.

Fig. 1 shows a block diagram of an MPEG-2 AAC encoder comprising a gain controller 101, a filterbank 102, a temporal noise shaping module 103, an intensity/coupling module 104, a psychoacoustic model, a second order backward adaptive predictor 105, and/or difference stereo module 106, a bit allocation and quantization encoding module 107 and a bitstream multiplexing module 108, wherein the bit allocation and quantization encoding module 107 further comprises a compression ratio/distortion handling controller, a scale factor module, a non-uniform quantizer and an entropy coding module.

The filter bank 102 employs a Modified Discrete Cosine Transform (MDCT) whose resolution is signal adaptive, i.e., a 2048-point MDCT transform for a steady-state signal and a 256-point MDCT transform for a transient signal; thus, for a 48kHz sampled signal, the maximum frequency resolution is 23Hz and the maximum time resolution is 2.6 ms. Also, a sine window and a Kaiser-Bessel window may be used in the filter bank 102, with the sine window being used when the harmonic spacing of the input signal is less than 140Hz and the Kaiser-Bessel window being used when the strong component spacing in the input signal is greater than 220 Hz.

The audio signal enters the filter bank 102 after passing through the gain controller 101, is filtered according to different signals, and then is processed by the time domain noise shaping module 103 on the spectral coefficient output by the filter bank 102, the time domain noise shaping technology is to perform linear predictive analysis on the spectral coefficient on the frequency domain, and then controls the shape of the quantization noise on the time domain according to the analysis, so as to achieve the purpose of controlling the pre-echo.

The intensity/coupling module 104 is used for stereo coding of signal intensity, since for signals in high frequency band (greater than 2kHz), the auditory sense of direction is related to the variation of the signal intensity (signal envelope) and is not related to the waveform of the signal, i.e. the constant envelope signal does not affect the auditory sense of direction, therefore, the intensity/coupling technique can be formed by using this feature and the related information between multiple channels to synthesize several channels into a common channel for coding.

The second-order backward adaptive predictor 105 is used for eliminating redundancy of a steady-state signal and improving coding efficiency. The sum and difference stereo (M/S) module 106 operates on channel pairs, which refer to two channels, such as left and right channels or left and right surround channels in a two-channel signal or multi-channel signal. The M/S module 106 uses the correlation between two channels in a channel pair to achieve the effects of reducing the code rate and improving the coding efficiency. The bit allocation and quantization coding block 107 is implemented by a nested loop process where the non-uniform quantizer is lossy coded and the entropy coding block is lossless coded, thus removing redundancy and reducing correlation. The nested loop includes an inner loop that adjusts the step size of the non-uniform quantizer until the supplied bits are used up, and an outer loop that estimates the coding quality of the signal using the ratio of the quantization noise to the masking threshold. The encoded signal is finally passed through a bitstream multiplexing module 108 to form an encoded audio stream output.

In the case of sampling rate scalability, the input signal is simultaneously subjected to a quad-band polyphase filter bank (PQF) to generate four equal-bandwidth frequency bands, each of which generates 256 spectral coefficients by MDCT, for a total of 1024. A gain controller 101 is used in each frequency band. Whereas a low sample rate signal is obtained in the decoder ignoring the high frequency PQF band.

Fig. 2 gives a block schematic diagram of a corresponding MPEG-2 AAC decoder. The decoder includes a bitstream demultiplexing module 201, a lossless decoding module 202, an inverse quantizer 203, a scale factor module 204, and/or difference stereo (M/S) module 205, a prediction module 206, an intensity/coupling module 207, a temporal noise shaping module 208, a filter bank 209, and a gain control module 210. The encoded audio stream is demultiplexed by the bitstream demultiplexing module 201 to obtain a corresponding data stream and a control stream. After the signal is decoded by the lossless decoding module 202, an integer representation of the scale factor and a quantized value of the signal spectrum are obtained. The inverse quantizer 203 is a set of non-uniform quantizer banks implemented by a companding function for converting integer quantized values into a reconstructed spectrum. Because the scale factor module in the encoder differentiates the current scale factor from the previous scale factor, and then the difference value is encoded by Huffman, the scale factor module 204 in the decoder performs Huffman decoding to obtain the corresponding difference value, and then the real scale factor is restored. The M/S module 205 converts the sum and difference channels into left and right channels under the control of side information. Since the second-order backward adaptive predictor 105 is used in the encoder to eliminate redundancy of the steady-state signal and improve the encoding efficiency, the prediction decoding is performed in the decoder by the prediction module 206. The strength/coupling module 207 performs strength/coupling decoding under the control of the side information, then outputs the decoded signals to the time domain noise shaping module 208 for time domain noise shaping decoding, and finally performs comprehensive filtering through the filter bank 209, wherein the filter bank 209 adopts an Inverse Modified Discrete Cosine Transform (IMDCT) technology.

For the case of sampling frequency scalability, the high frequency PQF band may be ignored by the gain control module 210 to obtain a low sample rate signal.

The MPEG-2 AAC coding and decoding technology is suitable for audio signals with medium and high code rates, but the coding quality of the audio signals with low code rates or even low code rates is poor; meanwhile, the coding and decoding technology has more coding and decoding modules, higher complexity of realization and is not beneficial to real-time realization.

Fig. 3 shows a schematic structural diagram of an encoder adopting dolby AC-3 technology, which includes a transient signal detection module 301, a modified discrete cosine transform filter MDCT 302, a spectral envelope/exponent encoding module 303, a mantissa encoding module 304, a forward-backward adaptive perceptual model 305, a parameter bit allocation module 306, and a bit stream multiplexing module 307.

The audio signal is discriminated as a steady-state signal or a transient-state signal by the transient-state signal detection module 301, and at the same time, the time-domain data is mapped to the frequency-domain data by the signal adaptive MDCT filter bank 302, wherein a 512-point long window is applied to the steady-state signal and a pair of short windows is applied to the transient-state signal.

The spectral envelope/exponent coding module 303 codes the exponent portion of the signal in three modes, D15, D25 and D45 coding modes, according to the requirements of code rate and frequency resolution. The AC-3 technique differentially encodes the spectral envelope in frequency because at most 2 increments are required, each increment representing a 6dB level change, and the absolute value encoding is used for the first dc term, and the remaining exponents are differentially encoded. In D15 spectral envelope index coding, each index requires approximately 2.33 bits, 3 difference groups are coded in a word length of 7 bits, and the D15 coding mode provides fine frequency resolution by sacrificing time resolution. Since fine frequency resolution is only required for relatively stationary signals, while the spectrum of such signals remains relatively constant over many blocks, D15 is transmitted occasionally, typically once every 6 sound blocks (one data frame) of the spectral envelope, for stationary signals. When the signal spectrum is unstable, the spectrum estimation value needs to be updated frequently. The estimates are encoded with a smaller frequency resolution, typically using the D25 and D45 encoding modes. The D25 encoding mode provides adequate frequency and time resolution, with differential encoding for every other frequency coefficient, thus requiring approximately 1.15 bits per exponent. The D25 coding mode may be used when the spectrum is stable over 2 to 3 blocks and then changes abruptly. The D45 encoding mode is to differentially encode every third frequency coefficient, so that each exponent requires approximately 0.58 bits. The D45 encoding mode provides high time resolution and low frequency resolution and is therefore generally used in the encoding of transient signals.

The forward-backward adaptive perceptual model 305 is used to estimate the masking threshold of the signal per frame. The forward adaptive part is only applied at the encoder end, and under the limit of code rate, a group of optimal perceptual model parameters are estimated through an iterative loop, and then the parameters are transferred to the backward adaptive part to estimate the masking threshold of each frame. The backward adaptive part is applied at both encoder and decoder side.

The parametric bit allocation module 306 analyzes the spectral envelope of the audio signal according to masking criteria to determine the number of bits allocated to each mantissa. This module 306 performs global bit allocation for all channels using one pool of bits. During encoding in the mantissa encoding module 304, bits are cyclically taken out from the bit pool and allocated to all channels, and quantization of mantissas is adjusted according to the number of available bits. For the purpose of compression coding, the AC-3 encoder also uses a high-frequency coupling technique, which divides the high-frequency part of the coupled signal into 18 sub-bands according to the critical bandwidth of human ear, and then selects some channels to couple from a certain sub-band. Finally, the AC-3 audio stream output is formed by the bitstream multiplexing module 307.

Figure 4 shows a schematic flow chart of decoding using dolby AC-3. Firstly, inputting a bit stream coded by an AC-3 coder, carrying out data frame synchronization and error code detection on the bit stream, and carrying out error code masking or mute processing if a data error code is detected. Then, unpacking the bit stream to obtain main information and side information, and then performing exponential decoding. In performing exponential decoding, two pieces of side information are needed: one is the number of packed indices; one is the exponential strategy employed, such as the D15, D25, or D45 patterns. And carrying out bit allocation on the decoded exponent and the bit allocation side information to indicate the bit number used by each packed mantissa to obtain a group of bit allocation pointers, wherein each bit allocation pointer corresponds to one coded mantissa. The bit allocation pointer indicates the quantizer used for the mantissas and the number of bits each mantissa occupies in the codestream. The single encoded mantissa value is dequantized and converted to a dequantized value, and the mantissa occupying zero bits is restored to zero or replaced with a random dither value under the control of the dither mask. Then, the decoupling operation is performed, which is to recover the high frequency part, including the exponent and mantissa, of the coupled channel from the common coupled channel and the coupling factor. If a subband is subjected to matrix processing when the coding end adopts 2/0 mode coding, the sum and difference channel values of the subband need to be converted into left and right channel values through matrix recovery at the decoding end. The code stream contains a dynamic range control value for each audio block, which is dynamic range compressed to change the magnitude of the coefficients, including the exponent and mantissa. And performing inverse transformation on the frequency domain coefficient to convert the frequency domain coefficient into a time domain sample, then performing windowing processing on the time domain sample, and performing overlap addition on adjacent blocks to reconstruct a PCM audio signal. When the number of channels output by decoding is less than the number of channels in the encoded bit stream, the audio signal needs to be subjected to down-mixing processing, and finally, a PCM stream is output.

Dolby AC-3 encoding techniques are mainly aimed at high bitrate multi-channel surround signals, but their encoding effect is poor when the encoding bitrate of the 5.1 channels is below 384 kbps; but also for mono and binaural stereo.

In summary, the existing encoding and decoding technology cannot comprehensively solve the encoding and decoding quality of audio signals with very low code rate, low code rate and high code rate and mono-channel and dual-channel signals, and the implementation is complex.

Disclosure of Invention

The technical problem to be solved by the present invention is to provide an apparatus and a method for enhancing audio encoding and decoding, so as to solve the problems of low encoding efficiency and poor quality of the audio signal with lower code rate in the prior art.

The invention relates to an enhanced audio coding device, which comprises a signal type analysis module, a psychoacoustic analysis module, a time-frequency mapping module, a frequency domain linear prediction and vector quantization module, a quantization and entropy coding module and a bit stream multiplexing module, wherein the signal type analysis module is used for analyzing the signal type of a signal; the signal type analysis module is used for carrying out signal type analysis on an input audio signal, outputting the audio signal to the psychoacoustic analysis module and the time-frequency mapping module, and outputting a signal type analysis result to the bit stream multiplexing module; the psychoacoustic analysis module is used for calculating a masking threshold and a signal-to-mask ratio of an input audio signal and outputting the masking threshold and the signal-to-mask ratio to the quantization and entropy coding module; the time-frequency mapping module is used for converting a time-domain audio signal into a frequency-domain coefficient; the frequency domain linear prediction and vector quantization module is used for performing linear prediction and multi-level vector quantization on frequency domain coefficients, outputting a residual sequence to the quantization and entropy coding module, and outputting side information to the bit stream multiplexing module; the quantization and entropy coding module is used for quantizing and entropy coding the residual sequence under the control of the signal-to-mask ratio output by the psychoacoustic analysis module and outputting the residual sequence to the bit stream multiplexing module; the bit stream multiplexing module is used for multiplexing the received data to form a code stream of audio coding.

The invention relates to an enhanced audio decoding device, which comprises a bit stream demultiplexing module, an entropy decoding module, an inverse quantizer group, an inverse frequency domain linear prediction and vector quantization module and a frequency-time mapping module; the bit stream demultiplexing module is used for demultiplexing the compressed audio data stream and outputting corresponding data signals and control signals to the entropy decoding module and the inverse frequency domain linear prediction and vector quantization module; the entropy decoding module is used for decoding the signals, recovering the quantized value of the spectrum and outputting the quantized value to the inverse quantizer group; the inverse quantizer group is used for reconstructing an inverse quantized spectrum and outputting the inverse quantized spectrum to the inverse frequency domain linear prediction and vector quantization module; the inverse frequency domain linear prediction and vector quantization module is used for carrying out inverse quantization processing and inverse linear prediction filtering on the inverse quantized spectrum to obtain a spectrum before prediction and outputting the spectrum to the frequency-time mapping module; and the frequency-time mapping module is used for carrying out frequency-time mapping on the spectral coefficient to obtain a time domain audio signal of a low frequency band.

The invention is suitable for high-fidelity compression coding of audio signals with various sampling rates and channel configurations, and can support audio signals with the sampling rate of 8kHz to 192 kHz; all possible channel configurations can be supported; and supports audio coding/decoding at a wide range of target code rates.

Drawings

FIG. 1 is a block diagram of an MPEG-2 AAC encoder;

FIG. 2 is a block diagram of an MPEG-2 AAC decoder;

FIG. 3 is a schematic diagram of an encoder employing Dolby AC-3 technology;

FIG. 4 is a schematic diagram of a decoding flow employing the Dolby AC-3 technique;

FIG. 5 is a schematic diagram of an audio encoding apparatus according to the present invention;

FIG. 6 is a schematic structural diagram of an audio decoding apparatus according to the present invention;

FIG. 7 is a schematic structural diagram of a first embodiment of an encoding apparatus according to the present invention;

FIG. 8 is a schematic diagram of a filtering architecture using Harr wavelet based wavelet transform;

FIG. 9 is a schematic diagram of time-frequency partition obtained by Harr wavelet based wavelet transform;

FIG. 10 is a schematic structural diagram of a decoding apparatus according to a first embodiment of the present invention;

FIG. 11 is a schematic structural diagram of a second embodiment of an encoding apparatus according to the present invention;

FIG. 12 is a schematic structural diagram of a second embodiment of the decoding apparatus of the present invention;

FIG. 13 is a schematic structural diagram of a third embodiment of an encoding apparatus according to the present invention;

FIG. 14 is a schematic structural diagram of a third embodiment of a decoding apparatus according to the present invention;

FIG. 15 is a schematic structural diagram of a fourth embodiment of an encoding apparatus according to the present invention;

FIG. 16 is a schematic structural diagram of a fourth embodiment of a decoding apparatus according to the present invention;

FIG. 17 is a schematic structural diagram of a fifth embodiment of the encoding apparatus of the present invention;

FIG. 18 is a schematic structural diagram of a fifth embodiment of the decoding apparatus of the present invention;

FIG. 19 is a schematic structural diagram of a sixth embodiment of the encoding apparatus of the present invention;

FIG. 20 is a schematic structural diagram of a sixth embodiment of the decoding apparatus according to the present invention;

FIG. 21 is a schematic structural diagram of a seventh embodiment of the encoding apparatus of the present invention;

fig. 22 is a schematic structural diagram of a seventh embodiment of the decoding device of the present invention.

Detailed Description

Fig. 1 to 4 are schematic structural diagrams of several encoders in the prior art, which have been introduced in the background art and are not described herein again.

It should be noted that: for convenience and clarity of the present invention, the following embodiments of the encoding and decoding apparatuses are described in a corresponding manner, but do not indicate that the encoding apparatus and the decoding apparatus are necessarily in a one-to-one correspondence.

As shown in fig. 5, the audio encoding apparatus provided by the present invention includes a signal type analyzing module 50, a psychoacoustic analyzing module 51, a time-frequency mapping module 52, a frequency-domain linear prediction and vector quantization module 53, a quantization and entropy coding module 54, and a bitstream multiplexing module 55; wherein the signal type analyzing module 50 is used for performing signal type analysis on the input audio signal; the psychoacoustic analysis module 51 is configured to calculate a masking threshold and a signal-to-mask ratio of the audio signal; the time-frequency mapping module 52 is configured to convert the time-domain audio signals into frequency-domain coefficients; the frequency domain linear prediction and vector quantization module 53 is configured to perform linear prediction and multi-level vector quantization on the frequency domain coefficients, output a residual sequence to the quantization and entropy coding module 54, and output side information to the bitstream multiplexing module 55; the quantization and entropy coding module 54 is used for performing quantization and entropy coding on the residual coefficient under the control of the signal-to-mask ratio output by the psychoacoustic analysis module 51, and outputting the residual coefficient to the bit stream multiplexing module 55; the bitstream multiplexing module 55 is configured to multiplex the received data to form an audio encoding stream.

After the digital audio signal is input into the signal preprocessing module 50, the signal type is analyzed, and then the signal is respectively input into the psychoacoustic analysis module 51 and the time-frequency mapping module 52, on one hand, the masking threshold and the signal-to-mask ratio of the frame of audio signal are calculated in the psychoacoustic analysis module 51, and the signal-to-mask ratio is transmitted to the quantization and entropy coding module 54 as a control signal; the audio signal in the time domain on the other hand is transformed into frequency domain coefficients by a time-frequency mapping module 52. The frequency domain coefficients are transmitted to the frequency domain linear prediction and vector quantization module 53, if the gain threshold of the frequency domain coefficients meets the given condition, the frequency domain coefficients are subjected to linear prediction filtering, the obtained prediction coefficients are converted into line spectrum frequency coefficients lsf (line spectrum frequency), then the code word indexes of all levels of codebooks are searched and calculated by adopting the optimal distortion metric criterion, the code word indexes are transmitted to the bit stream multiplexing module 55 as side information, and the residual sequence obtained through prediction analysis is output to the quantization and entropy coding module 54. The residual sequence/frequency domain coefficients are quantized and entropy coded in a quantization and entropy coding module 54 under the control of the signal-to-mask ratio output by the psychoacoustic analysis module 51. The encoded data and side information are input to the bit stream multiplexing module 55, and multiplexed to form a code stream of the enhanced audio coding.

The respective constituent modules of the above-described audio encoding apparatus are explained in detail below.

In the present invention, the signal type analyzing module 50 is used for performing type analysis on the input audio signal. The signal type analyzing module 50 performs forward and backward masking effect analysis based on the adaptive threshold and the waveform prediction to determine whether the type of the signal is a slowly varying signal or a rapidly varying signal, and if the type of the signal is a rapidly varying signal, the related parameter information of the abrupt change component, such as the position where the abrupt change signal occurs and the intensity of the abrupt change signal, is continuously calculated.

The psychoacoustic analysis module 51 is mainly used for calculating a masking threshold, a perceptual entropy, and a signal-to-mask ratio of the input audio signal. The number of bits required for transparent coding of the current signal frame can be dynamically analyzed according to the perceptual entropy calculated by the psychoacoustic analysis module 51, so as to adjust the bit allocation between frames. The psychoacoustic analysis module 51 outputs the signal-to-mask ratio of each sub-band to the quantization and entropy coding module 54, which controls it.

The time-frequency mapping module 52 is configured to transform the audio signal from a time domain signal to a frequency domain coefficient, and is composed of a filter bank, which may specifically be a Discrete Fourier Transform (DFT) filter bank, a Discrete Cosine Transform (DCT) filter bank, a Modified Discrete Cosine Transform (MDCT) filter bank, a cosine modulation filter bank, a wavelet transform filter bank, or the like.

The frequency domain coefficients obtained by time-frequency mapping are transmitted to the frequency domain linear prediction and vector quantization module 53 for linear prediction and vector quantization. The frequency domain linear prediction and vector quantization module 53 is composed of a linear prediction analyzer, a linear prediction filter, a converter, and a vector quantizer. Inputting the frequency domain coefficient into a linear prediction analyzer for prediction analysis to obtain prediction gain and a prediction coefficient, and if the prediction gain meets a certain condition, outputting the frequency domain coefficient into a linear prediction filter for filtering to obtain a prediction residual sequence of the frequency domain coefficient; the residual sequence is directly output to the quantization and entropy coding block 54, while the prediction coefficients are converted into line spectrum versus frequency coefficients LSF by a converter, and then enter a vector quantizer for multi-level vector quantization, and the quantized side information is transmitted to the bitstream multiplexing block 55.

The frequency domain linear prediction processing of the audio signal can effectively suppress pre-echo and obtain larger coding gain. Let us assume a real signal x (t) whose squared Hilbert envelope e (t) is expressed as:

e(t)＝F^-1{∫C(ξ)·C^*(ξ-f)dξ}，

where c (f) is a single-sided spectrum corresponding to the positive frequency components of signal x (t), i.e., the Hilbert envelope of the signal is related to the autocorrelation function of the spectrum of the signal. And the power spectral density function of a signal is related to the autocorrelation function of its time-domain waveform by: psd (F) { (τ) · x { (F) } F { (τ) · x { (τ)^*(τ -t) d τ }. The squared Hilbert envelope of the signal in the time domain is therefore a dual relationship to the power spectral density function of the signal in the frequency domain. From the above, it can be seen that the partial band-pass signals in each of the certain frequency rangesIf its Hilbert envelope is kept constant, the autocorrelation of neighboring spectral values will also be kept constant, which means that the sequence of spectral coefficients is a steady-state sequence with respect to frequency, so that the spectral values can be processed with predictive coding techniques, effectively representing the signal with a common set of prediction coefficients.

The quantization and entropy coding module 54 further comprises a non-linear quantizer bank and an encoder, wherein the quantizer may be a scalar quantizer or a vector quantizer. Vector quantizers are further classified into two broad categories, memoryless vector quantizers and memoryless vector quantizers. For a memoryless vector quantizer, each input vector is quantized independently, independent of the previous vectors; the memory vector quantizer considers previous vectors when quantizing a vector, i.e., uses the correlation between vectors. The main memoryless vector quantizer comprises a full search vector quantizer, a tree search vector quantizer, a multi-stage vector quantizer, a gain/waveform vector quantizer and a separate mean vector quantizer; the main memory vector quantizers include a prediction vector quantizer and a finite state vector quantizer.

If a scalar quantizer is employed, the non-linear quantizer bank further includes M sub-band quantizers. The quantization is mainly performed by using scale factors in each sub-band quantizer, specifically: all frequency domain coefficients in M scale factor bands are subjected to nonlinear compression, the frequency domain coefficients of the sub-bands are quantized by using scale factors, quantized spectrums represented by integers are obtained and output to an encoder, the first scale factor in each frame of signal is output to a bit stream multiplexing module 55 as a common scale factor, and other scale factors and the previous scale factor are subjected to differential processing and then output to the encoder.

The scale factor in the above steps is a constantly changing value, and can be adjusted according to a bit allocation strategy. The invention provides a bit allocation strategy with minimum global perception distortion, which comprises the following specific steps:

first, each subband quantizer is initialized and the appropriate scale factor is chosen such that the quantized values of the spectral coefficients in all subbands are 0. At this time, the quantization noise of each sub-band is equal to the energy value of each sub-band, the noise mask ratio NMR of each sub-band is equal to the signal mask ratio SMR thereof, the number of bits consumed for quantization is 0, and the remaining number of bits Bl is equal to the target number of bits B.

Secondly, searching a sub-band with the largest noise masking ratio NMR, if the largest noise masking ratio NMR is less than or equal to 1, keeping the scale factor unchanged, outputting a distribution result, and ending the bit distribution process; otherwise, reducing the scale factor of the corresponding sub-band quantizer by one unit, and then calculating the bit number delta B required to be increased for the sub-band_i(Q_i). If the remaining bit number B of the sub-band_l≥ΔB_i(Q_i) Confirming the modification of the sub-scale factor and remaining bit number B_lMinus Δ B_i(Q_i) And recalculating the noise masking ratio NMR of the sub-band, then continuing to search the sub-band with the largest noise masking ratio NMR, and repeatedly executing the subsequent steps. If the number of remaining bits B of the subband is_l＜ΔB_i(Q_i) If so, canceling the modification, reserving the last scale factor and the residual bit number, and finally outputting the distribution result to finish the bit distribution process.

If a vector quantizer is adopted, a plurality of M-dimensional vectors consisting of frequency domain coefficients are input into a nonlinear quantizer group, each M-dimensional vector is subjected to spectrum flattening according to a flattening factor, namely, the dynamic range of a spectrum is reduced, then a code word with the minimum distance from the vector to be quantized is found in a codebook by the vector quantizer according to a subjective perception distance measurement criterion, and a corresponding code word index is transmitted to an encoder. The flattening factor is adjusted according to a bit allocation strategy of vector quantization, and the bit allocation of vector quantization is controlled according to the perceptual importance among different subbands.

After the quantization process, the statistical redundancy of the quantized coefficients and the side information is further removed by using an entropy coding technique. Entropy coding is a source coding technique, and its basic idea is: shorter length codewords are given to symbols with a greater probability of occurrence and longer codewords are given to symbols with a lesser probability of occurrence, thusThe length of the average codeword is shortest. According to Shannon's noiseless coding theorem, if the symbols of the transmitted N source messages are independent, the average length N of the codeword will satisfy using the appropriate variable length coding

<math> <mrow> <mo>[</mo> <mfrac> <mrow> <mi>H</mi> <mrow> <mo>(</mo> <mi>x</mi> <mo>)</mo> </mrow> </mrow> <mrow> <msub> <mi>log</mi> <mn>2</mn> </msub> <mrow> <mo>(</mo> <mi>D</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>]</mo> <mo>≤</mo> <mover> <mi>n</mi> <mo>&OverBar;</mo> </mover> <mo><</mo> <mo>[</mo> <mfrac> <mrow> <mi>H</mi> <mrow> <mo>(</mo> <mi>x</mi> <mo>)</mo> </mrow> </mrow> <mrow> <msub> <mi>log</mi> <mn>2</mn> </msub> <mrow> <mo>(</mo> <mi>D</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>+</mo> <mfrac> <mn>1</mn> <mi>N</mi> </mfrac> <mo>]</mo> <mo>,</mo> </mrow> </math>

Where h (x) represents the entropy of the source and x represents the sign variable. Since the entropy h (x) is the shortest limit of the average codeword length, the above formula shows that the average codeword length is very close to its lower entropy h (x), so this variable length coding technique becomes "entropy coding". The entropy coding mainly includes Huffman coding, arithmetic coding or run-length coding, and any of the above coding methods can be adopted in the entropy coding of the present invention.

The quantization spectrum output after quantization by the scalar quantizer and the scale factor after the difference processing are subjected to entropy coding in an encoder to obtain a codebook number, a codebook number coding value and a lossless coding quantization spectrum, the codebook number is subjected to entropy coding to obtain a codebook number coding value, and the scale factor coding value, the codebook number coding value and the lossless coding quantization spectrum are output to a bit stream multiplexing module 55.

The codeword index obtained by quantization by the vector quantizer is subjected to one-dimensional or multi-dimensional entropy coding in an encoder to obtain a coded value of the codeword index, and then the coded value of the codeword index is output to the bitstream multiplexing module 55.

The bit stream multiplexing module 55 receives the side information output by the frequency domain linear prediction and vector quantization module 53 and the code stream or code value of the code word index output by the quantization and entropy coding module 54, which includes the common scale factor, the scale factor code value, the code book sequence number code value and the lossless coding quantization spectrum, and multiplexes them to obtain the compressed audio data stream.

The encoding method based on the encoder specifically includes: performing signal type analysis on an input audio signal; calculating the signal-to-mask ratio of the signal after type analysis; performing time-frequency mapping on the signal after the signal type analysis to obtain a frequency domain coefficient of the audio signal; performing standard linear prediction analysis on the frequency domain coefficient to obtain a prediction gain and a prediction coefficient; judging whether the prediction gain exceeds a set threshold value, if so, performing frequency domain linear prediction error filtering on the frequency domain coefficient according to the prediction coefficient to obtain a linear prediction residual sequence of the frequency domain coefficient; converting the prediction coefficient into a line spectrum pair frequency coefficient, and performing multi-level vector quantization processing on the line spectrum pair frequency coefficient to obtain side information; quantizing and entropy coding the residual sequence; if the prediction gain does not exceed the set threshold, quantizing and entropy coding the frequency domain coefficient; multiplexing the side information and the coded audio signal to obtain a compressed audio code stream.

The step of analyzing the signal type is to analyze the forward and backward masking effect based on the adaptive threshold and the waveform prediction to determine whether the signal is of a fast-changing type or a slow-changing type, and the specific steps are as follows: decomposing input audio data into frames; decomposing an input frame into a plurality of sub-frames, and searching a local maximum point of the PCM data absolute value on each sub-frame; selecting a subframe peak value from the local maximum points of each subframe; for a certain subframe peak, predicting typical sample values of a plurality of (typically, 4) subframes delayed forward relative to the subframe by using a plurality of (typically, 3) subframe peaks before the subframe; calculating the difference value and the ratio of the subframe peak value and the predicted typical sample value; if the predicted difference and the ratio are both larger than the set threshold, judging that the subframe has a jump signal, confirming that the subframe has a local maximum peak point with the capability of backward masking pre-echo, and if a subframe with a small enough peak value exists between the front end of the subframe and 2.5ms before the masking peak point, judging that the frame signal belongs to a fast-changing type signal; if the prediction difference value and the ratio are not larger than the set threshold value, the steps are repeated until the frame signal is judged to be the fast-changing type signal or the last subframe is reached, and if the frame signal is not judged to be the fast-changing type signal when the last subframe is reached, the frame signal belongs to the slowly-changing type signal.

There are many methods for performing time-frequency transformation on time-domain audio signals, such as Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT), Modified Discrete Cosine Transform (MDCT), cosine modulated filter bank, wavelet transform, etc. The following describes the time-frequency mapping process by taking modified discrete cosine transform MDCT and cosine modulation filtering as examples.

For the case of performing time-frequency transformation by using Modified Discrete Cosine Transform (MDCT), time domain signals of M samples of a previous frame and M samples of a current frame are selected, windowing is performed on the time domain signals of 2M samples of the two frames, and then MDCT transformation is performed on the windowed signals, so that M frequency domain coefficients are obtained.

The impulse response of the MDCT analysis filter is:

the MDCT transforms to:

wherein: w (n) is a window function; x (n) is an input time domain signal of the MDCT transform; x (k) is the output frequency domain signal of the MDCT transform.

To satisfy the condition of complete reconstruction of the signal, the window function w (n) of the MDCT transform must satisfy the following two conditions:

w (2M-1-n) ═ w (n) and w²(n)+w²(n+M)＝1。

In practice, a Sine window may be selected as the window function. Of course, the above-described limitation of the window function can also be modified by using a biorthogonal transform with specific analysis filters and synthesis filters.

For the case of performing time-frequency transformation by cosine modulation filtering, time-domain signals of M samples of a previous frame and M samples of a current frame are selected, then window-adding operation is performed on the time-domain signals of 2M samples of the two frames, and then cosine modulation transformation is performed on the windowed signals, so that M frequency-domain coefficients are obtained.

The impulse response of the conventional cosine modulation filtering technique is

n＝0，1，...，N_h-1

n＝0，1，...，N_f-1

Wherein K is more than or equal to 0 and less than M-1, n is more than or equal to 0 and less than 2KM-1, K is an integer more than zero,

suppose an analysis window (analysis prototype filter) p of an M subband cosine modulated filter bank_a(N) has an impulse response length of N_aSynthesis window (synthesis prototype filter) p_s(N) has an impulse response length of N_s. When the analysis window and the synthesis window are equal, i.e. p_a(n)＝p_s(N), and N_a＝N_sFrom the above two formulasThe cosine modulated filter bank shown is a quadrature filter bank, where the matrices H and F ([ H ]) are]_n，k＝h_k(n)，[F]_n，k＝f_k(n)) is an orthogonal transformation matrix. To obtain a linear phase filter bank, a symmetric window p is further defined_a(2KM-1-n)＝p_a(n) of (a). In order to ensure the complete reconstruction of orthogonal and bi-orthogonal Systems, the window function should satisfy certain conditions, as described in the literature, "Multirate Systems and Filter Banks", p.p. vaidynathan, preptic Hall, Englewood Cliffs, NJ, 1993.

The calculation of the masking threshold and the signal-to-mask ratio of the pre-processed audio signal comprises the steps of:

firstly, mapping a signal from a time domain to a frequency domain. The time domain data may be converted to frequency domain coefficients X [ k ] using fast Fourier transform and Hanning Window (Hanning Window) techniques]。X[k]By amplitude r k]And phase phi k]Is represented by X [ k ]]＝r[k]e^jφ[k]Then the energy e [ b ] of each sub-band]Is the sum of the energies of all spectral lines within the sub-band, i.e.

Wherein k is_lAnd k_hRespectively representing the upper and lower boundaries of the subband b.

Second, tonal and non-tonal components of the signal are determined. The tonality of the signal is estimated by inter-frame prediction for each spectral line, the euclidean distance between the predicted value and the true value of each spectral line is mapped as an unpredictable measure, the highly predictive spectral components are considered as being very tonal, and the less predictive spectral components are considered as being noise-like.

Amplitude r of the predicted value_predAnd phase phi_predCan be expressed by the following formula:

r_pred[k]＝r_t-1[k]+(r_t-1[k]-r_t-2[k])

φ_pred[k]＝φ_t-1[k]+(φ_t-1[k]-φ_t-2[k])，

wherein t represents a coefficient of the current frame; t-1 represents the coefficient of the previous frame; t-2 denotes the coefficients of the first two frames.

The unpredictable measure c k is then calculated as:

c [k] = \frac{dist (X [k], X_{pred} [k])}{r [k] + | r_{pred} [k]}

wherein, Euclidean distance dist (X [ k ])]，X_pred[k]) Calculated using the formula:

dist(X[k]，X_pred[k])＝|X[k]-X_pred[k]|

＝((r[k]cos(φ[k])-r_pred[k]cos(φ_pred[k]))²+(r[k]sin(φ[k])-r_pred[k]sin(φ_pred[l]))²)^。

thus, the degree of unpredictability c b for each sub-band]Is a weighted sum of the energy of all spectral lines within the sub-band over its unpredictability, i.e.

Sub-band energy e b]And degree of unpredictability c [ b ]]Respectively carrying out convolution operation with the spreading function to obtain sub-band energy spreading e_s[b]And subband unpredictable degree extension c_s[b]The spreading function of mask i over subband b is denoted as s [ i, b ]]. To eliminate the effect of the expansion function on the energy transformation, a subband unpredictability expansion c is required_s[b]Performing normalization processing, the result of normalization is used

Is shown as

{\tilde{c}}_{s} [b] = \frac{c_{s} [b]}{e_{s} [b]} .

Similarly, to eliminate the effect of the spreading function on the subband energies, a normalized energy spread is definedIs composed of

{\tilde{e}}_{s} [b] = \frac{e_{s} [b]}{n [b]},

Wherein the normalization factor n [ b ]]Comprises the following steps:

bmax is the number of sub-bands into which the frame signal is divided.

Expansion according to normalized unpredictability

The tonality of a subband t b can be calculated]：

t [b] = - 0.299 - {0.43 \log}_{e} ({\tilde{c}}_{s} [b]),

And t is not less than 0]≤1。

When t [ b ] is 1, the subband signal is a pure tone; when t [ b ] is 0, it indicates that the subband signal is white noise.

And thirdly, calculating the Signal-to-Noise Ratio (SNR) required by each sub-band. The value of Noise-Masking-Tone (NMT) of all subbands is set to 5dB, the value of Tone-Masking-Noise (TMN) is set to 18dB, and the SNR [ b ] required for each subband is 18 tb +6 (1-tb ]) if the Noise is not perceived.

And fourthly, calculating a masking threshold value of each sub-band and the perceptual entropy of the signal. Calculating the noise energy threshold value nb of each sub-band as

n [b] = {\tilde{e}}_{s} [b] 10^{- SNR [b] / 10} .

In order to avoid the influence of pre-echo, the noise energy threshold value n [ b ] of the current frame is used]Noise energy threshold n from previous frame_prev[b]Comparing to obtain the masking threshold value n [ b ] of the signal]＝min(n[b]，2n_prev[b]) This ensures that the masking threshold is not biased by the high energy impact at the near end of the analysis window.

Further, consider a stationary masking threshold qsthr [ b [ ]]The masking threshold of the final signal is selected to be the larger of the static masking threshold and the masking threshold calculated above, i.e., n [ b ]]＝max(n[b]，qsthr[b]). The perceptual entropy is then calculated using the formula, i.e.

<math> <mrow> <mi>pe</mi> <mo>=</mo> <mo>-</mo> <munderover> <mi>Σ</mi> <mrow> <mi>b</mi> <mo>=</mo> <mn>0</mn> </mrow> <mrow> <mi>b</mi> <mi>max</mi> </mrow> </munderover> <mrow> <mo>(</mo> <msub> <mi>cbwidth</mi> <mi>b</mi> </msub> <mo>×</mo> <msub> <mi>log</mi> <mn>10</mn> </msub> <mrow> <mo>(</mo> <mi>n</mi> <mo>[</mo> <mi>b</mi> <mo>]</mo> <mo>/</mo> <mrow> <mo>(</mo> <mi>e</mi> <mo>[</mo> <mi>b</mi> <mo>]</mo> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>)</mo> </mrow> <mo>)</mo> </mrow> <mo>,</mo> </mrow> </math>

Wherein cbwidth_bIndicating the number of spectral lines contained in each sub-band.

The fifth step: a Signal-to-Mask Ratio (SMR) is calculated for each subband Signal. Signal-to-mask ratio SMR [ b ] for each sub-band]Is composed of

SMR [b] = {10 \log}_{10} (\frac{e [b]}{n [b]}) .

After the frequency domain coefficients are obtained, linear prediction and vector quantization are performed on the frequency domain coefficients. Firstly, standard linear prediction analysis is carried out on the frequency domain coefficient, including calculation of an autocorrelation matrix and recursive execution of a Levinson-Durbin algorithm to obtain a prediction gain and a prediction coefficient. Judging whether the calculated prediction gain exceeds a preset threshold value or not, and if so, performing frequency domain linear prediction error filtering on the frequency domain coefficient according to the prediction coefficient; otherwise, the frequency domain coefficient is not processed, and the next step is executed to quantize and entropy encode the frequency domain coefficient.

Linear prediction can be divided into forward prediction and backward prediction, wherein forward prediction refers to prediction of a current value using values before a certain time, and backward prediction refers to prediction of a current value using values after a certain time. Linear prediction error filtering, a linear prediction filter function of

Wherein a is_iRepresenting the prediction coefficient and p the prediction order. After filtering the frequency domain coefficient X (k) after time-frequency transformation, a prediction error E (k), also called a residual sequence, is obtained, and the prediction error E (k) and the residual sequence satisfy the relationship

Thus, through frequency domain linear predictive filtering, time-frequency variationThe transformed frequency domain coefficients X (k) can be obtained by using the residual sequence E (k) and a set of prediction coefficients a_iAnd (4) showing. Then the set of prediction coefficients a_iAnd converting the linear spectrum frequency coefficient into a linear spectrum frequency coefficient LSF, performing multi-stage vector quantization on the linear spectrum frequency coefficient LSF, selecting an optimal distortion measurement criterion (such as a nearest neighbor criterion) for the vector quantization, searching and calculating code word indexes of all levels of codebooks, determining code words corresponding to prediction coefficients by the aid of the code word indexes, and outputting the code word indexes as side information. Meanwhile, the residual sequence e (k) is quantized and entropy-encoded. As can be seen from the principle of linear predictive analysis coding, the dynamic range of the residual sequence of spectral coefficients is smaller than that of the original spectral coefficients, so that fewer bits can be allocated for quantization, or an improved coding gain can be obtained for the same number of bits.

After the signal-to-mask ratios of the sub-band signals are obtained, the frequency domain coefficients or residual sequences are quantized and entropy coded according to the signal-to-mask ratios, wherein the quantization may be scalar quantization or vector quantization.

Scalar quantization comprises the steps of: carrying out nonlinear compression on the frequency domain coefficients in all scale factor bands; quantizing the frequency domain coefficient of each sub-band by using the scale factor of each sub-band to obtain a quantized spectrum represented by an integer; selecting a first scale factor in each frame signal as a common scale factor; and carrying out differential processing on other scale factors and the previous scale factor.

The vector quantization comprises the following steps: forming the frequency domain coefficients into a plurality of multi-dimensional vector signals; performing spectrum flattening on each M-dimensional vector according to a flattening factor; and searching a code word with the minimum distance from the vector to be quantized in the codebook according to the subjective perception distance measure criterion to obtain the code word index of the code word.

The entropy encoding step includes: entropy coding is carried out on the quantized spectrum and the scale factor after the difference processing, and a code book serial number, a scale factor coding value and a lossless coding quantized spectrum are obtained; and entropy coding is carried out on the code book serial number to obtain a code book serial number coding value.

Or the following steps: and carrying out one-dimensional or multidimensional entropy coding on the code word index to obtain a coding value of the code word index.

The entropy encoding method may be any of conventional Huffman coding, arithmetic coding, run-length coding, and the like.

And after quantization and entropy coding processing, obtaining an audio coding signal, multiplexing the signal with a common scale factor, side information and a signal type analysis result to obtain a compressed audio code stream.

FIG. 6 is a schematic structural diagram of an audio decoding apparatus according to the present invention. The audio decoding apparatus includes a bitstream demultiplexing module 801, an entropy decoding module 802, an inverse quantizer 803, an inverse frequency-domain linear prediction and vector quantization module 804, and a frequency-time mapping module 805. The compressed audio data stream is demultiplexed by the bit stream demultiplexing module 801 to obtain corresponding data signals and control signals, and the corresponding data signals and control signals are output to the entropy decoding module 802 and the inverse frequency domain linear prediction and vector quantization module 804. The data signal and the control signal are decoded in the entropy decoding module 802 to recover the quantized value of the spectrum. The quantized values are reconstructed in the inverse quantizer 803 to obtain an inverse quantized spectrum, which is output to the inverse frequency domain linear prediction and vector quantization module 804, where inverse quantization processing and inverse linear prediction filtering are performed to obtain a spectrum before prediction, which is output to the frequency-time mapping module 805, and a time domain audio signal of a low frequency band is obtained after frequency-time mapping processing.

The bit stream demultiplexing module 801 decomposes the compressed audio data stream to obtain corresponding data signals and control signals, and provides corresponding decoding information for other modules. After the compressed audio data stream is demultiplexed, the signal output to the entropy decoding module 802 includes a common scale factor, a scale factor code value, a code book sequence number code value and a lossless coding quantization spectrum, or a code value of a code word index; output to the inverse linear prediction and vector quantization module 804 is inverse frequency domain linear prediction vector quantization control information.

In the encoding apparatus, if a scalar quantizer is used in the quantization and entropy coding module 54, in the decoding apparatus, the entropy decoding module 802 receives the common scale factor, the scale factor encoded value, the codebook number encoded value and the lossless coding quantization spectrum output by the bitstream demultiplexing module 801, and then performs codebook number decoding, spectral coefficient decoding and scale factor decoding on the common scale factor, the scale factor encoded value, the codebook number encoded value and the lossless coding quantization spectrum, reconstructs a quantization spectrum, and outputs an integer representation of the scale factor and a quantization value of the spectrum to the inverse quantizer 803. The decoding method adopted by the entropy decoding module 802 corresponds to the encoding method of entropy encoding in the encoding apparatus, such as Huffman decoding, arithmetic decoding, run-length decoding, or the like.

Upon receiving the quantized values of the spectrum and the integer representation of the scale factors, the inverse quantizer 803 inverse quantizes the quantized values of the spectrum into a reconstructed spectrum without scaling (inverse quantized spectrum), and outputs the inverse quantized spectrum to the inverse frequency-domain linear prediction and vector quantization module 804. The inverse quantizer 803 may be a uniform quantizer or a non-uniform quantizer implemented by a companding function. In the encoding apparatus, the quantizer set employs a scalar quantizer, and the inverse quantizer set 803 in the decoding apparatus also employs a scalar inverse quantizer. In a scalar inverse quantizer, the quantized values of the spectrum are first subjected to a non-linear expansion, and then all the spectral coefficients in the corresponding scale factor band are obtained with each scale factor (inversely quantized spectrum).

If the quantization and entropy coding module 54 employs a vector quantizer, in the decoding apparatus, the entropy decoding module 802 receives the encoded value of the codeword index output by the bitstream demultiplexing module 801, and decodes the encoded value of the codeword index by using an entropy decoding method corresponding to the entropy coding method during encoding, so as to obtain the corresponding codeword index.

The codeword index is output to the inverse quantizer 803, and a quantized value (inverse quantized spectrum) is obtained by looking up a codebook and output to the frequency-time mapping module 805. The inverse quantizer set 803 employs an inverse vector quantizer.

In the encoder, a frequency domain linear predictive vector quantization technology is adopted to suppress pre-echo and obtain larger coding gain. Therefore, in the decoder, the inverse quantization spectrum and inverse frequency domain linear prediction vector quantization control information output by the bitstream demultiplexing module 801 are input to the inverse frequency domain linear prediction and vector quantization module 804 to restore the spectrum before linear prediction.

The inverse frequency domain linear prediction and vector quantization module 804 includes an inverse vector quantizer, an inverse transformer, and an inverse linear prediction filter, wherein the inverse vector quantizer is configured to perform inverse quantization on the codeword index to obtain a line spectrum versus frequency coefficient LSF; the inverse converter is used for inversely converting the line spectrum frequency coefficient LSF into a prediction coefficient; the inverse linear prediction filter is configured to perform inverse filtering on the inverse quantized spectrum according to the prediction coefficient to obtain a spectrum before prediction, and output the spectrum to the frequency-time mapping module 805.

The inverse quantized spectrum or the spectrum before prediction is mapped by the frequency-time mapping module 805, so as to obtain a time domain audio signal of a low frequency band. The frequency-time mapping module 805 may be an Inverse Discrete Cosine Transform (IDCT) filter bank, an Inverse Discrete Fourier Transform (IDFT) filter bank, an Inverse Modified Discrete Cosine Transform (IMDCT) filter bank, an inverse wavelet transform filter bank, a cosine modulation filter bank, or the like.

The decoding method based on the decoder comprises the following steps: demultiplexing the compressed audio code stream to obtain data information and control information; entropy decoding the information to obtain a quantized value of a spectrum; carrying out inverse quantization processing on the quantized value of the spectrum to obtain an inverse quantized spectrum; judging whether the control information contains information that the inverse quantization spectrum needs to be subjected to inverse frequency domain linear prediction vector quantization, if so, performing inverse vector quantization processing to obtain a prediction coefficient, and performing inverse linear prediction filtering on the inverse quantization spectrum according to the prediction coefficient to obtain a spectrum before prediction; performing frequency-time mapping on the spectrum before prediction to obtain a time domain audio signal of a low frequency band; and if the control information does not contain the information that the inverse quantization spectrum needs to be subjected to inverse frequency domain linear prediction vector quantization, performing frequency-time mapping on the inverse quantization spectrum to obtain a time domain audio signal of a low frequency band.

If the demultiplexed information includes a code book number code value, a common scale factor, a scale factor code value and a lossless coding quantization spectrum, it indicates that the spectrum coefficient is quantized by a scalar quantization technology in the coding device, and the entropy decoding step includes: decoding the code book sequence number coded value to obtain the code book sequence numbers of all scale factor bands; decoding the quantization coefficients of all scale factor bands according to the code book corresponding to the code book serial number; and decoding the scale factors of all scale factor bands and reconstructing a quantized spectrum. The entropy decoding method adopted in the above process corresponds to the entropy encoding method in the encoding method, such as a run length decoding method, a Huffman decoding method, an arithmetic decoding method, and the like.

The following describes the entropy decoding process by taking the example of decoding the code book number by the run-length decoding method, decoding the quantization coefficient by the Huffman decoding method, and decoding the scale factor by the Huffman decoding method.

Firstly, obtaining the codebook numbers of all scale factor bands by a run-length decoding method, wherein the decoded codebook numbers are integers of a certain interval, and if the interval is assumed to be [0, 11], only the codebook numbers which are positioned in the effective range, namely between 0 and 11, correspond to the corresponding spectral coefficient Huffman codebooks. For all-zero subbands, a certain codebook number can be selected to correspond to, typically, 0 number can be selected.

And after the code book number of each scale factor band is obtained through decoding, decoding the quantized coefficients of all scale factor bands by using a spectral coefficient Huffman code book corresponding to the code book number. If the codebook number of a scale factor band is in the valid range, for example, between 1 and 11 in this embodiment, the codebook number corresponds to a spectrum coefficient codebook, the codebook is used to decode from the quantized spectrum to obtain the codeword index of the quantized coefficient of the scale factor band, and then unpack from the codeword index to obtain the quantized coefficient. If the code book number of the scale factor band is not between 1 and 11, the code book number does not correspond to any spectrum coefficient code book, the quantized coefficient of the scale factor band is not decoded, and the quantized coefficient of the sub-band is directly set to be zero.

The scale factors are used to reconstruct spectral values on the basis of the inversely quantized spectral coefficients. And if the code book number of the scale factor band is in the effective range, each code book number corresponds to one scale factor. When the scale factors are decoded, firstly, the code stream occupied by the first scale factor is read, then, Huffman decoding is carried out on other scale factors, the difference value between each scale factor and the previous scale factor is obtained in sequence, and the difference value and the previous scale factor value are added to obtain each scale factor. If the quantized coefficients of the current sub-band are all zero, then the scale factors for that sub-band do not need to be decoded.

After the entropy decoding process, the quantized value of the spectrum and the integer representation of the scale factor are obtained, and then the quantized value of the spectrum is subjected to inverse quantization processing to obtain an inverse quantization spectrum. The inverse quantization process includes: performing nonlinear expansion on the quantized values of the spectrum; all spectral coefficients in the corresponding scale factor band (inversely quantized spectrum) are obtained from each scale factor.

If the demultiplexed information includes the code value of the code word index, it indicates that the coding device quantizes the spectral coefficient by using the vector quantization technology, and the entropy decoding step includes: and decoding the code value of the code word index by adopting an entropy decoding method corresponding to the entropy coding method in the coding device to obtain the code word index. And then carrying out inverse quantization processing on the code word index to obtain an inverse quantization spectrum.

And performing inverse frequency domain linear prediction vector quantization on the inverse quantized spectrum. Firstly, judging whether the frame signal is subjected to frequency domain linear prediction vector quantization according to control information, and if so, obtaining a code word index after prediction coefficient vector quantization from the control information; then obtaining a quantized line spectrum frequency coefficient LSF according to the code word index, and calculating a prediction coefficient according to the quantized line spectrum frequency coefficient LSF; and then carrying out linear prediction synthesis on the inverse-quantized spectrum to obtain a spectrum before prediction.

The transfer function a (z) used in the linear prediction error filtering process is:

wherein: a is_iIs a prediction coefficient; p is the prediction order. The residual sequence e (k) thus satisfies, with the spectrum x (k) before prediction:

thus, the residual sequence E (k) and the calculated prediction coefficients a_iObtaining a spectrum X (k) before prediction through frequency domain linear prediction synthesis, and carrying out frequency-time mapping processing on the spectrum X (k) before prediction.

If the control information indicates that the frame signal is not subjected to frequency domain linear prediction vector quantization, inverse frequency domain linear prediction vector quantization processing is not performed, and the inverse quantization spectrum is directly subjected to frequency-time mapping processing.

The method of performing frequency-time mapping processing on the inverse-quantized spectrum corresponds to a time-frequency mapping processing method in the encoding method, and can be performed by using methods such as Inverse Discrete Cosine Transform (IDCT), Inverse Discrete Fourier Transform (IDFT), Inverse Modified Discrete Cosine Transform (IMDCT), and inverse wavelet transform.

The frequency-time mapping process is described below by taking the inverse modified discrete cosine transform IMDCT as an example. The frequency-time mapping process includes three steps: IMDCT transformation, time domain windowing processing and time domain superposition operation.

First, the spectrum or inverse quantity before predictionIMDCT transform is carried out on the spectrum to obtain a transformed time domain signal x_i，n. The expression of the IMDCT transform is:

wherein N represents the sample serial number, N is more than or equal to 0 and less than N, N represents the time domain sample number and is 2048, and N₀(N/2+ 1)/2; i represents a frame number; k represents a spectrum number.

Secondly, windowing is carried out on the time domain signals obtained through IMDCT transformation in the time domain. To satisfy the complete reconstruction condition, the window function w (n) must satisfy the following two conditions: w (2M-1-n) ═ w (n) and w²(n)+w²(n+M)＝1。

Typical window functions are the Sine window, the Kaiser-Bessel window, etc. The invention adopts a fixed window function, and the window function is as follows: w (N + k) ═ cos (pi/2 × ((k +0.5)/N-0.94 × (2 × pi/N × (k +0.5))/(2 × pi))), where pi is the circumferential ratio and k is 0.. N-1; w (k) denotes the kth coefficient of the window function, w (k) ═ w (2 × N-1-k); n represents the number of samples of the encoded frame, and takes the value N1024. The above-described limitations on the window function can additionally be modified using biorthogonal transforms, using specific analysis filters and synthesis filters.

And finally, carrying out superposition processing on the windowed time domain signal to obtain a time domain audio signal. The method comprises the following steps: the first N/2 samples of the signal obtained after the windowing operation and the last of the signal of the previous frame are comparedN/2 samples are overlapped and added to obtain N/2 output time domain audio samples, namely time Audio samples_i，n＝preSam_i，n+preSam_i-1，n+N/2Wherein i represents a frame number, n represents a sample number, has

And N is 2048.

And after the compressed audio data stream is processed by the steps, a time domain audio signal of a low frequency band is obtained.

Fig. 7 is a schematic structural diagram of a first embodiment of the encoding apparatus of the present invention. This embodiment adds a multi-resolution analysis module 56 between the output of the frequency domain linear prediction and vector quantization module 53 and the input of the quantization and entropy coding module 54 on the basis of fig. 5.

For the fast-changing type signals, in order to effectively overcome the pre-echo phenomenon generated in the encoding process and improve the encoding quality, the encoding device of the invention improves the time resolution of the encoded fast-changing signals through the multi-resolution analysis module 56. The residual sequence or frequency domain coefficients output by the frequency domain linear prediction and vector quantization module 53 are input to the multiresolution analysis module 56, and if the signal is a fast-varying type signal, frequency domain wavelet transform or frequency domain Modified Discrete Cosine Transform (MDCT) is performed to obtain a multiresolution representation of the residual sequence/frequency domain coefficients, which is output to the quantization and entropy coding module 54. If the signal is a slowly varying type signal, the residual sequence/frequency domain coefficients are not processed and are directly output to the quantization and entropy coding module 54.

The multiresolution analysis module 56 performs time-frequency domain reorganization on the input frequency domain data to improve the time resolution of the frequency domain data at the cost of reducing the frequency precision, thereby automatically adapting to the time-frequency characteristics of the fast-varying type signals to achieve the effect of pre-echo suppression without adjusting the form of the filter bank in the time-frequency mapping module 52 at any time. The multiresolution analysis module 56 comprises a frequency domain coefficient transform module for transforming frequency domain coefficients into time-frequency plane coefficients and a recombination module; the recombination module is used for recombining the time-frequency plane coefficients according to a certain rule. The frequency domain coefficient transform module may employ a frequency domain wavelet transform filter bank, a frequency domain MDCT transform filter bank, or the like.

The operation of the multiresolution analysis module 56 will be described below using examples of frequency domain wavelet transforms and frequency domain MDCT transforms.

1) Frequency domain wavelet transform

Assuming a time sequence x (i), i is 0, 1.. and 2M-1, a frequency domain coefficient obtained after time-frequency mapping is x (k), and k is 0, 1.. and M-1. The wavelet basis of the frequency domain wavelet or wavelet packet transform may be fixed or adaptive.

The following describes the process of performing multi-resolution analysis on the frequency domain coefficients by taking the simplest Harr wavelet-based wavelet transform as an example.

The Harr wavelet base has a scale coefficient ofWavelet coefficient of

FIG. 8 is a diagram showing a filtering structure of a wavelet transform using Harr wavelet basis, where H₀Representing low-pass filtering (filter coefficient of

)，H₁Representing high-pass filtering (filter coefficients of

) And "↓ 2" indicates a down-sampling operation of 2 times. For the mid-low frequency part X of MDCT coefficients₁(k)，k＝0，...，k₁Harr wavelet transform is carried out on the high-frequency part of the MDCT coefficient without carrying out wavelet transform to obtain the coefficient X of different time-frequency intervals₂(k)、X₃(k)、X₄(k)、X₅(k)、X₆(k) And X₇(k) The corresponding time-frequency plane division is shown in fig. 9. Selecting different wavelet bases, optionallyDifferent wavelet transform structures are processed to obtain other similar time-frequency plane divisions. Therefore, the time-frequency plane division during signal analysis can be adjusted arbitrarily according to the needs, and the analysis requirements of different time and frequency resolutions are met.

The time-frequency plane coefficients are recombined in a recombination module according to a certain rule, for example: the time-frequency plane coefficients can be organized in the frequency direction, the coefficients in each frequency band are organized in the time direction, and then the organized coefficients are arranged according to the sequence of the sub-windows and the scale factor bands.

2) Frequency domain MDCT transform

The frequency domain data input into the frequency domain MDCT filter bank is set as X (k), k is 0, 1,.. and N-1, and M-point MDCT transformation is sequentially carried out on the N points of frequency domain data, so that the frequency precision of the time-frequency domain data is reduced to some extent, and the time precision is correspondingly improved. Different time-frequency plane divisions, i.e. different time and frequency accuracies, can be obtained by using frequency domain MDCT transforms of different lengths in different frequency domain ranges. The recombination module recombines the time-frequency domain data output by the frequency domain MDCT transform filter bank, and the recombination method is that the time-frequency plane coefficients are firstly organized in the frequency direction, meanwhile, the coefficients in each frequency band are organized in the time direction, and then the organized coefficients are arranged according to the sequence of the sub-windows and the scale factor bands.

The basic flow of the encoding method based on the encoding apparatus shown in fig. 7 is the same as the encoding method based on the encoding apparatus shown in fig. 5, except that the following steps are added: before quantizing and entropy coding the residual error sequence/frequency domain coefficient, if the residual error sequence/frequency domain coefficient is a fast-changing type signal, carrying out multi-resolution analysis on the residual error sequence/frequency domain coefficient; if the signal is not a fast-varying type signal, the residual sequence/frequency domain coefficients are directly quantized and entropy encoded.

The multi-resolution analysis may employ a frequency domain wavelet transform or a frequency domain MDCT transform. The frequency domain wavelet analysis method comprises the following steps: performing wavelet transformation on the frequency domain coefficient to obtain a time-frequency plane coefficient; and recombining the time-frequency plane coefficients according to a certain rule. Whereas the MDCT transform law includes: performing MDCT on the frequency domain coefficient to obtain a time-frequency plane coefficient; and recombining the time-frequency plane coefficients according to a certain rule. The method of recombination may comprise: the time-frequency plane coefficients are organized in the frequency direction, the coefficients in each frequency band are organized in the time direction, and then the organized coefficients are arranged according to the sequence of the sub-windows and the scale factor bands.

Fig. 10 is a schematic structural diagram of a decoding device according to a first embodiment of the present invention. The decoding apparatus is added with a multi-resolution synthesis module 806 on the basis of the decoding apparatus shown in fig. 6. A multi-resolution synthesis module 806 is located between the output of the inverse quantizer 803 and the input of the inverse frequency-domain linear prediction and vector quantization module 804 for multi-resolution synthesis of the inverse quantized spectrum.

In the encoder, a multi-resolution filtering technique is applied to the fast-varying type signal to improve the temporal resolution of the encoded fast-varying type signal. Accordingly, in the decoder, the multiresolution synthesis module 806 is needed to restore the frequency domain coefficients before multiresolution analysis to the fast-varying type signal. The multi-resolution synthesis module 806 includes: the coefficient reconstruction module and the coefficient transformation module, wherein the coefficient transformation module can adopt a frequency domain inverse wavelet transform filter bank or a frequency domain IMDCT transform filter bank.

The basic flow of the decoding method based on the decoding device shown in fig. 10 is the same as the decoding method based on the decoding device shown in fig. 6, except that the following steps are added: and after the inverse quantization spectrum is obtained, performing multi-resolution synthesis on the inverse quantization spectrum, and then judging whether inverse frequency domain linear prediction vector quantization processing needs to be performed on the inverse quantization spectrum after the multi-resolution synthesis.

The following method for multi-resolution synthesis is described by taking a frequency domain IMDCT transform as an example, and specifically comprises the following steps: recombining the inversely quantized spectral coefficients; and performing a plurality of IMDCT transformations on each coefficient to obtain an inverse quantization spectrum before multi-resolution analysis. The process is described in detail below with 128 IMDCT transforms (8 inputs, 16 outputs). Firstly, arranging the inversely quantized spectral coefficients according to the sequence of the sub-windows and the scale factor bands; and then recombined in frequency order such that the 128 coefficients of each sub-window are organized together in frequency order. Then, the coefficients arranged in the sub-window are organized in the frequency direction every 8 groups, and the 8 coefficients in each group are arranged in time series, so that there are 128 groups of coefficients in the frequency direction. And performing 16-point IMDCT transformation on each group of coefficients, and overlapping and adding 16 output coefficients after each group of IMDCT transformation to obtain 8 frequency domain data. And carrying out 128 times of similar operations from the low frequency to the high frequency in sequence to obtain 1024 frequency domain coefficients.

Fig. 11 is a schematic diagram of a second embodiment of the encoding apparatus of the present invention. This embodiment adds a sum and difference stereo (M/S) coding module 57 between the output of the frequency domain linear prediction and vector quantization module 53 and the input of the quantization and entropy coding module 54 to the base of fig. 5, and the psychoacoustic analysis module 51 outputs the masking threshold for the sum and difference channel to the quantization and entropy coding module 54. For multi-channel signals, the psychoacoustic analysis module 51 calculates a masking threshold for the sum and difference channels in addition to a masking threshold for the mono channel of the audio signal. The sum and difference stereo encoding module 57 may also be located between the quantizer set and the encoder in the quantization and entropy encoding module 54.

The sum and difference stereo encoding module 57 converts the frequency domain coefficients/residual sequences of the left and right channels into the frequency domain coefficients/residual sequences of the sum and difference channels by using the correlation between the two channels in the channel pair, so as to achieve the effects of reducing the code rate and improving the encoding efficiency, and therefore, the sum and difference stereo encoding module is only suitable for multi-channel signals with consistent signal types. If the signal is a mono signal or a multi-channel signal with inconsistent signal types, the sum and difference stereo encoding process is not performed.

The encoding method based on the encoding apparatus shown in fig. 11 is substantially the same as the encoding method based on the encoding apparatus shown in fig. 5, except that the following steps are added: before quantization and entropy coding processing are carried out on a residual sequence/frequency domain coefficient, whether an audio signal is a multi-channel signal is judged, if the audio signal is the multi-channel signal, whether the signal types of a left channel signal and a right channel signal are consistent is judged, if the signal types are consistent, whether a scale factor band corresponding to the two channels meets a sum stereo coding condition is judged, and if the signal types are consistent, sum stereo coding and difference stereo coding are carried out on the residual sequence/frequency domain coefficient to obtain a residual sequence/frequency domain coefficient of a sum channel; if not, then not proceeding sum and difference stereo coding; if the signal is a single-channel signal or a multi-channel signal with inconsistent signal types, the frequency domain coefficients are not processed.

Sum and difference stereo coding can be applied before the quantization process, but also after quantization and before entropy coding, i.e.: after quantizing the residual sequence/frequency domain coefficient, judging whether the audio signal is a multi-channel signal, if so, judging whether the signal types of the left and right channel signals are consistent, if so, judging whether the scale factor band meets the coding condition, if so, performing sum stereo coding on the quantized spectrum to obtain the quantized spectrum of a sum channel; if not, then not proceeding sum and difference stereo coding; if the signal is a single-channel signal or a multi-channel signal with inconsistent signal types, the frequency domain coefficients are not processed.

The method for judging whether the scale factor band can carry out sum and difference stereo coding is many, and the judging method adopted by the invention is as follows: by K-L transformation. The specific judgment process is as follows:

if the left channel scale factor band has a spectral coefficient of l (k), the right channel scale factor band has a spectral coefficient of r (k), and the correlation matrix C is

C = (\begin{matrix} C_{ll} & C_{lr} \\ C_{lr} & C_{rr} \end{matrix}),

Wherein,

n is the number of spectral lines in the scale factor band. Performing K-L transformation on the correlation matrix C to obtain

Wherein,

<math> <mrow> <mi>α</mi> <mo>&Element;</mo> <mo>[</mo> <mo>-</mo> <mfrac> <mi>π</mi> <mn>2</mn> </mfrac> <mo>,</mo> <mfrac> <mi>π</mi> <mn>2</mn> </mfrac> <mo>]</mo> </mrow> </math>

the rotation angle a satisfies

When a is + -pi/4, it is the sum and difference stereo coding mode. Thus, when the absolute value of the rotation angle a deviates a/4 by a small amount, e.g. 3 pi/16 < | a | < 5 pi/16, the corresponding scale factor band can be sum and difference stereo coded.

If sum and difference stereo coding is applied before the quantization process, the residual sequence/frequency domain coefficients of the left and right channels at the scale factor band are replaced by the residual sequence/frequency domain coefficients of the sum and difference channels by linear transformation:

[\begin{matrix} M \\ S \end{matrix}] = \frac{1}{2} [\begin{matrix} 1 & 1 \\ 1 & - 1 \end{matrix}] [\begin{matrix} L \\ R \end{matrix}],

where M represents the sum channel residual sequence/frequency domain coefficients; s represents a difference channel residual sequence/frequency domain coefficient; l represents the left channel residual sequence/frequency domain coefficient; r is represented as the right channel residual sequence/frequency domain coefficients.

If sum and difference stereo coding is applied after quantization, the quantized residual sequence/frequency domain coefficients of the left and right channels in the scale factor band are replaced by the residual sequence/frequency domain coefficients of the sum and difference channel by linear transformation:

[\begin{matrix} \hat{M} \\ \hat{S} \end{matrix}] = [\begin{matrix} 1 & 0 \\ 1 & - 1 \end{matrix}] [\begin{matrix} \hat{L} \\ \hat{R} \end{matrix}],

wherein:representing quantized sum channel residual sequence/frequency domain coefficients;representing the quantized difference channel residual sequence/frequency domain coefficients;

representing the quantized left channel residual sequence/frequency domain coefficients;representing the quantized right channel residual sequence/frequency domain coefficients.

The sum and difference stereo coding is put after quantization processing, which not only can effectively remove the correlation of the left and right channels, but also can achieve lossless coding because the coding is carried out after quantization.

FIG. 12 is a diagram of a second decoding apparatus according to the present invention. The decoding apparatus is added with a sum and difference stereo decoding module 807 between the output of the inverse quantizer set 803 and the input of the inverse frequency domain linear prediction and vector quantization module 804, based on the decoding apparatus shown in fig. 6, and receives the signal type analysis result and the sum and difference stereo control signal output by the bitstream demultiplexing module 801, for converting the inverse quantized spectrum of the sum and difference channel into the inverse quantized spectrum of the left and right channels according to the control information.

In the sum and difference stereo control signal, there is a flag bit to indicate whether the sum and difference stereo decoding is required for the current channel pair, and if so, there is also a flag bit on each scale factor band to indicate whether the sum and difference stereo decoding is required for the corresponding scale factor band, and the sum and difference stereo decoding module 807 determines whether the sum and difference stereo decoding is required for the inverse quantization spectrum in some scale factor bands according to the flag bit of the scale factor band. If sum and difference stereo coding is performed in the coding apparatus, sum and difference stereo decoding must be performed on the inversely quantized spectrum in the decoding apparatus.

The sum and difference stereo decoding module 807, which may also be located between the output of the entropy decoding module 802 and the input of the inverse quantizer bank 803, receives the sum and difference stereo control signal and the signal type analysis result output by the bitstream demultiplexing module 601.

The decoding method based on the decoding apparatus shown in fig. 12 is basically the same as the decoding method based on the decoding apparatus shown in fig. 6, except that the following steps are added: after the inverse quantization spectrum is obtained, if the signal type analysis result shows that the signal types are consistent, judging whether sum difference stereo decoding needs to be carried out on the inverse quantization spectrum or not according to the sum difference stereo control signal; if necessary, judging whether the scale factor band needs sum and difference stereo decoding according to the flag bit on each scale factor band, if necessary, converting the inverse quantization spectrum of the sum and difference channel in the scale factor band into the inverse quantization spectrum of the left and right channels, and then carrying out subsequent processing; if the signal types are not consistent or sum and difference stereo decoding is not required, the inverse quantized spectrum is not processed and the subsequent processing is directly performed.

Sum and difference stereo decoding may also be performed after the entropy decoding process and before the inverse quantization process, i.e.: after the quantized value of the spectrum is obtained, if the signal type analysis result shows that the signal types are consistent, judging whether sum difference stereo decoding needs to be carried out on the quantized value of the spectrum according to the sum difference stereo control signal; if so, judging whether the scale factor band needs sum and difference stereo decoding according to the flag bit on each scale factor band, if so, converting the quantized value of the spectrum of the sum and difference channel in the scale factor band into the quantized value of the spectrum of the left and right channels, and then carrying out subsequent processing; if the signal types are not consistent or sum and difference stereo decoding is not required, the quantized values of the spectrum are not processed and are directly subjected to subsequent processing.

If the sum-difference stereo decoding is performed after entropy decoding and before inverse quantization, the frequency domain coefficients of the left and right channels in the scale factor band are obtained from the frequency domain coefficients of the sum-difference channel by using the following operations:

[\begin{matrix} \hat{l} \\ \hat{r} \end{matrix}] = [\begin{matrix} 1 & 0 \\ 1 & - 1 \end{matrix}] [\begin{matrix} \hat{m} \\ \hat{s} \end{matrix}],

wherein:representing the quantized sum channel frequency domain coefficients; s denotes the quantized difference channel frequency domain coefficients;

representing the quantized left channel frequency domain coefficients;

representing the quantized right channel frequency domain coefficients.

If sum and difference stereo decoding is after inverse quantization, the inverse quantized frequency domain coefficients of the left and right channels in subbands are obtained from the frequency domain coefficients of the sum and difference channels according to the following matrix operation:

[\begin{matrix} l \\ r \end{matrix}] = [\begin{matrix} 1 & 1 \\ 1 & - 1 \end{matrix}] [\begin{matrix} m \\ s \end{matrix}],

wherein: m represents sum channel frequency domain coefficients; s represents a difference channel frequency domain coefficient; l represents the left channel frequency domain coefficient; r represents the right channel frequency domain coefficients.

Fig. 13 is a schematic structural diagram of a third embodiment of the encoding apparatus of the present invention. This embodiment is based on the encoding apparatus shown in fig. 5, a band expansion module 58 and a resampling module 59 are added, wherein the band expansion module 58 is configured to analyze the original input audio signal over the entire frequency band, extract the spectral envelope of the high frequency part and related parameters characterizing the correlation between the low and high frequency spectrums, and output the extracted spectral envelope and related parameters as band expansion information to the bitstream multiplexing module 55; the resampling module 59 is used for resampling the original input audio signal and changing the sampling rate of the audio signal.

The resampling includes two types, i.e., upsampling and downsampling, and the following sampling is taken as an example to illustrate the resampling. In this embodiment, the resampling module 59 comprises a low-pass filter and a down-sampler, wherein the low-pass filter is used to limit the frequency band of the audio signal and eliminate aliasing that may be caused by down-sampling. The input audio signal is low-pass filtered and down-sampled. Assume that the input audio signal is s (n) viaThe output after being filtered by the low-pass filter with impulse response h (n) is v (n), then

The sequence obtained by M times down-sampling v (n) is x (n), and there are

Thus, the sampling rate of the resampled audio signal x (n) is reduced by a factor M with respect to the sampling rate of the original input audio signal s (n).

The basic principle of band extension is: for most audio signals, there is a strong correlation between the characteristics of the high frequency part and the characteristics of the low frequency part, so that the high frequency part of the audio signal can be effectively reconstructed through the low frequency part thereof, and thus, the high frequency part of the audio signal may not be transmitted. To ensure that the high frequency portion can be reconstructed correctly, only a small amount of band extension information needs to be transmitted in the compressed audio stream.

The band expansion module 58 includes a parameter extraction module, in which the input signal enters, extracts parameters representing the spectral characteristics of the input signal in different time-frequency regions, and then estimates the spectral envelope of the high-frequency part of the signal with a certain time-frequency resolution in the spectral envelope extraction module. In order to ensure that the time-frequency resolution is best suited for the characteristics of the current input signal, the time-frequency resolution of the spectral envelope can be freely chosen. The parameters of the spectral characteristics of the input signal and the spectral envelope of the high frequency part are fed as output of the band extension to the bit stream multiplexing module 55 for multiplexing.

The encoding method based on the encoding apparatus shown in fig. 13 is substantially the same as the encoding method based on the encoding apparatus shown in fig. 5, except that the following steps are added: the encoding method further includes: resampling the audio signal before performing a type analysis on the audio signal; the input audio signal is analyzed on the whole frequency band, the high-frequency spectrum envelope and the signal spectrum characteristic parameter of the input audio signal are extracted to be used as a frequency band expansion control signal, and the frequency band expansion control signal, the audio coding signal and the side information are multiplexed together to obtain a compressed audio code stream. Wherein, resampling comprises two steps: limiting the frequency band of the audio signal and downsampling the band-limited audio signal by a multiple.

Fig. 14 is a schematic diagram of a third structure of an embodiment of a decoding apparatus according to the present invention, which is added with a band expansion module 808 on the basis of the decoding apparatus shown in fig. 6, receives band expansion control information output by a bitstream demultiplexing module 801 and a low-frequency-band time-domain audio signal output by a frequency-time mapping module 805, reconstructs a high-frequency signal portion through spectrum shifting and high-frequency adjustment, and outputs a wide-frequency-band audio signal.

The decoding method based on the decoding apparatus shown in fig. 14 is basically the same as the decoding method based on the decoding apparatus shown in fig. 6, except that the following steps are added: the decoding method further comprises reconstructing a high-frequency part of the time domain audio signal according to the frequency band extension control information and the time domain audio signal to obtain a wideband audio signal.

Fig. 15 is a schematic structural diagram of a fourth embodiment of the encoding device of the present invention. This embodiment is based on the encoding apparatus shown in fig. 7, and adds a band expansion module 58 and a resampling module 59. In this embodiment, the connection relationship, functions and operation principles between the band expansion module 58 and the resampling module 59 and other modules are the same as those in fig. 13, and are not described herein again.

The encoding method based on the encoding apparatus shown in fig. 15 is substantially the same as the encoding method based on the encoding apparatus shown in fig. 7, except that the following steps are added: the encoding method further includes: resampling the audio signal before performing a type analysis on the audio signal; analyzing an input audio signal on the whole frequency band, and extracting high-frequency spectrum envelope and spectrum characteristic parameters of the input audio signal; and finally, multiplexing the audio coded signal and the side information together to obtain a compressed audio code stream.

Fig. 16 is a diagram illustrating a fourth embodiment of the decoding apparatus of the present invention. The decoding apparatus is added with a band extending module 808 on the basis of the decoding apparatus shown in fig. 10. In this embodiment, the connection relationship, function and operation principle between the band extending module 808 and other modules are the same as those in fig. 14, and are not described herein again.

The decoding method based on the decoding apparatus shown in fig. 16 is basically the same as the decoding method based on the decoding apparatus shown in fig. 10, except that the following steps are added: the decoding method further comprises reconstructing a high frequency part of the audio signal according to the band extension control information and the time domain audio signal to obtain a wideband audio signal.

Fig. 17 is a schematic structural diagram of a fifth embodiment of the encoding device of the present invention. This embodiment is based on the encoding apparatus shown in fig. 7, and adds a sum and difference stereo encoding module 57 between the output of the multiresolution analysis module 56 and the input of the quantization and entropy coding module 54 or between the set of quantizers in the quantization and entropy coding module 54 and the encoder. In this embodiment, the function and operation principle of the sum and difference stereo encoding module 57 are the same as those in fig. 11, and are not described herein again.

The encoding method based on the encoding apparatus shown in fig. 17 is substantially the same as the encoding method based on the encoding apparatus shown in fig. 7, except that the following steps are added: after multi-resolution analysis is carried out on the residual sequence/frequency domain coefficient, whether the audio signal is a multi-channel signal is judged, if so, whether the signal types of the left and right channel signals are consistent is judged, if so, whether the scale factor band meets the coding condition is judged, and if so, sum stereo coding is carried out on the residual sequence/frequency domain coefficient to obtain the residual sequence/frequency domain coefficient of a sum channel; if not, then not proceeding sum and difference stereo coding; if the signal is a single-channel signal or a multi-channel signal with inconsistent signal types, the frequency domain coefficients are not processed. The specific process is described above and will not be described herein.

Fig. 18 is a schematic structural diagram of a fifth embodiment of the decoding apparatus according to the present invention. The decoding apparatus is based on the decoding apparatus shown in fig. 10, and adds a sum and difference stereo decoding module 807 between the output of the inverse quantizer set 803 and the input of the multi-resolution synthesis 806 or between the output of the entropy decoding module 802 and the input of the inverse quantizer set 803. The function and operation principle of the sum and difference stereo decoding module 807 in this embodiment are the same as those in fig. 12, and are not described herein again.

The decoding method based on the decoding apparatus shown in fig. 18 is basically the same as the decoding method based on the decoding apparatus shown in fig. 10, except that the following steps are added: after the inverse quantization spectrum is obtained, if the signal type analysis result shows that the signal types are consistent, judging whether sum difference stereo decoding needs to be carried out on the inverse quantization spectrum or not according to the sum difference stereo control signal; if necessary, judging whether the scale factor band needs sum and difference stereo decoding according to the flag bit on each scale factor band, if necessary, converting the inverse quantization spectrum of the sum and difference channel in the scale factor band into the inverse quantization spectrum of the left and right channels, and then carrying out subsequent processing; if the signal types are not consistent or sum and difference stereo decoding is not required, the inverse quantized spectrum is not processed and the subsequent processing is directly performed. The specific process is described above and will not be described herein.

Fig. 19 is a schematic diagram of a sixth embodiment of the encoding apparatus of the present invention, which is obtained by adding a band expanding module 58 and a resampling module 59 to the encoding apparatus of fig. 17. In this embodiment, the connection relationship, functions and operation principles between the band expansion module 58 and the resampling module 59 and other modules are the same as those in fig. 13, and are not described herein again.

The encoding method based on the encoding apparatus shown in fig. 19 is basically the same as the encoding method based on the encoding apparatus shown in fig. 17, except that the following steps are added: the encoding method further includes: resampling the audio signal before performing a type analysis on the audio signal; analyzing an input audio signal on the whole frequency band, and extracting high-frequency spectrum envelope and spectrum characteristic parameters of the input audio signal; and finally, multiplexing the audio coded signal and the side information together to obtain a compressed audio code stream.

Fig. 20 is a diagram illustrating a sixth embodiment of the decoding apparatus of the present invention. The decoding apparatus is added with a band extending module 808 to the decoding apparatus shown in fig. 18. In this embodiment, the connection relationship, function and operation principle between the band extending module 808 and other modules are the same as those in fig. 14, and are not described herein again.

The decoding method based on the decoding apparatus shown in fig. 20 is basically the same as the decoding method based on the decoding apparatus shown in fig. 18, except that the following steps are added: the decoding method further comprises reconstructing a high frequency part of the audio signal according to the band extension control information and the time domain audio signal to obtain a wideband audio signal.

Fig. 21 is a schematic diagram of a seventh embodiment of the encoding apparatus of the present invention, which is obtained by adding a band expansion module 58 and a resampling module 59 to fig. 11. In this embodiment, the connection relationship, functions and operation principles between the band expansion module 58 and the resampling module 59 and other modules are the same as those in fig. 14, and are not described herein again.

The encoding method based on the encoding apparatus shown in fig. 21 is substantially the same as the encoding method based on the encoding apparatus shown in fig. 11, except that the following steps are added: the encoding method further includes: resampling the audio signal before performing a type analysis on the audio signal; analyzing an input audio signal on the whole frequency band, and extracting high-frequency spectrum envelope and spectrum characteristic parameters of the input audio signal; and finally, multiplexing the audio coded signal and the side information together to obtain a compressed audio code stream.

Fig. 22 is a diagram of a seventh embodiment of the decoding device according to the present invention. The decoding apparatus is added with a band extending module 808 on the basis of the decoding apparatus shown in fig. 12. In this embodiment, the connection relationship, function and operation principle between the band extending module 808 and other modules are the same as those in fig. 14, and are not described herein again.

The decoding method based on the decoding apparatus shown in fig. 22 is basically the same as the decoding method based on the decoding apparatus shown in fig. 12, except that the following steps are added: the decoding method further comprises reconstructing a high frequency part of the audio signal according to the band extension control information and the time domain audio signal to obtain a wideband audio signal.

In the 7 embodiments of the above coding apparatus, a gain control module may be further included, which receives the audio signal output by the signal type analysis module 59, controls the dynamic range of the fast-change type signal, and eliminates pre-echo in audio processing, and the output of the gain control module is connected to the time-frequency mapping module 52 and the psychoacoustic analysis module 51, and outputs the gain adjustment amount to the bitstream multiplexing module 55.

The gain control module only controls the fast-changing type signal according to the signal type of the audio signal, and directly outputs the slowly-changing type signal without processing. For the fast-changing type signal, the gain control module adjusts the time domain energy envelope of the signal and improves the gain value of the signal before the fast-changing point, so that the amplitude of the time domain signal before and after the fast-changing point is relatively close; the time domain signal with the adjusted time domain energy envelope is then output to the time-frequency mapping module 52, while the gain adjustment is output to the bitstream multiplexing module 55.

The coding method is basically the same as the coding method based on the coding device, and the difference is that the following steps are added: and performing gain control on the signal subjected to the signal type analysis.

In the 7 embodiments of the decoding apparatus, the decoding apparatus may further include an inverse gain control module, which is located after the output of the frequency-time mapping module 805, and receives the signal type analysis result and the gain adjustment amount information output by the bitstream demultiplexing module 801, and is used to adjust the gain of the time domain signal and control the pre-echo. The inverse gain control module controls the fast-varying type signal and does not process the slowly varying type signal after receiving the reconstructed time domain signal output by the frequency-time mapping module 805. For the fast-changing type signal, the inverse gain control module adjusts and reconstructs the energy envelope of the time domain signal according to the gain adjustment amount information, reduces the amplitude value of the signal before the fast-changing point, and adjusts the energy envelope back to the original state of low front and high back, so that the amplitude value of quantization noise before the fast-changing point and the amplitude value of the signal are correspondingly reduced together, and pre-echo is controlled.

The decoding method is the same as the decoding method based on the decoding device, and is different from the decoding method by adding the following steps: and carrying out inverse gain control on the reconstructed time domain signal.

Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims

1. An enhanced audio coding apparatus comprising: the device comprises a psychoacoustic analysis module, a time-frequency mapping module, a quantization and entropy coding module and a bit stream multiplexing module, and is characterized by also comprising a signal type analysis module and a frequency domain linear prediction and vector quantization module;

the signal type analysis module is used for carrying out signal type analysis on an input audio signal, outputting the audio signal to the psychoacoustic analysis module and the time-frequency mapping module, and outputting a signal type analysis result to the bit stream multiplexing module;

the psychoacoustic analysis module is used for calculating a masking threshold and a signal-to-mask ratio of the audio signal after the signal type analysis is finished, and outputting the masking threshold and the signal-to-mask ratio to the quantization and entropy coding module;

the time-frequency mapping module is used for transforming the time-domain audio signal into a frequency-domain coefficient;

the frequency domain linear prediction and vector quantization module is used for performing linear prediction on frequency domain coefficients, converting the generated prediction coefficients into line spectrum pair frequency coefficients, performing multi-level vector quantization on the line spectrum pair frequency coefficients, outputting prediction residual sequences of the frequency domain coefficients to the quantization and entropy coding module, and outputting side information to the bit stream multiplexing module;

the quantization and entropy coding module is used for quantizing and entropy coding the residual error sequence/frequency domain coefficient under the control of the signal-to-mask ratio output by the psychoacoustic analysis module and outputting the quantized and entropy coded residual error sequence/frequency domain coefficient to the bit stream multiplexing module;

the bit stream multiplexing module is used for multiplexing the received data to form an audio coding code stream.

2. The apparatus of claim 1, wherein the frequency-domain linear prediction and vector quantization module is comprised of a linear prediction analyzer, a linear prediction filter, a converter, and a vector quantizer;

the linear prediction analyzer is used for performing prediction analysis on the frequency domain coefficient to obtain prediction gain and a prediction coefficient, and outputting the frequency domain coefficient meeting a certain condition to the linear prediction filter; directly outputting the frequency domain coefficients which do not satisfy the condition to the quantization and entropy coding module;

the linear prediction filter is used for filtering the frequency domain coefficient to obtain a residual sequence of the frequency domain coefficient, outputting the residual sequence to the quantization and entropy coding module, and outputting the prediction coefficient to the converter;

the converter is used for converting the prediction coefficients into line spectrum pair frequency coefficients;

the vector quantizer is used for performing multi-stage vector quantization on the line spectrum to frequency coefficients, and the quantized signals are transmitted to the bit stream multiplexing module.

3. The apparatus of claim 1, further comprising a sum and difference stereo coding module, located between an output of the frequency domain linear prediction and vector quantization module or the multiresolution analysis module and an input of the quantization and entropy coding module or between a quantizer set in the quantization and entropy coding module and an encoder, for converting the frequency domain coefficient/residual sequence of the left and right channels into a frequency domain coefficient/residual sequence of a sum and difference channel.

4. The apparatus of any of claims 1 to 3, further comprising a resampling module and a band extension module, wherein

The resampling module is used for resampling the input audio signal and changing the sampling rate of the input signal; the down sampler is used for down sampling the signal and reducing the sampling rate of the signal;

the band expansion module is used for analyzing the original input audio signal on the whole frequency band, extracting the spectrum envelope of the high-frequency part and the parameters representing the correlation between the low-frequency part and the high-frequency part, and outputting the parameters to the bit stream multiplexing module; the system specifically comprises a parameter extraction module and a spectrum envelope extraction module; the parameter extraction module is used for extracting parameters of the input signals representing the spectral characteristics of the input signals in different time frequency regions; the spectral envelope extraction module is used for estimating the spectral envelope of the high-frequency part of the signal with a certain time-frequency resolution, and then outputting the spectral characteristic parameters of the input signal and the spectral envelope of the high-frequency part to the bit stream multiplexing module.

5. A method of enhanced audio coding comprising the steps of:

firstly, carrying out signal type analysis on an input audio signal, wherein a signal type analysis result is used as a part of signal multiplexing;

step two, calculating the signal-to-mask ratio of the signals after type analysis;

thirdly, performing time-frequency mapping on the signals after type analysis to obtain frequency domain coefficients of the audio signals;

step four, performing standard linear prediction analysis on the frequency domain coefficient to obtain prediction gain and a prediction coefficient; judging whether the prediction gain exceeds a set threshold value, if so, performing frequency domain linear prediction error filtering on the frequency domain coefficient according to the prediction coefficient to obtain a residual sequence; converting the prediction coefficient into a line spectrum pair frequency coefficient, and performing multi-level vector quantization processing on the line spectrum pair frequency coefficient to obtain side information; if the prediction gain does not exceed the set threshold, the frequency domain coefficient is not processed, and the step five is carried out;

fifthly, quantizing and entropy coding the residual error sequence/frequency domain coefficient;

and step six, multiplexing the side information and the coded audio signal to obtain a compressed audio code stream.

6. The method of claim 5, wherein the quantization in step five is scalar quantization, comprising: carrying out nonlinear companding on the frequency domain coefficients in all scale factor bands; quantizing the frequency domain coefficient of each sub-band by using the scale factor of each sub-band to obtain a quantized spectrum represented by an integer; selecting a first scale factor in each frame signal as a common scale factor; carrying out difference processing on other scale factors and the previous scale factor;

the entropy encoding includes: entropy coding is carried out on the quantized spectrum and the scale factor after the difference processing, and a code book serial number, a scale factor coding value and a lossless coding quantized spectrum are obtained; and entropy coding is carried out on the code book serial number to obtain a code book serial number coding value.

7. The method of claim 5 or 6, wherein said step five further comprises: quantizing the residual sequence/frequency domain coefficients; judging whether the audio signal is a multi-channel signal, if so, judging whether the signal types of the left and right channel signals are consistent, if so, judging whether the scale factor bands corresponding to the two channels meet sum stereo coding conditions, and if so, performing sum stereo coding on a residual sequence/frequency domain coefficient in the scale factor bands of the two channels to obtain a residual sequence/frequency domain coefficient of a sum channel; if not, the residual error sequence/frequency domain coefficient in the scale factor band does not carry out sum stereo coding; if the signal is a single-channel signal or a multi-channel signal with inconsistent signal types, the residual sequence/frequency domain coefficient is not processed; entropy coding the residual sequence/frequency domain coefficient; wherein

The method for judging whether the scale factor band meets the coding condition comprises the following steps: K-L transformation, in particular: calculating a correlation matrix of the spectral coefficients of the left and right channel scale factor bands; performing K-L transformation on the correlation matrix; if the absolute value of the rotation angle alpha deviates less than pi/4, such as 3 pi/16 < | alpha | < 5 pi/16, the corresponding scale factor band can be subjected to sum and difference stereo coding; the sum and difference stereo coding is:

[\begin{matrix} \hat{M} \\ \hat{S} \end{matrix}] = [\begin{matrix} 1 & 0 \\ 1 & - 1 \end{matrix}] [\begin{matrix} \hat{L} \\ \hat{R} \end{matrix}],

wherein:

representing the quantized sum channel frequency domain coefficients;

representing quantizedA difference channel frequency domain coefficient;representing the quantized left channel frequency domain coefficients;

representing the quantized right channel frequency domain coefficients.

8. The method of any of claims 5 to 7, further comprising, before said step one: resampling an input audio signal, specifically: limiting the frequency band of the audio signal and downsampling the band-limited audio signal by a multiple; after the sixth step, the method further comprises: analyzing an original input audio signal before resampling operation on the whole frequency band, and extracting a high-frequency spectrum envelope and signal spectrum characteristic parameters of the original input audio signal; multiplexing the audio coding signal and the side information together to obtain a compressed audio code stream.

9. An enhanced audio decoding device comprises a bit stream demultiplexing module, an entropy decoding module, an inverse quantizer group and a frequency-time mapping module, and is characterized by also comprising an inverse frequency domain linear prediction and vector quantization module;

the bit stream demultiplexing module is used for demultiplexing the compressed audio data stream and outputting corresponding data signals and control signals to the entropy decoding module and the inverse frequency domain linear prediction and vector quantization module;

the entropy decoding module is used for decoding the signals, recovering the quantized value of the spectrum and outputting the quantized value to the inverse quantizer group;

the inverse quantizer group is used for reconstructing an inverse quantized spectrum and outputting the inverse quantized spectrum to the inverse frequency domain linear prediction and vector quantization module;

the inverse frequency domain linear prediction and vector quantization module is used for performing inverse linear prediction filtering on the inverse quantized spectrum to obtain a spectrum before prediction and outputting the spectrum to the frequency-time mapping module;

and the frequency-time mapping module is used for carrying out frequency-time mapping on the spectral coefficients to obtain time domain audio signals.

10. The apparatus of claim 9, wherein the inverse frequency-domain linear prediction and vector quantization module comprises an inverse vector quantizer, an inverse transformer, and an inverse linear prediction filter; the inverse vector quantizer is used for inversely quantizing the code word index to obtain a line spectrum frequency coefficient; the inverse converter is used for inversely converting the line spectrum pair frequency coefficient into a prediction coefficient; and the inverse linear prediction filter is used for carrying out inverse filtering on the inverse quantization spectrum according to the prediction coefficient to obtain a spectrum before prediction.

11. The apparatus of claim 9 or 10, further comprising a sum and difference stereo decoding module, located between the output of the inverse quantizer set and the input of the multiresolution synthesis or inverse frequency domain linear prediction and vector quantization module or between the output of the entropy decoding module and the input of the inverse quantizer set, for receiving the signal type analysis result and a sum and difference stereo control signal output by the bitstream demultiplexing module, for converting the inverse quantized spectrum of the sum and difference channel into the inverse quantized spectrum of the left and right channels according to the control information.

12. An enhanced audio decoding method, comprising the steps of:

the method comprises the following steps that firstly, compressed audio data streams are demultiplexed to obtain data information and control information;

step two, entropy decoding is carried out on the information to obtain a quantized value of a spectrum;

thirdly, carrying out inverse quantization processing on the quantized value of the spectrum to obtain an inverse quantized spectrum;

step four, judging whether the control information contains information that the inverse quantization spectrum needs to be subjected to inverse frequency domain linear prediction vector quantization, if so, performing inverse vector quantization processing to obtain a prediction coefficient, and performing linear prediction synthesis on the inverse quantization spectrum according to the prediction coefficient to obtain a spectrum before prediction; if the information of the inverse quantization spectrum subjected to frequency domain linear prediction vector quantization is not contained, the inverse quantization spectrum is not processed, and the step five is carried out;

and step five, performing frequency-time mapping on the spectrum before prediction/the inverse-quantized spectrum to obtain a time domain audio signal of a low frequency band.

13. The method of claim 12, wherein said inverse vector quantization processing step further comprises: obtaining a code word index after the vector quantization of the prediction coefficient from the control information; and then, obtaining a quantized line spectrum pair frequency coefficient according to the code word index, and calculating a prediction coefficient according to the quantized line spectrum pair frequency coefficient.

14. The method of claim 12, wherein said step five further comprises: performing inverse modified discrete cosine transform on the inverse-quantized spectrum to obtain a transformed time domain signal; windowing the transformed time domain signal in the time domain; carrying out superposition processing on the windowed time domain signal to obtain a time domain audio signal; wherein the window function in the windowing process is:

w (N + k) ═ cos (pi/2 · ((k +0.5)/N-0.94 · sin (2 ·/N · (k +0.5))/(2 · pi))), wherein k ═ 0.. N-1; w (k) denotes the kth coefficient of the window function, w (k) ═ w (2 × N-1-k); n represents the number of samples of the encoded frame.

15. The method of any of claims 12 to 14, further comprising, between step two and step three: if the signal type analysis result shows that the signal types are consistent, judging whether sum and difference stereo decoding needs to be carried out on the quantized value of the spectrum according to the sum and difference stereo control signal; if so, judging whether the scale factor band needs sum and difference stereo decoding according to the flag bit on each scale factor band, if so, converting the quantized value of the spectrum of the sum and difference channel in the scale factor band into the quantized value of the spectrum of the left and right channels, and turning to the third step; if the signal types are not consistent or do not need to be enteredIf the line and difference stereo decoding is carried out, the quantized value of the spectrum is not processed, and the step three is carried out; wherein the sum and difference stereo decoding is:

[\begin{matrix} \hat{l} \\ \hat{r} \end{matrix}] = [\begin{matrix} 1 & 0 \\ 1 & - 1 \end{matrix}] [\begin{matrix} \hat{m} \\ \hat{s} \end{matrix}],

wherein:

a quantized value representing the quantized sum channel spectrum;a quantized value representing a quantized difference channel spectrum;a quantized value representing a quantized left channel spectrum;

representing a quantized value of the quantized right channel spectrum.