EP1852851A1: An enhanced audio encoding/decoding device and method
Publication number: EP1852851A1 (application EP05738242A)
Authority: EP; Grant status: Application; Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Prior art keywords: frequency, module, spectrum, domain, signal
Classifications: G10L19/032 (quantisation or dequantisation of spectral components); G10L19/038 (vector quantisation, e.g. TwinVQ audio); G10L19/028 (noise substitution, i.e. substituting non-tonal spectral components by a noisy source)
Description
 The invention relates to audio encoding and decoding and, in particular, to an enhanced audio encoding/decoding device and method based on a perceptual model.
 In order to obtain high-fidelity digital audio, digital audio signals need to be encoded, i.e. compressed, for storage and transmission. The object of audio encoding is to achieve a transparent representation using as few bits as possible; that is, the decoded output signal is almost indistinguishable from the originally input signal.
 In the early 1980s the CD appeared, demonstrating many advantages of representing audio signals digitally, such as high fidelity, large dynamic range and great robustness. However, these advantages come at the cost of a very high data rate. For example, digitizing a stereo signal of CD quality requires a sampling rate of 44.1 kHz with each sample uniformly quantized to 16 bits, so the uncompressed data rate reaches 1.41 Mb/s, which greatly inconveniences the transmission and storage of data; transmission and storage are limited by bandwidth and cost, especially in multimedia and wireless applications. To maintain high-quality audio, the data rate in new network and wireless multimedia digital audio systems must be reduced without damaging the audio quality. To address this problem, various audio compression techniques have been put forward that achieve both a high compression ratio and hi-fi audio quality, typical ones being the MPEG-1/2/4 techniques of ISO/IEC, the AC-2/AC-3 techniques of Dolby, the ATRAC/MiniDisc/SDDS techniques of Sony, and the PAC/EPAC/MPAC techniques of Lucent Technologies. The MPEG-2 AAC technique and the Dolby AC-3 technique are described in detail below.
 The MPEG-1 and MPEG-2 BC techniques are high-quality encoding techniques mainly used for mono and stereo audio signals. Demand has grown for multichannel audio encoding that achieves high quality at a relatively low bit rate, but because the MPEG-2 BC technique emphasizes backward compatibility with the MPEG-1 technique, it cannot realize high-quality five-channel encoding at a bit rate lower than 540 kbps. To address this shortcoming, the MPEG-2 AAC technique was put forward, which can encode five-channel signals with high quality at a rate of 320 kbps.
 Fig. 1 is a block diagram of the MPEG-2 AAC encoder. The encoder comprises a gain controller 101, a filter bank 102, a time-domain noise shaping (TNS) module 103, an intensity/coupling module 104, a psychoacoustic model, a second-order backward adaptive predictor 105, a sum-difference stereo module 106, a bit allocation and quantization encoding module 107, and a bitstream multiplexing module 108, wherein the bit allocation and quantization encoding module 107 further comprises a compression-ratio/distortion processing controller, a scale factor module, a non-uniform quantizer, and an entropy encoding module.
 The filter bank 102 uses a modified discrete cosine transform (MDCT) whose resolution is signal-adaptive: a 2048-point MDCT is used for steady-state signals, while a 256-point MDCT is used for transient signals. Thus, for a signal sampled at 48 kHz, the maximum frequency resolution is 23 Hz and the maximum time resolution is 2.6 ms. Either a sine window or a Kaiser-Bessel-derived window can be used in the filter bank 102: the sine window is used when the harmonic spacing of the input signal is less than 140 Hz, while the Kaiser-Bessel window is used when the spacing of strong components in the input signal is greater than 220 Hz.
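The windowed MDCT analysis/synthesis used by such filter banks can be sketched as follows. This is an illustrative NumPy implementation of one MDCT/IMDCT pair with a sine window (the function names and the direct matrix formulation are choices made here, not taken from the standard); reconstruction relies on time-domain aliasing cancellation between overlapping frames.

```python
import numpy as np

def sine_window(length):
    # sine window satisfies the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1
    n = np.arange(length)
    return np.sin(np.pi * (n + 0.5) / length)

def mdct(frame, win):
    # frame has 2N samples and yields N spectral coefficients
    N = len(frame) // 2
    n = np.arange(2 * N)[:, None]
    k = np.arange(N)[None, :]
    basis = np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
    return (win * frame) @ basis

def imdct(coeffs, win):
    # inverse transform; the 2/N scaling gives perfect reconstruction
    # after windowing and 50%-overlap-add of adjacent frames
    N = len(coeffs)
    n = np.arange(2 * N)[:, None]
    k = np.arange(N)[None, :]
    basis = np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
    return win * (2.0 / N) * (basis @ coeffs)
```

Overlap-adding the windowed IMDCT outputs of frames hopped by N samples cancels the time-domain aliasing and reconstructs the interior of the signal exactly.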
 Audio signals enter the filter bank 102 through the gain controller 101 and are filtered according to the signal type; the time-domain noise shaping module 103 then processes the spectral coefficients output by the filter bank 102. The TNS technique performs linear prediction analysis on the spectral coefficients in the frequency domain and, based on this analysis, shapes the quantization noise in the time domain, thereby controlling pre-echo.
 The intensity/coupling module 104 is used for intensity stereo encoding. For high-frequency signals (above about 2 kHz), the perceived direction of sound is related to changes in the relative signal intensity (the signal envelope) but is irrelevant to the waveform of the signal; that is, a constant-envelope signal does not influence the perceived direction. This characteristic, together with the correlation among multiple channels, can therefore be exploited to combine several channels into one common channel for encoding, which forms the intensity/coupling technique.
 The second-order backward adaptive predictor 105 removes redundancy from steady-state signals and improves encoding efficiency. The sum-difference stereo (M/S) module 106 operates on channel pairs; a channel pair refers to the left-right channels or the left-right surround channels in, for example, a two-channel or multichannel signal. The M/S module 106 reduces the bit rate and improves encoding efficiency by exploiting the correlation between the two channels of a pair. The bit allocation and quantization encoding module 107 is realized as a nested loop, in which the non-uniform quantizer performs lossy encoding while the entropy encoding module performs lossless encoding, thereby removing redundancy and reducing correlation. The nested loop comprises an inner loop and an outer loop: the inner loop adjusts the step size of the non-uniform quantizer until the provided bits are used up, and the outer loop estimates the encoding quality of the signal using the ratio between the quantization noise and the masking threshold. Finally, the encoded signals are formed into an encoded audio stream by the bitstream multiplexing module 108 and output.
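The inner-loop behaviour can be illustrated with a small sketch. The quantizer and bit counter below are toy stand-ins (not the AAC non-uniform quantizer or its Huffman tables), meant only to show how the step size is raised until the bit budget is met.

```python
import numpy as np

def quantize(x, sf):
    # toy uniform quantizer: a larger scale factor sf means a coarser step
    return np.round(x / (2.0 ** sf)).astype(int)

def count_bits(q):
    # crude stand-in for entropy coding: ~log2(|q|+1)+1 bits per value
    return int(np.sum(np.log2(np.abs(q) + 1) + 1))

def inner_loop(coeffs, bit_budget):
    # raise the quantizer step size until the coded size fits the budget
    sf = 0
    while count_bits(quantize(coeffs, sf)) > bit_budget:
        sf += 1
    return sf, quantize(coeffs, sf)
```

An outer loop (not shown) would then compare the resulting quantization noise against the masking threshold per band and amplify under-coded bands before re-running the inner loop.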
 In the scalable sampling rate profile, the input signal is first split by a four-band polyphase quadrature filter bank (PQF) into four bands of equal width. Each band generates 256 spectral coefficients using the MDCT, giving 1024 spectral coefficients in total. The gain controller 101 is applied in each band. The high-frequency PQF bands can be neglected in the decoder to obtain signals of a lower sampling rate.
 Fig. 2 is a schematic block diagram of the corresponding MPEG-2 AAC decoder. The decoder comprises a bitstream demultiplexing module 201, a lossless decoding module 202, an inverse quantizer 203, a scale factor module 204, a sum-difference stereo (M/S) module 205, a prediction module 206, an intensity/coupling module 207, a time-domain noise shaping module 208, a filter bank 209 and a gain control module 210. The encoded audio stream is demultiplexed by the bitstream demultiplexing module 201 to obtain the corresponding data stream and control stream. These signals are then decoded by the lossless decoding module 202 to obtain an integer representation of the scale factors and the quantized values of the signal spectrum. The inverse quantizer 203 is a non-uniform quantizer bank realized by a companding function, which transforms the integer quantized values into a reconstructed spectrum. The scale factor module in the encoder takes the difference between the current scale factor and the previous one and Huffman-encodes that difference, so the scale factor module 204 in the decoder obtains the corresponding difference through Huffman decoding and recovers the real scale factors from it. The M/S module 205 converts the sum-difference channels into left-right channels under the control of the side information. Since the second-order backward adaptive predictor 105 is used in the encoder to remove the redundancy of steady-state signals and improve encoding efficiency, a prediction module 206 is used in the decoder for prediction decoding. The intensity/coupling module 207 performs intensity/coupling decoding under the control of the side information and outputs to the time-domain noise shaping module 208 for TNS decoding; finally, the filter bank 209 performs synthesis filtering by the inverse modified discrete cosine transform (IMDCT).
 In the scalable sampling rate case, the high-frequency PQF bands can be neglected through the gain control module 210 so as to obtain signals of a lower sampling rate.
 The MPEG-2 AAC encoding/decoding technique is suitable for audio signals of medium and high bit rates, but its encoding quality is poor for low or very low bit rate signals; moreover, it involves many encoding/decoding modules, so it is highly complex and not easy to implement in real time.
 Fig. 3 is a schematic drawing of the structure of an encoder using the Dolby AC-3 technique, which comprises a transient signal detection module 301, a modified discrete cosine transform (MDCT) filter bank 302, a spectral envelope/exponent encoding module 303, a mantissa encoding module 304, a forward-backward adaptive perceptual model 305, a parametric bit allocation module 306, and a bitstream multiplexing module 307.
 The audio signal is classified by the transient signal detection module 301 as either steady-state or transient. The time-domain data is then mapped to frequency-domain data through the signal-adaptive MDCT filter bank 302, wherein a long 512-point window is applied to steady-state signals and a pair of short windows is applied to transient signals.
 The spectral envelope/exponent encoding module 303 encodes the exponent portion of the signal in one of three modes, D15, D25 and D45, according to the required bit rate and frequency resolution. The AC-3 technique encodes the spectral envelope differentially in frequency, because an increment of at most ±2 is needed, each increment representing a level change of 6 dB. The first (DC) exponent is encoded as an absolute value, and differential encoding is used for the remaining exponents. In D15 encoding, each exponent requires about 2.33 bits, three differentials being grouped and encoded in a 7-bit word. The D15 mode sacrifices time resolution to provide fine frequency resolution. Since only relatively steady signals require fine frequency resolution, and the spectra of such signals stay relatively constant over many blocks, for steady-state signals the D15 envelope is transmitted only occasionally, usually once per six audio blocks (one data frame). When the signal spectrum is not steady, the spectral estimate needs to be updated frequently; the estimate is then encoded with lower frequency resolution, generally using the D25 or D45 mode. The D25 mode provides intermediate frequency and time resolution, with differential encoding performed for every other frequency coefficient, so each exponent needs about 1.17 bits; it can be used if the spectrum is steady over two or three blocks and then changes abruptly. The D45 mode performs differential encoding for every four frequency coefficients, so each exponent needs about 0.58 bits. The D45 mode provides very high time resolution but low frequency resolution, so it is generally used for encoding transient signals.
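The differential exponent grouping described above can be sketched as follows. This illustrates the arithmetic of D15-style coding (clipped ±2 deltas, three 5-ary deltas packed into one 7-bit group, since 5^3 = 125 ≤ 128); it is not the exact AC-3 bitstream layout.

```python
def encode_d15(exponents):
    # absolute value for the first (DC) exponent, then deltas clipped to +-2;
    # the round trip is exact when the true deltas already lie within +-2
    # and the number of deltas is a multiple of three
    first = exponents[0]
    deltas, prev = [], first
    for e in exponents[1:]:
        d = max(-2, min(2, e - prev))
        prev = prev + d          # track the value the decoder will see
        deltas.append(d + 2)     # map -2..2 -> 0..4
    # pack three 5-ary deltas into one 7-bit group
    groups = [deltas[i] * 25 + deltas[i + 1] * 5 + deltas[i + 2]
              for i in range(0, len(deltas) - 2, 3)]
    return first, groups

def decode_d15(first, groups):
    exps = [first]
    for g in groups:
        for d in (g // 25, (g // 5) % 5, g % 5):
            exps.append(exps[-1] + d - 2)
    return exps
```

Since each group value is below 128, three exponent deltas indeed fit in a 7-bit word, giving the quoted ~2.33 bits per exponent.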
 The forward-backward adaptive perceptual model 305 estimates the masking threshold of each frame of the signal. The forward adaptive portion is applied only in the encoder, estimating a group of optimal perceptual model parameters through an iterative loop under the bit rate constraint; these parameters are then passed to the backward adaptive portion to estimate the masking threshold of each frame. The backward adaptive portion is applied in both the encoder and the decoder.
 The parametric bit allocation module 306 analyzes the spectral envelope of the audio signal according to the masking rule to determine the number of bits allocated to each mantissa. The module performs a global bit allocation for all channels using a bit reservoir: when encoding in the mantissa encoding module 304, bits are drawn recurrently from the bit reservoir and allocated to all the channels, and the quantization of the mantissas is adjusted according to the number of bits available. To improve compression, the AC-3 encoder also uses a high-frequency coupling technique, in which the high-frequency portion of the coupled signal is divided into 18 sub-bands according to the critical bands of the human ear, and some channels are selected to be coupled starting from a certain sub-band. Finally, the AC-3 audio stream is formed through the bitstream multiplexing module 307 and output.
 Fig. 4 is a schematic drawing of the decoding flow using Dolby AC-3. First, the bitstream produced by the AC-3 encoder is input, and frame synchronization and error detection are performed on it. If a data error is detected, error concealment or muting is performed. The bitstream is then unpacked to obtain the main information and the side information, and exponent decoding is performed. Exponent decoding requires two pieces of side information: the number of packed exponents, and the exponent strategy adopted (D15, D25 or D45 mode). The decoded exponents and the bit allocation side information are then used to repeat the bit allocation, which indicates the number of bits used by each packed mantissa, thereby yielding a group of bit allocation pointers, one for each encoded mantissa. The bit allocation pointers indicate the quantizer used for each mantissa and the number of bits it occupies in the code stream. Each encoded mantissa value is dequantized into a dequantized value, and a mantissa that occupies zero bits is either restored to zero or replaced by a random dither value under the control of the dither flag. The decoupling operation is then carried out, which recovers the high-frequency portion (exponents and mantissas) of each coupled channel from the common coupling channel and the coupling factors. When the 2/0 mode is used at the encoder and matrix processing was applied to a certain sub-band, the sum and difference channel values of that sub-band must be converted back into left and right channel values by matrix recovery at the decoder. The code stream includes a dynamic range control value for each audio block; dynamic range compression is performed using this value to change the amplitude of the coefficients, including exponents and mantissas.
The frequency-domain coefficients are inversely transformed into time-domain samples, which are then windowed and overlap-added with adjacent blocks to reconstruct the PCM audio signal. When the number of decoded output channels is less than the number of channels in the encoded bitstream, a downmixing process is performed on the audio signal, and the PCM stream is finally output.
 The Dolby AC-3 encoding technique mainly targets high bit rate, multichannel surround signals; when the bit rate for 5.1 channels falls below 384 kbps, the encoding quality degrades, and its encoding efficiency for mono and two-channel stereo signals is also low.
 In summary, the existing encoding and decoding techniques cannot ensure the encoding and decoding quality of audio signals across very low, low and high bit rates and for mono and two-channel signals, and their implementation is complex.
 The technical problem to be solved by this invention is to provide an enhanced audio encoding/decoding device and method so as to overcome the low encoding efficiency and poor encoding quality of the prior art for low bit rate audio signals.
 The enhanced audio encoding device of the invention comprises a signal type analyzing module, a psychoacoustic analyzing module, a time-frequency mapping module, a frequency-domain linear prediction and vector quantization module, a quantization and entropy encoding module, and a bitstream multiplexing module. The signal type analyzing module is configured to analyze the signal type of the input audio signal, output the audio signal to the psychoacoustic analyzing module and the time-frequency mapping module, and at the same time output the result of the signal type analysis to the bitstream multiplexing module; the psychoacoustic analyzing module is configured to calculate a masking threshold and a signal-to-mask ratio of the input audio signal and output them to the quantization and entropy encoding module; the time-frequency mapping module is configured to convert the time-domain audio signal into frequency-domain coefficients; the frequency-domain linear prediction and vector quantization module is configured to perform linear prediction and multistage vector quantization on the frequency-domain coefficients, output the residual sequence to the quantization and entropy encoding module, and output the side information to the bitstream multiplexing module; the quantization and entropy encoding module is configured to perform quantization and entropy encoding on the residual sequence under the control of the signal-to-mask ratio output from the psychoacoustic analyzing module and to output the result to the bitstream multiplexing module; and the bitstream multiplexing module is configured to multiplex the received data to form the encoded audio stream.
 The enhanced audio decoding device of the invention comprises a bitstream demultiplexing module, an entropy decoding module, an inverse quantizer bank, an inverse frequency-domain linear prediction and vector quantization module, and a frequency-time mapping module. The bitstream demultiplexing module is configured to demultiplex the compressed audio data stream and output the corresponding data signals and control signals to the entropy decoding module and the inverse frequency-domain linear prediction and vector quantization module; the entropy decoding module is configured to decode said signals and recover the quantized values of the spectrum for output to the inverse quantizer bank; the inverse quantizer bank is configured to reconstruct the inversely quantized spectrum and output it to the inverse frequency-domain linear prediction and vector quantization module; the inverse frequency-domain linear prediction and vector quantization module is configured to perform inverse quantization and inverse linear prediction filtering on the inversely quantized spectrum to obtain the spectrum before prediction and output it to the frequency-time mapping module; and the frequency-time mapping module is configured to perform frequency-time mapping on the spectral coefficients to obtain the time-domain audio signals of the low frequency band.
 The invention is applicable to hi-fi compression encoding of audio signals in configurations with multiple sampling rates and channels: it supports sampling rates from 8 kHz to 192 kHz, all possible channel configurations, and audio encoding/decoding over a wide range of target bit rates.

 Fig. 1 is a block diagram of the MPEG-2 AAC encoder;
 Fig. 2 is a block diagram of the MPEG-2 AAC decoder;
 Fig. 3 is a schematic drawing of the structure of the encoder using the Dolby AC-3 technique;
 Fig. 4 is a schematic drawing of the decoding flow using the Dolby AC-3 technique;
 Fig. 5 is a schematic drawing of the structure of the audio encoding device according to the present invention;
 Fig. 6 is a schematic drawing of the structure of the audio decoding device according to the present invention;
 Fig. 7 is a schematic drawing of the structure of embodiment one of the encoding device according to the present invention;
 Fig. 8 is a schematic drawing of the filtering structure using the wavelet transform with the Haar wavelet basis;
 Fig. 9 is a schematic drawing of the time-frequency division obtained by using the wavelet transform with the Haar wavelet basis;
 Fig. 10 is a schematic drawing of the structure of embodiment one of the decoding device according to the present invention;
 Fig. 11 is a schematic drawing of the structure of embodiment two of the encoding device according to the present invention;
 Fig. 12 is a schematic drawing of the structure of embodiment two of the decoding device according to the present invention;
 Fig. 13 is a schematic drawing of the structure of embodiment three of the encoding device according to the present invention;
 Fig. 14 is a schematic drawing of the structure of embodiment three of the decoding device according to the present invention;
 Fig. 15 is a schematic drawing of the structure of embodiment four of the encoding device according to the present invention;
 Fig. 16 is a schematic drawing of the structure of embodiment four of the decoding device according to the present invention;
 Fig. 17 is a schematic drawing of the structure of embodiment five of the encoding device according to the present invention;
 Fig. 18 is a schematic drawing of the structure of embodiment five of the decoding device according to the present invention;
 Fig. 19 is a schematic drawing of the structure of embodiment six of the encoding device according to the present invention;
 Fig. 20 is a schematic drawing of the structure of embodiment six of the decoding device according to the present invention;
 Fig. 21 is a schematic drawing of the structure of embodiment seven of the encoding device according to the present invention;
 Fig. 22 is a schematic drawing of the structure of embodiment seven of the decoding device according to the present invention.
 Figs. 1-4 are schematic drawings of the structures of the encoders and decoders of the prior art, which have been introduced in the background art, so they will not be elaborated herein.
 It should be noted that, to facilitate a clear description of the present invention, the following preferred embodiments of the encoding device and the decoding device are described in corresponding pairs, but the encoding device and the decoding device need not be in one-to-one correspondence.
 As shown in Fig. 5, the audio encoding device of the present invention comprises a signal type analyzing module 50, a psychoacoustic analyzing module 51, a time-frequency mapping module 52, a frequency-domain linear prediction and vector quantization module 53, a quantization and entropy encoding module 54, and a bitstream multiplexing module 55. The signal type analyzing module 50 is configured to analyze the signal type of the input audio signal; the psychoacoustic analyzing module 51 is configured to calculate a masking threshold and a signal-to-mask ratio of the audio signal; the time-frequency mapping module 52 is configured to convert the time-domain audio signal into frequency-domain coefficients; the frequency-domain linear prediction and vector quantization module 53 is configured to perform linear prediction and multistage vector quantization on the frequency-domain coefficients, to output the residual sequence to the quantization and entropy encoding module 54, and to output the side information to the bitstream multiplexing module 55 at the same time; the quantization and entropy encoding module 54 is configured to perform quantization and entropy encoding of the residual coefficients under the control of the signal-to-mask ratio output from the psychoacoustic analyzing module 51 and to output them to the bitstream multiplexing module 55; and the bitstream multiplexing module 55 is configured to multiplex the received data to form the encoded audio stream.
 After the digital audio signal is input to the signal type analyzing module 50, its signal type is analyzed and the signal is passed to the psychoacoustic analyzing module 51 and the time-frequency mapping module 52. On the one hand, the masking threshold and the signal-to-mask ratio of the current frame are calculated in the psychoacoustic analyzing module 51, and the signal-to-mask ratio is transmitted as a control signal to the quantization and entropy encoding module 54; on the other hand, the time-domain audio signal is converted into frequency-domain coefficients by the time-frequency mapping module 52. These frequency-domain coefficients are transmitted to the frequency-domain linear prediction and vector quantization module 53. If the prediction gain of the frequency-domain coefficients meets the given condition, linear prediction filtering is performed on them, and the resulting prediction coefficients are transformed into line spectrum frequency (LSF) coefficients. The optimal distortion measurement criterion is then used to search for and calculate the codeword indexes in the respective levels of codebooks; the codeword indexes are transferred as side information to the bitstream multiplexing module 55, while the residual sequence obtained through the prediction analysis is output to the quantization and entropy encoding module 54. Under the control of the signal-to-mask ratio output from the psychoacoustic analyzing module 51, the residual sequence or frequency-domain coefficients are quantized and entropy encoded in the quantization and entropy encoding module 54. The encoded data and the side information are input to the bitstream multiplexing module 55 and multiplexed to form the enhanced audio encoded stream.
 The modules composing the audio encoding device are described in detail below.
 In the present invention, the signal type analyzing module 50 determines whether the input signal is a slowly varying signal or a fast varying signal by analyzing the forward and backward masking effects based on an adaptive threshold and waveform prediction. If the signal is of the fast varying type, the relevant parameters of the abrupt component are calculated, such as the location where the abrupt signal occurs and its intensity.
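A simplified, energy-based version of such a slow/fast classifier can be sketched as follows. The sub-block count and the energy-ratio threshold are illustrative assumptions; the adaptive-threshold and waveform-prediction details of the actual module are not reproduced here.

```python
import numpy as np

def classify_frame(frame, n_sub=8, ratio_thresh=8.0):
    # split the frame into sub-blocks and compare short-term energies:
    # a large jump relative to the mean of the preceding sub-blocks
    # marks a fast varying (transient) frame and locates the attack
    sub = frame.reshape(n_sub, -1)
    e = np.sum(sub ** 2, axis=1) + 1e-12
    for i in range(1, n_sub):
        if e[i] / np.mean(e[:i]) > ratio_thresh:
            return "fast", i    # sub-block index of the abrupt component
    return "slow", None
```

The returned sub-block index plays the role of the "location where the abrupt signal occurs"; the energy ratio itself could serve as a crude intensity measure.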
 The psychoacoustic analyzing module 51 is mainly configured to calculate the masking threshold, the perceptual entropy and the signal-to-mask ratio of the input audio signal. The number of bits needed for transparent encoding of the current frame can be dynamically estimated from the perceptual entropy calculated by the psychoacoustic analyzing module 51, thereby adjusting the bit allocation among frames. The psychoacoustic analyzing module 51 outputs the signal-to-mask ratio of each sub-band to the quantization and entropy encoding module 54 as its control signal.
 The time-frequency mapping module 52 converts the audio signal from a time-domain signal into frequency-domain coefficients. It is formed of a filter bank, which can specifically be a discrete Fourier transform (DFT) filter bank, a discrete cosine transform (DCT) filter bank, a modified discrete cosine transform (MDCT) filter bank, a cosine modulated filter bank, a wavelet transform filter bank, etc.
 The frequency-domain coefficients obtained from the time-frequency mapping are transmitted to the frequency-domain linear prediction and vector quantization module 53 to undergo linear prediction and vector quantization. The module 53 consists of a linear prediction analyzer, a linear prediction filter, a transformer, and a vector quantizer. The frequency-domain coefficients are input to the linear prediction analyzer for prediction analysis to obtain the prediction gain and prediction coefficients. If the prediction gain meets a certain condition, the frequency-domain coefficients are input to the linear prediction filter to be filtered, thereby obtaining the prediction residual sequence of the frequency-domain coefficients. The residual sequence is output directly to the quantization and entropy encoding module 54, while the prediction coefficients are transformed into line spectrum pair frequency (LSF) coefficients through the transformer, then sent to the vector quantizer for multistage vector quantization, and the quantized side information is transmitted to the bitstream multiplexing module 55.
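The multistage vector quantization step, in which each stage quantizes the residual left by the previous stage, can be sketched as follows. The codebooks here are illustrative toy arrays, not trained LSF codebooks.

```python
import numpy as np

def msvq_encode(v, codebooks):
    # each stage picks the nearest codeword (squared-error distortion)
    # and passes the remaining residual on to the next stage
    idxs, resid = [], np.asarray(v, dtype=float)
    for cb in codebooks:
        d = np.sum((cb - resid) ** 2, axis=1)
        i = int(np.argmin(d))
        idxs.append(i)
        resid = resid - cb[i]
    return idxs

def msvq_decode(idxs, codebooks):
    # the reconstruction is the sum of the selected codewords of all stages
    return sum(cb[i] for cb, i in zip(codebooks, idxs))
```

Splitting the quantizer into stages keeps the total codebook search small: two 7-bit stages cost 2 x 128 distance computations instead of the 16384 of a single 14-bit codebook.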
 Performing frequency-domain linear prediction on the audio signal can effectively suppress pre-echo and yield greater encoding gain. Given a real signal x(t), its squared Hilbert envelope e(t) satisfies e(t) = F^{-1}{∫ C(ξ)·C*(ξ − f) dξ}, where C(f) is the one-sided spectrum corresponding to the positive frequency components of x(t); that is, the Hilbert envelope of the signal is related to the autocorrelation function of the signal spectrum. The relationship between the power spectral density of the signal and the autocorrelation function of its time-domain waveform is PSD(f) = F{∫ x(τ)·x*(τ − t) dτ}, so the squared Hilbert envelope in the time domain and the power spectral density in the frequency domain are dual to each other. It follows that, for a band-pass signal in a given frequency range, if its Hilbert envelope is constant, the autocorrelation of adjacent spectral values is also constant, which implies that the sequence of spectral coefficients is a stationary sequence over frequency; thus predictive encoding can be applied to the spectral values, and a group of common prediction coefficients can represent the signal efficiently.
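A minimal sketch of such frequency-domain linear prediction is given below, applying the Levinson-Durbin recursion to the autocorrelation of the spectral coefficient sequence. The prediction order and the prediction-gain formula are illustrative assumptions, not parameters taken from the patent.

```python
import numpy as np

def fdlp_residual(coeffs, order=8):
    # autocorrelation of the spectral coefficient sequence (lags 0..order)
    n = len(coeffs)
    r = np.correlate(coeffs, coeffs, 'full')[n - 1:n + order]
    # Levinson-Durbin recursion for A(z) = 1 + a1*z^-1 + ... + ap*z^-p
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a[1:i].copy()
        a[1:i] = prev + k * prev[::-1]
        a[i] = k
        err *= (1.0 - k * k)
    # filtering the coefficient sequence with A(z) yields the residual
    residual = np.convolve(coeffs, a)[:n]
    gain = np.dot(coeffs, coeffs) / (np.dot(residual, residual) + 1e-12)
    return residual, a, gain
```

When the prediction gain exceeds a chosen threshold, the (much whiter) residual is quantized instead of the coefficients, and the predictor is sent as side information.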
 The quantization and entropy encoding module 54 further comprises a nonlinear quantizer bank and an encoder, wherein the quantizer can be either a scalar quantizer or a vector quantizer. Vector quantizers fall into two categories: memoryless vector quantizers and memory vector quantizers. A memoryless vector quantizer quantizes each input vector separately, independently of the previous vectors, while a memory vector quantizer quantizes a vector taking the previous vectors into account, i.e. exploiting the correlation among the vectors. The main memoryless vector quantizers include the full-search vector quantizer, the tree-search vector quantizer, the multi-stage vector quantizer, the gain/waveform vector quantizer and the separating-mean vector quantizer; the main memory vector quantizers include the predictive vector quantizer and the finite-state vector quantizer.
 If the scalar quantizer is used, the nonlinear quantizer bank further comprises M subband quantizers. In each subband quantizer, the quantization is performed mainly by means of the scale factor; specifically, all the frequency-domain coefficients of the M scale-factor subbands are nonlinearly compressed, then the frequency-domain coefficients of each subband are quantized using its scale factor to obtain a quantization spectrum represented by integers, which is output to the encoder. The first scale factor in each frame of signal is used as the common scale factor and output to the bitstream multiplexing module 55, and the remaining scale factors are output to the encoder after differential coding with respect to their respective preceding scale factors.
 The scale factors in said step are not fixed values; they are adjusted according to the bit allocation strategy. The present invention provides a perceptual bit allocation strategy with minimum overall distortion, the details of which are as follows:
 First, each subband quantizer is initialized with a scale factor chosen such that the quantized values of the spectral coefficients of all the subbands are zero. At this point the quantization noise of each subband equals its energy, the noise-to-masking ratio NMR of each subband equals its signal-to-masking ratio SMR, the number of bits consumed by the quantization is zero, and the number of remaining bits B_{1} equals the number of target bits B.
 Second, the subband with the largest noise-to-masking ratio NMR is found. If the largest NMR is not more than 1, the scale factors remain unchanged, the allocation result is output, and the bit allocation ends. Otherwise, the scale factor of the corresponding subband quantizer is reduced by one unit, and the number of additional bits ΔB_{i}(Q_{i}) needed for said subband is calculated. If the number of remaining bits satisfies B_{i}≥ΔB_{i}(Q_{i}), the modification of said scale factor is confirmed, ΔB_{i}(Q_{i}) is subtracted from the remaining bits B_{i}, the NMR of said subband is recalculated, and the search for the subband with the largest NMR is repeated along with the subsequent steps. If B_{i}<ΔB_{i}(Q_{i}), said modification is canceled and the previous scale factor and number of remaining bits are retained. Finally, the allocation result is output and the bit allocation ends.
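The greedy loop above can be sketched as follows. This is a simplified model, not the patented implementation: the fixed cost per scale-factor step and the fixed noise reduction per step are assumptions standing in for the recalculated ΔB_{i}(Q_{i}) of a real coder.

```python
import numpy as np

def allocate_bits(smr, target_bits, step_db=1.5, bits_per_step=8):
    """Greedy perceptual bit allocation in the spirit of the steps above:
    every quantizer starts with all coefficients quantized to zero, so the
    NMR of each subband starts equal to its SMR; the subband with the worst
    NMR repeatedly gets a finer scale factor until NMR <= 1 everywhere or
    the bit budget runs out."""
    nmr = np.array(smr, dtype=float)       # NMR initialized to SMR
    steps = np.zeros(len(nmr), dtype=int)  # scale-factor reductions per subband
    remaining = target_bits
    scale = 10 ** (-step_db / 10)          # assumed noise drop per step
    while True:
        b = int(np.argmax(nmr))
        if nmr[b] <= 1.0:                  # all noise masked: done
            break
        if remaining < bits_per_step:      # budget exhausted: keep last state
            break
        remaining -= bits_per_step         # confirm the scale-factor change
        steps[b] += 1
        nmr[b] *= scale                    # finer quantizer -> less noise
    return steps, nmr, remaining
```

With a generous budget the loop drives every subband's NMR at or below 1 while subbands that are already masked (SMR ≤ 1) receive no bits.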
 If the vector quantizer is used, the frequency-domain coefficients form a plurality of M-dimensional vectors that are input to the nonlinear quantizer bank. Each M-dimensional vector is spectrum-smoothed according to a smoothing factor, i.e. the dynamic range of the spectrum is reduced. The vector quantizer then finds, according to the subjective perceptual distance measure criterion, the codeword in the codebook having the shortest distance to the vector to be quantized, and transfers the corresponding codeword index to the encoder. The smoothing factor is adjusted based on the bit allocation strategy of the vector quantization, and that strategy is in turn controlled according to the perceptual priority among the different subbands.
 After said quantization processing, entropy encoding is used to further remove the statistical redundancy of the quantized coefficients and the side information. Entropy encoding is a source coding technique whose basic idea is to allocate shorter codewords to symbols with higher probability of occurrence and longer codewords to symbols with lower probability of occurrence, so that the average codeword length is minimized. According to Shannon's noiseless coding theorem, if the N transmitted source symbols are independent of each other and a suitable variable-length code is used, the average codeword length
n̄ satisfies $\frac{H(x)}{\log_{2}(D)}\le \bar{n}<\frac{H(x)}{\log_{2}(D)}+\frac{1}{N},$ wherein H(x) represents the entropy of the source and x represents the symbol variable. Since the entropy H(x) is the lower limit of the average codeword length, and the above formula shows that the average codeword length can come very close to this lower limit, such variable-length coding is also called "entropy encoding". Entropy encoding mainly includes Huffman coding, arithmetic coding and run-length coding; the entropy encoding in the present invention can be any of said methods.
 In the encoder, entropy encoding is performed on the quantization spectrum output by the scalar quantizer and on the differentially coded scale factors to obtain the codebook sequence numbers, the encoded values of the scale factors, and the losslessly encoded quantization spectrum; the codebook sequence numbers are then entropy encoded to obtain their encoded values; finally, the encoded values of the scale factors, the encoded values of the codebook sequence numbers, and the losslessly encoded quantization spectrum are output to the bitstream multiplexing module 55.
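The shorter-codes-for-likelier-symbols principle can be illustrated with a minimal Huffman coder (a sketch, not the codebooks of the invention):

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    """Build a binary Huffman code {symbol: bitstring} from symbol counts."""
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least probable subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

data = "aaaabbc"                          # skewed symbol distribution
code = huffman_code(Counter(data))
avg_len = sum(len(code[s]) for s in data) / len(data)
```

Here the most frequent symbol gets a one-bit codeword and the average length (10/7 ≈ 1.43 bits) approaches the source entropy (≈ 1.38 bits), in line with the bound above.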
 The codeword indexes produced by the vector quantizer are entropy encoded in the encoder, one-dimensionally or multi-dimensionally, to obtain the encoded values of the codeword indexes, which are then output to the bitstream multiplexing module 55.
 The bitstream multiplexing module 55 receives the side information output from the frequency-domain linear prediction and vector quantization module 53 and the code stream output from the quantization and entropy encoding module 54, which comprises the common scale factor, the encoded values of the scale factors, the encoded values of the codebook sequence numbers and the losslessly encoded quantization spectrum, or the encoded values of the codeword indexes, and multiplexes them to obtain the compressed audio data stream.
 The encoding method based on the encoder described above includes: analyzing the signal type of the input audio signal; calculating the signal-to-masking ratio of the signal whose type has been analyzed; performing a time-frequency mapping on the signal to obtain the frequency-domain coefficients of the audio signal; performing a standard linear prediction analysis on the frequency-domain coefficients to obtain the prediction gain and prediction coefficients; determining whether the prediction gain exceeds a predetermined threshold, and if it does, performing frequency-domain linear prediction error filtering on the frequency-domain coefficients based on the prediction coefficients to obtain the linear prediction residual sequence of the frequency-domain coefficients, transforming the prediction coefficients into line spectrum pair frequency coefficients, performing multi-stage vector quantization on said line spectrum pair frequency coefficients to obtain the side information, and quantizing and entropy encoding the residual sequence; if the prediction gain does not exceed the predetermined threshold, quantizing and entropy encoding the frequency-domain coefficients; and multiplexing the side information and the encoded audio signal to obtain the compressed audio code stream.
 The signal type analyzing step determines whether the signal is of a fast-varying type or of a slowly varying type by performing forward and backward masking effect analysis based on an adaptive threshold and waveform prediction, and the specific steps thereof are: decomposing the input audio data into frames; decomposing each input frame into a plurality of subframes and searching each subframe for the local extremal points of the absolute values of the PCM data; selecting the subframe peak value from the local extremal points of the respective subframes; for a given subframe peak value, predicting the typical sample value of a plurality of (typically four) subframes that are forward delayed with respect to said subframe by means of a plurality of (typically three) subframe peak values preceding said subframe; calculating the difference and the ratio between said subframe peak value and the predicted typical sample value; if both the difference and the ratio are larger than the predetermined thresholds, determining that said subframe contains a jump signal and confirming that said subframe has a local extremal point capable of backward-masking the pre-echo, and if, between the front end of said subframe and the position 2.5 ms ahead of the masking point, there is a subframe whose peak value is small enough, determining that said frame of signal is a fast-varying type signal; if the difference and the ratio are not larger than the predetermined thresholds, repeating the above steps until said frame is determined to be a fast-varying type signal or the last subframe is reached; and if it has still not been determined that said frame is a fast-varying type signal when the last subframe has been reached, classifying said frame of signal as a slowly varying type signal.
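The peak-prediction part of the steps above can be sketched as follows. The subframe count, the thresholds, and the use of a three-peak mean as the predictor are hypothetical simplifications of the procedure in the text:

```python
import numpy as np

def is_fast_varying(frame, n_sub=16, diff_thr=1000.0, ratio_thr=4.0):
    """Rough transient detector: split the frame into subframes, take each
    subframe's absolute peak, predict the current peak from the mean of the
    three preceding peaks, and flag the frame as fast-varying when both the
    difference and the ratio between actual and predicted peak exceed the
    (hypothetical) thresholds."""
    sub = np.array_split(np.abs(np.asarray(frame, dtype=float)), n_sub)
    peaks = np.array([s.max() for s in sub])
    for i in range(3, n_sub):
        pred = peaks[i - 3:i].mean()      # waveform prediction of the peak
        if peaks[i] - pred > diff_thr and peaks[i] / (pred + 1e-9) > ratio_thr:
            return True                   # jump signal: pre-echo risk
    return False

# A frame that is silent and then suddenly loud should be fast-varying.
quiet = np.zeros(1024)
attack = np.concatenate([np.zeros(900), 8000.0 * np.ones(124)])
```

A fast-varying verdict would then trigger the shorter analysis windows that limit pre-echo spread.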
 There are many methods for performing the time-frequency transformation of the time-domain audio signals, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the modified discrete cosine transform (MDCT), cosine-modulated filter banks, and the wavelet transform. The modified discrete cosine transform (MDCT) and cosine modulation filtering are taken as examples below to illustrate the process of time-frequency mapping.
 When the modified discrete cosine transform (MDCT) is used to perform the time-frequency transformation, the time-domain signals of M samples from the previous frame and the M samples of the present frame are selected first; a windowing operation is then performed on the time-domain signal of altogether 2M samples of these two frames, and the MDCT is applied to the windowed signal to obtain M frequency-domain coefficients.
 The impulse response of the MDCT analysis filter is:
$$h_{k}(n)=w(n)\sqrt{\frac{2}{M}}\cos\left[\frac{(2n+M+1)(2k+1)\pi}{4M}\right]$$ and the MDCT transformation is $$X(k)=\sum_{n=0}^{2M-1}x(n)h_{k}(n),\qquad 0\le k\le M-1,$$
wherein w(n) is the window function, x(n) is the input time-domain signal of the MDCT, and X(k) is the output frequency-domain signal of the MDCT. To meet the requirement of perfect signal reconstruction, the window function w(n) of the MDCT must satisfy the following two conditions:
$$w(2M-1-n)=w(n)\quad\text{and}\quad w^{2}(n)+w^{2}(n+M)=1.$$ In practice, the sine window can be used as the window function. A biorthogonal transform can also be used, in which case the above restriction on the window function is replaced by conditions on the specific analysis and synthesis filters.
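The MDCT formula and the two window conditions above can be exercised directly; the sketch below uses the sine window, which satisfies both conditions, and shows perfect reconstruction by 50% overlap-add:

```python
import numpy as np

def mdct_kernel(M):
    """Windowed MDCT basis h_k(n) = w(n) sqrt(2/M) cos[(2n+M+1)(2k+1)pi/(4M)],
    with the sine window w(n) = sin(pi/(2M) (n+0.5))."""
    n = np.arange(2 * M)
    k = np.arange(M)[:, None]
    w = np.sin(np.pi / (2 * M) * (n + 0.5))
    return w * np.sqrt(2.0 / M) * np.cos((2 * n + M + 1) * (2 * k + 1) * np.pi / (4 * M))

def mdct(x2m):
    """2M time samples (previous frame + present frame) -> M coefficients."""
    return mdct_kernel(len(x2m) // 2) @ x2m

def imdct(X):
    """M coefficients -> 2M time samples, to be overlap-added with neighbors."""
    return mdct_kernel(len(X)).T @ X
```

Transforming two overlapping 2M-sample blocks and overlap-adding the inverse transforms recovers the shared M samples exactly, which is the time-domain alias cancellation property the window conditions guarantee.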
 When cosine modulation filtering is used to perform the time-frequency transformation, the time-domain signals of M samples from the previous frame and the M samples of the present frame are selected first; a windowing operation is performed on the time-domain signal of altogether 2M samples of these two frames, and cosine modulation filtering is applied to the windowed signal to obtain M frequency-domain coefficients.
 The impulse responses of the conventional cosine modulation filtering technique are:
$$h_{k}(n)=2p_{a}(n)\cos\left(\frac{\pi}{M}(k+0.5)\left(n-\frac{D}{2}\right)+\theta_{k}\right),\qquad n=0,1,\cdots,N_{h}-1$$ $$f_{k}(n)=2p_{s}(n)\cos\left(\frac{\pi}{M}(k+0.5)\left(n-\frac{D}{2}\right)-\theta_{k}\right),\qquad n=0,1,\cdots,N_{f}-1$$
wherein 0≤k≤M-1, 0≤n≤2KM-1, K is an integer greater than 0, and $$\theta_{k}=(-1)^{k}\frac{\pi}{4}.$$ Suppose that the impulse response length of the analysis window (analysis prototype filter) P_{a}(n) of the M-subband cosine-modulated filter bank is N_{a}, and the impulse response length of the synthesis window (synthesis prototype filter) P_{s}(n) is N_{s}. When the analysis window equals the synthesis window, i.e. P_{a}(n)=P_{s}(n) and N_{a}=N_{s}, the cosine-modulated filter bank represented by the above two formulae is an orthogonal filter bank, and the matrixes H and F ([H]_{n,k}=h_{k}(n), [F]_{n,k}=f_{k}(n)) are orthogonal transformation matrixes. In order to obtain a linear-phase filter bank, it is further required that the symmetric window satisfies P_{a}(2KM-1-n)=P_{a}(n). To ensure perfect reconstruction of the orthogonal and biorthogonal systems, the window function needs to satisfy further conditions; details can be found in "Multirate Systems and Filter Banks", P. P. Vaidyanathan, Prentice Hall, Englewood Cliffs, NJ, 1993.
 The calculation of the masking threshold and the signal-to-masking ratio of the preprocessed audio signal includes the following steps:
 Step 1: mapping the signal from the time domain to the frequency domain. The fast Fourier transform with a Hanning window can be used to transform the time-domain data into the frequency-domain coefficients X[k]. X[k] is represented by amplitude r[k] and phase ϕ[k] as X[k]=r[k]e^{jϕ[k]}; the energy e[b] of each subband is then the sum over all the spectral lines within said subband, i.e.
$e[b]=\sum_{k=k_{l}}^{k=k_{h}}r^{2}[k],$ wherein k_{l} and k_{h} are respectively the lower and upper boundaries of subband b.  Step 2: determining the tonal and non-tonal components in the signal. The tonality of the signal is estimated by performing inter-frame prediction on each spectral line. The Euclidean distance between the predicted value and the actual value of each spectral line is mapped into an unpredictability measure; a spectral component of high predictability is considered strongly tonal, while a spectral component of low predictability is considered noise-like.
 The predicted amplitude r_{pred} and phase ϕ_{pred} can be represented by the following equations:
$$r_{pred}[k]=r_{t-1}[k]+\left(r_{t-1}[k]-r_{t-2}[k]\right)$$ $$\phi_{pred}[k]=\phi_{t-1}[k]+\left(\phi_{t-1}[k]-\phi_{t-2}[k]\right)$$  The unpredictability c[k] of each spectral line is then summed over the subband, weighted by energy, i.e.
$c[b]=\sum_{k=k_{l}}^{k=k_{h}}c[k]r^{2}[k].$ A convolution with the spreading function is performed on the subband energy e[b] and the unpredictability c[b] respectively, to obtain the spread subband energy e_{s}[b] and the spread subband unpredictability c_{s}[b], the spreading function of masker i acting on subband b being denoted s[i,b]. In order to eliminate the influence of the spreading function on the energy, the spread subband unpredictability c_{s}[b] has to be normalized, the result of normalization c̃_{s}[b] being ${\tilde{c}}_{s}[b]=\frac{c_{s}[b]}{e_{s}[b]}.$ Similarly, in order to eliminate the influence of the spreading function on the subband energy, the normalized energy spread ẽ_{s}[b] is defined as ${\tilde{e}}_{s}[b]=\frac{e_{s}[b]}{n[b]},$ wherein the normalization factor n[b] is $n[b]=\sum_{i=1}^{b_{\mathrm{max}}}s[i,b],$ and b_{max} is the number of subbands allocated to said frame of signal.  The tonality t[b] of a subband can then be calculated from the normalized unpredictability spread c̃_{s}[b] as t[b]=-0.299-0.43log_{e}(c̃_{s}[b]), bounded so that 0≤t[b]≤1. When t[b]=1, said subband signal is a pure tone; when t[b]=0, said subband signal is white noise.
 Step 3: calculating the signal-to-noise ratio (SNR) needed for each subband. The noise-masking-tone (NMT) value of all the subbands is set to 6 dB, and the tone-masking-noise (TMN) value is set to 18 dB; for the noise to remain imperceptible, the signal-to-noise ratio of each subband should be SNR[b]=18t[b]+6(1-t[b]).
 Step 4: calculating the masking threshold of each subband and the perceptual entropy of the signal. The noise energy threshold n[b] of each subband is calculated as n[b]=ẽ_{s}[b]·10^{-SNR[b]/10} from the normalized subband signal energy and the required signal-to-noise ratio SNR obtained in the above steps.
 In order to avoid the influence of the pre-echo, the noise energy threshold n[b] of the present frame is compared with the noise energy threshold n_{prev}[b] of the previous frame, and the masking threshold of the signal is taken as n[b]=min(n[b], 2n_{prev}[b]), thereby ensuring that a high-energy attack near the end of the analysis window does not bias the masking threshold.
 Further, taking the threshold in quiet qsthr[b] into account, the final masking threshold of the signal is selected as the larger of the threshold in quiet and the calculated masking threshold, i.e. n[b]=max(n[b], qsthr[b]). The perceptual entropy is then calculated by the equation
$pe=-\sum_{b=0}^{b_{\mathrm{max}}}\left(cbwidth_{b}\times\log_{10}\left(n[b]/(e[b]+1)\right)\right),$ wherein cbwidth_{b} represents the number of spectral lines included in each subband.  Step 5: calculating the signal-to-masking ratio (SMR) of each subband signal. The signal-to-masking ratio SMR[b] of each subband is
$SMR[b]=10\log_{10}\left(\frac{e[b]}{n[b]}\right).$
 After the frequency-domain coefficients have been obtained, linear prediction and vector quantization are performed on them. First, a standard linear prediction analysis is performed on the frequency-domain coefficients, comprising calculating the autocorrelation matrix and obtaining the prediction gain and the prediction coefficients by recursively executing the Levinson-Durbin algorithm. It is then determined whether the calculated prediction gain exceeds a predetermined threshold; if it does, frequency-domain linear prediction error filtering is performed on the frequency-domain coefficients based on the prediction coefficients; otherwise, the frequency-domain coefficients are left unprocessed and the next step, quantizing and entropy encoding the frequency-domain coefficients, is executed.
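The Levinson-Durbin recursion mentioned above can be sketched as follows; the sign convention is chosen to match the error filter A(z) = 1 - Σa_{i}z^{-i} used later in the text, and the prediction gain is expressed as the ratio of signal energy to residual energy (one common definition, assumed here):

```python
import numpy as np

def levinson_durbin(r, order):
    """Levinson-Durbin recursion on autocorrelations r[0..order].

    Returns the coefficients a_i of the error filter A(z) = 1 - sum_i a_i z^-i
    and the prediction gain r[0]/E_p, where E_p is the final residual energy."""
    a = np.zeros(order + 1)   # internal convention: A(z) = 1 + sum_i a_i z^-i
    a[0] = 1.0
    err = float(r[0])
    for m in range(1, order + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for i in range(1, m):
            a[i] = a_prev[i] + k * a_prev[m - i]
        a[m] = k
        err *= (1.0 - k * k)                # residual energy shrinks each order
    return -a[1:], float(r[0]) / err        # flip sign to the text's convention
```

For an AR(1)-like autocorrelation r[k] = 0.9^k the recursion recovers a single coefficient of 0.9, the second-order coefficient is zero, and the gain is 1/(1 - 0.9²).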
 Linear prediction includes forward prediction and backward prediction. Forward prediction refers to predicting the current value by using the values before a certain moment, while backward prediction refers to predicting the current value by using the values after a certain moment. Forward prediction is used below as an example to explain linear prediction error filtering. The linear prediction error filter transfer function is
$A(z)=1-\sum_{i=1}^{p}a_{i}z^{-i},$ wherein a_{i} denotes the prediction coefficients and p is the prediction order. The frequency-domain coefficients X(k) obtained from the time-frequency transformation are filtered to obtain the prediction error E(k), also called the residual sequence, given by $E(k)=X(k)\cdot A(z)=X(k)-\sum_{i=1}^{p}a_{i}X(k-i).$  Thus, after frequency-domain linear prediction filtering, the frequency-domain coefficients X(k) output by the time-frequency transformation can be represented by the residual sequence E(k) and a group of prediction coefficients a_{i}. Said prediction coefficients a_{i} are then transformed into line spectrum frequency (LSF) coefficients, on which multi-stage vector quantization is performed. The vector quantization searches the codebooks of the respective stages using an optimal distortion measure criterion (e.g. the nearest-neighbor criterion) to determine the codewords corresponding to the prediction coefficients, and outputs the codeword indexes as the side information. Meanwhile, the residual sequence E(k) is quantized and entropy encoded. It follows from the principle of linear prediction analysis that the dynamic range of the residual sequence is smaller than that of the original spectral coefficients, so fewer bits can be allocated to it during quantization, or, with the same number of bits, an improved encoding gain can be obtained.
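The error filtering and its inverse can be sketched directly from the equations above (treating the spectral coefficients as a sequence over k, with X(k-i)=0 for k-i<0, an assumed boundary convention):

```python
import numpy as np

def fd_lpc_residual(X, a):
    """Frequency-domain prediction error filtering:
    E(k) = X(k) - sum_i a_i X(k-i)."""
    p = len(a)
    E = np.array(X, dtype=float)
    for k in range(len(X)):
        for i in range(1, p + 1):
            if k - i >= 0:
                E[k] -= a[i - 1] * X[k - i]
    return E

def fd_lpc_synthesis(E, a):
    """Inverse filter used by the decoder: X(k) = E(k) + sum_i a_i X(k-i)."""
    p = len(a)
    X = np.zeros(len(E))
    for k in range(len(E)):
        X[k] = E[k]
        for i in range(1, p + 1):
            if k - i >= 0:
                X[k] += a[i - 1] * X[k - i]
    return X
```

For a geometrically decaying spectrum X(k) = 0.95^k, a single coefficient a₁ = 0.95 reduces the residual to a lone impulse, illustrating the dynamic-range reduction the text relies on; synthesis filtering restores X exactly.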
 After the signal-to-masking ratio of each subband signal has been obtained, the frequency-domain coefficients or the residual sequence are quantized and entropy encoded based on said signal-to-masking ratio, wherein the quantization can be scalar quantization or vector quantization.
 The scalar quantization comprises the steps of: nonlinearly compressing the frequency-domain coefficients in all the scale-factor bands; quantizing the frequency-domain coefficients of each subband using the scale factor of said subband to obtain a quantization spectrum represented by integers; selecting the first scale factor in each frame of signal as the common scale factor; and differentially coding the remaining scale factors with respect to their respective previous scale factors.
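These steps can be sketched as follows. The 3/4-power companding law and the 2^(sf/4) step size are assumptions borrowed from common transform coders, not values prescribed by the text:

```python
import numpy as np

def scalar_quantize(coeffs, scalefactor):
    """Nonlinear compression followed by scale-factor quantization to integers."""
    step = 2.0 ** (scalefactor / 4.0)                     # assumed step law
    compressed = np.sign(coeffs) * np.abs(coeffs) ** 0.75  # nonlinear compression
    return np.round(compressed / step).astype(int)         # integer spectrum

def scalar_dequantize(q, scalefactor):
    """Inverse: rescale, then nonlinearly expand back."""
    step = 2.0 ** (scalefactor / 4.0)
    expanded = q * step
    return np.sign(expanded) * np.abs(expanded) ** (4.0 / 3.0)

# Differential coding of scale factors: the first one is the common scale factor.
sfs = [60, 62, 61, 58]
common = sfs[0]
diffs = [b - a for a, b in zip(sfs, sfs[1:])]
```

Quantizing and dequantizing a small coefficient block round-trips to within the step size, and only the common scale factor plus small differences need to be transmitted.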
 The vector quantization comprises the steps of: forming a plurality of multi-dimensional vector signals from the frequency-domain coefficients; performing spectrum smoothing on each M-dimensional vector according to the smoothing factor; and searching, according to the subjective perceptual distance measure criterion, for the codeword in the codebook with the shortest distance to the vector to be quantized, so as to obtain the codeword indexes.
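The codeword search can be sketched as a nearest-neighbor lookup; a plain (optionally weighted) squared error stands in here for the subjective perceptual distance measure of the text, and the tiny codebook is illustrative only:

```python
import numpy as np

def vq_encode(vectors, codebook, weights=None):
    """Return, for each input vector, the index of the nearest codeword."""
    indices = []
    for v in vectors:
        d = codebook - v                   # per-codeword error vectors
        if weights is not None:
            d = d * weights                # assumed diagonal perceptual weighting
        indices.append(int(np.argmin(np.sum(d * d, axis=1))))
    return indices

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
idx = vq_encode(np.array([[0.9, 1.1], [0.1, -0.2]]), codebook)
```

Only the indexes (here one per 2-dimensional vector) are passed on to the entropy encoder.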
 The entropy encoding step comprises entropy encoding the quantization spectrum and the differentially coded scale factors to obtain the codebook sequence numbers, the encoded values of the scale factors and the losslessly encoded quantization spectrum; and entropy encoding the codebook sequence numbers to obtain their encoded values.
 Alternatively, a one-dimensional or multi-dimensional entropy encoding is performed on the codeword indexes to obtain the encoded values of the codeword indexes.
 The entropy encoding method described above can be any existing method such as Huffman coding, arithmetic coding or run-length coding.
 After quantization and entropy encoding, the encoded audio signal is obtained, which is multiplexed together with the common scale factor, the side information and the result of the signal type analysis to obtain the compressed audio code stream.
 Fig. 6 is a schematic drawing of the structure of the audio decoding device according to the present invention. The audio decoding device comprises a bitstream demultiplexing module 801, an entropy decoding module 802, an inverse quantizer bank 803, an inverse frequency-domain linear prediction and vector quantization module 804, and a frequency-time mapping module 805. The compressed audio code stream is demultiplexed by the bitstream demultiplexing module 801 to obtain the corresponding data signals and control signals, which are output to the entropy decoding module 802 and the inverse frequency-domain linear prediction and vector quantization module 804. The data signals and control signals are decoded in the entropy decoding module 802 to recover the quantized values of the spectrum; said quantized values are reconstructed in the inverse quantizer bank 803 to obtain the inversely quantized spectrum. The inversely quantized spectrum is then output to the inverse frequency-domain linear prediction and vector quantization module 804 for inverse quantization and inverse linear prediction filtering to obtain the spectrum before prediction, which is output to the frequency-time mapping module 805; the time-domain audio signal of the low frequency band is obtained after the frequency-time mapping.
 The bitstream demultiplexing module 801 decomposes the compressed audio code stream to obtain the corresponding data signals and control signals and to provide the corresponding decoding information to the other modules. After the compressed audio data stream has been demultiplexed, the signals output to the entropy decoding module 802 include the common scale factor, the encoded values of the scale factors, the encoded values of the codebook sequence numbers and the losslessly encoded quantization spectrum, or the encoded values of the codeword indexes; the control information for inverse frequency-domain linear prediction and vector quantization is output to the inverse frequency-domain linear prediction and vector quantization module 804.
 If, in the encoding device, the quantization and entropy encoding module 54 uses the scalar quantizer, then in the decoding device the entropy decoding module 802 receives the common scale factor, the encoded values of the scale factors, the encoded values of the codebook sequence numbers and the losslessly encoded quantization spectrum output from the bitstream demultiplexing module 801; codebook sequence number decoding, spectral coefficient decoding and scale factor decoding are then performed thereon to reconstruct the quantization spectrum and to output the integer representation of the scale factors and the quantized values of the spectrum to the inverse quantizer bank 803. The decoding method used by the entropy decoding module 802 corresponds to the encoding method used for entropy encoding in the encoding device, and is, for example, Huffman decoding, arithmetic decoding or run-length decoding.
 Upon receipt of the quantized values of the spectrum and the integer representation of the scale factors, the inverse quantizer bank 803 inversely quantizes the quantized values of the spectrum into a reconstructed spectrum without scaling (the inverse quantization spectrum), and outputs the inverse quantization spectrum to the inverse frequency-domain linear prediction and vector quantization module 804. The inverse quantizer bank 803 can be either a uniform quantizer bank or a non-uniform quantizer bank realized by a companding function.
 When the quantizer bank of the encoding device uses the scalar quantizer, the inverse quantizer bank 803 of the decoding device also uses a scalar inverse quantizer. In the scalar inverse quantizer, the quantized values of the spectrum are first nonlinearly expanded, then all the spectral coefficients (the inverse quantization spectrum) in each scale-factor band are obtained by applying the corresponding scale factor.
 If the quantization and entropy encoding module 54 uses the vector quantizer, then in the decoding device the entropy decoding module 802 receives the encoded values of the codeword indexes output from the bitstream demultiplexing module 801. The encoded values of the codeword indexes are decoded by the entropy decoding method corresponding to the entropy encoding method used in encoding, thereby obtaining the corresponding codeword indexes.
 The codeword indexes are output to the inverse quantizer bank 803, which obtains the quantized values (the inverse quantization spectrum) by looking them up in the codebook and outputs them to the frequency-time mapping module 805. The inverse quantizer bank 803 uses an inverse vector quantizer.
 In the encoder, the technique of frequency-domain linear prediction and vector quantization is used to suppress the pre-echo and to obtain greater encoding gain. Therefore, in the decoder, the inverse quantization spectrum and the control information for inverse frequency-domain linear prediction and vector quantization output from the bitstream demultiplexing module 801 are input to the inverse frequency-domain linear prediction and vector quantization module 804 to recover the spectrum before linear prediction.
 The inverse frequency-domain linear prediction and vector quantization module 804 comprises an inverse vector quantizer, an inverse transformer and an inverse linear prediction filter, wherein the inverse vector quantizer is used for inversely quantizing the codeword indexes to obtain the line spectrum pair frequency (LSF) coefficients; the inverse transformer is used for transforming the LSF coefficients back into prediction coefficients; and the inverse linear prediction filter is used for inversely filtering the inverse quantization spectrum based on the prediction coefficients to obtain the spectrum before prediction and output it to the frequency-time mapping module 805.
 The time-domain audio signals of the low frequency channel are obtained by the frequency-time mapping module 805 through a mapping processing of the inverse quantization spectrum or of the spectrum before prediction. The frequency-time mapping module 805 can be an inverse discrete cosine transform (IDCT) filter bank, an inverse discrete Fourier transform (IDFT) filter bank, an inverse modified discrete cosine transform (IMDCT) filter bank, an inverse wavelet transform filter bank, a cosine-modulated filter bank, etc.
 The decoding method based on the above-mentioned decoder comprises: demultiplexing the compressed audio code stream to obtain the data information and the control information; entropy decoding said information to obtain the quantized values of the spectrum; inversely quantizing the quantized values of the spectrum to obtain the inverse quantization spectrum; determining whether the control information indicates that the inverse quantization spectrum needs to undergo inverse frequency-domain linear prediction and vector quantization, and if it does, performing the inverse vector quantization to obtain the prediction coefficients, performing inverse linear prediction filtering on the inverse quantization spectrum according to the prediction coefficients to obtain the spectrum before prediction, and frequency-time mapping the spectrum before prediction to obtain the time-domain audio signals of the low frequency band; if the control information does not so indicate, frequency-time mapping the inverse quantization spectrum to obtain the time-domain audio signals of the low frequency band.
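The control flow of this decoding method can be sketched as a skeleton; the four callables are hypothetical stand-ins for the entropy decoding module, the inverse quantizer bank, the inverse frequency-domain linear prediction module, and the frequency-time mapping module, and the stream layout is assumed:

```python
def decode_frame(stream, entropy_decode, dequantize, inverse_fdlp, freq_to_time):
    """Skeleton of the decoding method above.

    stream: {"data": ..., "control": {"fdlp": bool, "side_info": ...}}
    The inverse FDLP stage runs only when the control information says the
    spectrum was prediction-filtered at the encoder."""
    data, control = stream["data"], stream["control"]
    spectrum = dequantize(entropy_decode(data))
    if control.get("fdlp"):                      # side info: prediction was used
        spectrum = inverse_fdlp(spectrum, control["side_info"])
    return freq_to_time(spectrum)
```

With identity stubs for the modules, the FDLP branch is taken exactly when the control flag is set, mirroring the two paths in the text.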
 If the demultiplexed information includes the encoded values of the code book sequence numbers, the common scale factor, the encoded values of the scale factors, and the losslessly encoded quantization spectrum, the spectrum coefficients in the encoding device were quantized by the scalar quantization technique. Accordingly, the entropy decoding steps include: decoding the encoded values of the code book sequence numbers to obtain the code book sequence numbers of all the scale factor bands; decoding the quantization coefficients of all the scale factor bands according to the code book corresponding to each code book sequence number; and decoding the scale factors of all the scale factor bands to reconstruct the quantization spectrum. The entropy decoding method used in this process corresponds to the entropy encoding method used in the encoding method, for example, the run length decoding method, the Huffman decoding method, or the arithmetic decoding method.
 The entropy decoding process is described below by using as examples the decoding of the code book sequence numbers by the run length decoding method, and the decoding of the quantization coefficients and the scale factors by the Huffman decoding method.
 First, the code book sequence numbers of all the scale factor bands are obtained through the run length decoding method. The decoded code book sequence numbers are integers within a certain range. Suppose that said range is [0, 11]; then only the code book sequence numbers within said valid range, i.e. between 0 and 11, correspond to Huffman code books of the spectrum coefficients. For an all-zero subband, a certain code book sequence number can be selected to represent it; typically, the sequence number 0 is selected.
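As a sketch of this step, the run length decoding of the code book sequence numbers can be modelled as expanding (code book number, run length) pairs into one number per scale factor band. The pair representation and the function name are illustrative assumptions, not the patent's actual bitstream syntax:

```python
def run_length_decode(pairs, num_bands):
    """Expand (codebook_number, run_length) pairs into one code book
    number per scale factor band.  The valid range [0, 11] follows the
    embodiment described above; 0 marks an all-zero subband."""
    books = []
    for book, run in pairs:
        if not 0 <= book <= 11:
            raise ValueError("code book sequence number out of range")
        books.extend([book] * run)
    if len(books) != num_bands:
        raise ValueError("run lengths do not cover all scale factor bands")
    return books
```

For example, three runs covering six bands expand to one code book number per band.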
 Once the code book number of each scale factor band is obtained through decoding, the Huffman code book of spectrum coefficients corresponding to said code book number is used to decode the quantization coefficients of all the scale factor bands. If the code book number of a scale factor band is within the valid range, for example between 1 and 11 in this embodiment, then said code book number corresponds to a spectrum-coefficient code book, and said code book is used to decode the quantization spectrum to obtain the code word indexes of the quantization coefficients of the scale factor band; subsequently, the code word indexes are unpacked to obtain the quantization coefficients. If the code book number of the scale factor band is not between 1 and 11, then said code book number does not correspond to any spectrum-coefficient code book; the quantization coefficients of said scale factor band do not need to be decoded and are all directly set to zero.
 The scale factor is used to reconstruct the spectrum values on the basis of the inverse quantization spectrum coefficients. If the code book number of a scale factor band is within the valid range, that code book number corresponds to a scale factor. When decoding the scale factors, the code stream occupied by the first scale factor is read first; the remaining scale factors are then Huffman decoded to obtain the differences between each scale factor and its predecessor, and each difference is added to the value of the previous scale factor to obtain the respective scale factor. If the quantization coefficients of the present subband are all zero, the scale factors of said subband do not have to be decoded.
 After said entropy decoding, the quantized values of the spectrum and the integer representation of the scale factors are obtained; the quantized values of the spectrum are then inversely quantized to obtain the inverse quantization spectrum. The inverse quantization processing includes nonlinearly expanding the quantized values of the spectrum and scaling all the spectrum coefficients (the inverse quantization spectrum) in each scale factor band according to its scale factor.
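The inverse quantization can be sketched as follows. The 4/3 power law and the 2^(0.25·(sf − offset)) gain are assumptions borrowed from common transform coders, since the text above specifies only a nonlinear expansion followed by per-band scaling:

```python
import math

def inverse_quantize(qvals, scale_factors, sf_offset=100, gamma=4.0 / 3.0):
    """Nonlinearly expand quantized spectral values and apply the
    per-band scale factor gain.  qvals is a list of bands, each a list
    of quantized integers; scale_factors gives one integer per band.
    The power law and gain formula are illustrative assumptions."""
    spectrum = []
    for band, sf in zip(qvals, scale_factors):
        gain = 2.0 ** (0.25 * (sf - sf_offset))
        # sign-preserving expansion |q|^gamma, then band gain
        spectrum.append([math.copysign(abs(q) ** gamma, q) * gain for q in band])
    return spectrum
```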
 If the demultiplexed information contains the encoded values of the code word indexes, the encoding device used the vector quantization technique to quantize the spectrum coefficients. The entropy decoding steps then include: decoding the encoded values of the code word indexes by means of the entropy decoding method corresponding to the entropy encoding method used in the encoding device, so as to obtain the code word indexes; and then inversely quantizing the code word indexes to obtain the inverse quantization spectrum.
 An inverse frequency-domain linear prediction vector quantization is performed on the inverse quantization spectrum. First, it is determined from the control information whether said frame of signal has undergone the frequency-domain linear prediction vector quantization. If it has, the code word indexes resulting from the vector quantization of the prediction coefficients are obtained from the control information; the quantized line spectrum frequency (LSF) coefficients are then obtained according to the code word indexes, from which the prediction coefficients are calculated; subsequently, a linear prediction synthesis is performed on the inverse quantization spectrum to obtain the spectrum-before-prediction.
 The transfer function of the linear prediction error filtering is
$$A(z)=1-\sum_{i=1}^{p} a_i z^{-i},$$ wherein $a_i$ denotes the prediction coefficient and $p$ is the prediction order. The residual sequence $E(k)$ and the spectrum $X(k)$ before prediction satisfy $$X(k)=E(k)\cdot \frac{1}{A(z)}=E(k)+\sum_{i=1}^{p} a_i X(k-i).$$ Thus the residual sequence $E(k)$ and the calculated prediction coefficients $a_i$ are combined by frequency-domain linear prediction synthesis to obtain the spectrum $X(k)$ before prediction, which is then frequency-time mapped.
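The synthesis recursion above can be sketched directly (a minimal illustration; in practice the recursion runs over the spectral coefficients of one frame):

```python
def lpc_synthesize(residual, a):
    """Recover the spectrum-before-prediction X(k) from the residual
    E(k) by the all-pole recursion along the frequency index k:
    X(k) = E(k) + sum_{i=1}^{p} a[i-1] * X(k - i)."""
    p = len(a)
    X = []
    for k, e in enumerate(residual):
        acc = e
        for i in range(1, p + 1):
            if k - i >= 0:          # terms before the frame start are zero
                acc += a[i - 1] * X[k - i]
        X.append(acc)
    return X
```

With a single coefficient a = [0.5], each spectral value accumulates half of its predecessor.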
 If the control information indicates that said signal frame has not undergone the frequency-domain linear prediction vector quantization, the inverse frequency-domain linear prediction vector quantization is not performed, and the inverse quantization spectrum is directly frequency-time mapped.
 The method of performing a frequency-time mapping on the inverse quantization spectrum corresponds to the time-frequency mapping method in the encoding method, which can be inverse discrete cosine transformation (IDCT), inverse discrete Fourier transformation (IDFT), inverse modified discrete cosine transformation (IMDCT), inverse wavelet transformation, etc.
 The frequency-time mapping process is illustrated below by taking the inverse modified discrete cosine transformation (IMDCT) as an example. The frequency-time mapping process includes three steps: the IMDCT transformation, a time-domain windowing processing, and a time-domain superposing operation.
 First, the IMDCT transformation is performed on the spectrum-before-prediction or the inverse quantization spectrum to obtain the transformed time-domain signal $x_{i,n}$. The expression of the IMDCT transformation is
$$x_{i,n}=\frac{2}{N}\sum_{k=0}^{\frac{N}{2}-1}\mathit{spec}[i][k]\cos\left(\frac{2\pi}{N}\left(n+n_0\right)\left(k+\frac{1}{2}\right)\right)$$
wherein $n$ is the sequence number of the sample with $0\le n<N$, $N$ represents the number of time-domain samples and is 2048, $n_0=(N/2+1)/2$, $i$ represents the frame sequence number, and $k$ represents the spectrum sequence number. Second, windowing is performed in the time domain on the time-domain signal obtained from the IMDCT transformation. In order to satisfy the requirement for complete reconstruction, the window function $w(n)$ must meet the two conditions $w(2M-1-n)=w(n)$ and $w^2(n)+w^2(n+M)=1$.
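For instance, the sine window $w(n)=\sin\left(\frac{\pi}{2M}\left(n+\frac{1}{2}\right)\right)$ satisfies both conditions, which can be checked numerically (a sketch assuming M = 1024):

```python
import math

M = 1024                                   # half the window length
w = [math.sin(math.pi / (2 * M) * (n + 0.5)) for n in range(2 * M)]

# Symmetry condition: w(2M-1-n) = w(n)
sym_err = max(abs(w[2 * M - 1 - n] - w[n]) for n in range(M))
# Complete-reconstruction condition: w^2(n) + w^2(n+M) = 1
pb_err = max(abs(w[n] ** 2 + w[n + M] ** 2 - 1.0) for n in range(M))
```

Both residual errors are at the level of floating-point round-off.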
 Typical window functions include, among others, the sine window and the Kaiser-Bessel derived window. The present invention uses a fixed window function, namely w(N+k) = cos(pi/2*((k+0.5)/N - 0.94*sin(2*pi/N*(k+0.5))/(2*pi))), wherein pi is the circular constant, k = 0, ..., N-1, w(k) represents the k-th coefficient of the window function with w(k) = w(2*N-1-k), and N represents the number of samples of the encoded frame, N = 1024. In addition, said restriction on the window function can be relaxed by using a biorthogonal transformation with a specific analysis filter and synthesis filter.
 Finally, the windowed time-domain signals are superposed to obtain the time-domain audio signal. Specifically, the first N/2 samples of the windowed signal of the current frame are superposed with the last N/2 samples of the previous frame to obtain N/2 output time-domain audio samples, i.e., $\mathit{timeSam}_{i,n}=\mathit{preSam}_{i,n}+\mathit{preSam}_{i-1,n+N/2}$, wherein $i$ denotes the frame sequence number, $n$ denotes the sample sequence number with $0\le n<\frac{N}{2}$, and $N$ is 2048. After the compressed audio data stream is processed through the above-described steps, the time-domain audio signals of the low frequency band are obtained.
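A direct (non-fast) sketch of the IMDCT and the overlap-add step, with the windowing left to the caller; the variable names are illustrative:

```python
import math

def imdct(spec):
    """Direct-form IMDCT of the expression above; N, the number of
    time-domain samples, is twice the number of spectral coefficients."""
    N = 2 * len(spec)
    n0 = (N / 2 + 1) / 2
    return [
        (2.0 / N) * sum(
            s * math.cos(2 * math.pi / N * (n + n0) * (k + 0.5))
            for k, s in enumerate(spec))
        for n in range(N)
    ]

def overlap_add(windowed, previous):
    """Add the first N/2 samples of the current windowed frame to the
    last N/2 samples of the previous frame, yielding N/2 outputs."""
    half = len(windowed) // 2
    return [windowed[n] + previous[half + n] for n in range(half)]
```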
 Fig. 7 is a schematic drawing of the structure of embodiment one of the encoding device of the present invention. On the basis of Fig. 5, this embodiment has a multi-resolution analyzing module 56 added between the output of the frequency-domain linear prediction and vector quantization module 53 and the input of the quantization and entropy encoding module 54.
 With respect to signals of a fast varying type, in order to effectively suppress the pre-echo produced during encoding and to improve the encoding quality, the encoding device of the present invention increases the time resolution of the encoded fast varying signals by means of the multi-resolution analyzing module 56. The residual sequence or frequency-domain coefficients output from the frequency-domain linear prediction and vector quantization module 53 are input to the multi-resolution analyzing module 56. If the signal is of a fast varying type, a frequency-domain wavelet transformation or frequency-domain modified discrete cosine transformation (MDCT) is performed to obtain a multi-resolution representation of the residual sequence/frequency-domain coefficients, which is output to the quantization and entropy encoding module 54. If the signal is of a slowly varying type, the residual sequence/frequency-domain coefficients are directly output to the quantization and entropy encoding module 54 without being processed.
 The multi-resolution analyzing module 56 performs a time-and-frequency-domain reorganization of the input frequency-domain data to improve the time resolution of the frequency-domain data at the cost of reduced frequency precision, thereby automatically adapting to the time-frequency characteristics of fast varying signals. Accordingly, the effect of suppressing the pre-echo is achieved without having to adjust the form of the filter bank in the time-frequency mapping module 52. The multi-resolution analyzing module 56 comprises a frequency-domain coefficient transformation module and a reorganization module, wherein the frequency-domain coefficient transformation module is used for transforming the frequency-domain coefficients into time-frequency plane coefficients, and the reorganization module is used for reorganizing the time-frequency plane coefficients according to a certain rule. The frequency-domain coefficient transformation module can use the filter bank of the frequency-domain wavelet transformation, the filter bank of the frequency-domain MDCT transformation, etc.
 The operation process of the multi-resolution analyzing module 56 is described below by taking the frequency-domain wavelet transformation and the frequency-domain MDCT transformation as examples.
 Suppose that the time series is x(i), i = 0, 1, ..., 2M-1, and the frequency-domain coefficients obtained through time-frequency mapping are X(k), k = 0, 1, ..., M-1. The wavelet basis of the frequency-domain wavelet or wavelet packet transformation may be either fixed or adaptive.
 The multi-resolution analysis of the frequency-domain coefficients is illustrated below by taking the simplest wavelet transformation, using the Haar wavelet basis, as an example.
 The scale coefficients of the Haar wavelet basis are $\left[\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}}\right]$, and the wavelet coefficients are $\left[\frac{1}{\sqrt{2}},-\frac{1}{\sqrt{2}}\right]$. Fig. 8 shows the schematic drawing of the filtering structure that performs the wavelet transformation using the Haar wavelet basis, wherein $H_0$ represents low-pass filtering (the filtering coefficients being $\left[\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}}\right]$), $H_1$ represents high-pass filtering (the filtering coefficients being $\left[\frac{1}{\sqrt{2}},-\frac{1}{\sqrt{2}}\right]$), and "↓2" represents down-sampling by a factor of two. No wavelet transformation is performed on the medium and low frequency portions $X_1(k)$, k = 1, ..., $k_1$, of the frequency-domain coefficients, while the Haar wavelet transformation is performed on the high frequency portions of the MDCT coefficients to obtain the coefficients $X_2(k)$, $X_3(k)$, $X_4(k)$, $X_5(k)$, $X_6(k)$, $X_7(k)$ of different time-frequency intervals; the division of the corresponding time-frequency plane is as shown in Fig. 9. By selecting different wavelet bases, different wavelet transformation structures can be used so as to obtain other, similar time-frequency plane divisions. Therefore, the time-frequency plane division during signal analysis can be adjusted as desired to meet different requirements on the time and frequency resolution. The above-mentioned time-frequency plane coefficients are reorganized in the reorganization module according to a certain rule: for example, the time-frequency plane coefficients can be organized in the frequency direction first, with the coefficients in each frequency band organized in the time direction, and the organized coefficients are then arranged in the order of subwindow and scale factor band.
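One stage of the Haar analysis above (low-pass $H_0$ and high-pass $H_1$, each followed by down-sampling by two) can be sketched as:

```python
import math

def haar_stage(x):
    """One stage of the Haar analysis filter bank: low-pass and
    high-pass filtering with coefficients [1/sqrt(2), +/-1/sqrt(2)],
    each branch followed by down-sampling by a factor of two."""
    c = 1.0 / math.sqrt(2.0)
    low = [c * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]
    high = [c * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]
    return low, high
```

A locally constant input produces zero high-pass output, as expected for a difference filter.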
 Suppose that the frequency-domain data input to the filter bank of the frequency-domain MDCT transformation are X(k), k = 0, 1, ..., N-1. M-point MDCT transformations are performed sequentially on said N points of frequency-domain data, so that the frequency precision of the resulting time-frequency domain data is reduced while the time precision is increased. MDCT transformations of different lengths are used in different frequency-domain ranges, thereby obtaining different time-frequency plane divisions, i.e. different time and frequency precisions. The reorganization module reorganizes the time-frequency data output from the filter bank of the frequency-domain MDCT transformation. One way of reorganization is to organize the time-frequency plane coefficients in the frequency direction first, with the coefficients in each frequency band organized in the time direction at the same time, and then to arrange the organized coefficients in the order of subwindow and scale factor band.
 With respect to the encoding method based on the encoding device as shown in Fig. 7, the basic flow thereof is the same as that of the encoding method based on the encoding device as shown in Fig. 5; the difference is that the former further includes the following steps: before quantizing and entropy encoding the residual sequence/frequency-domain coefficients, if the signal is a fast varying signal, performing a multi-resolution analysis on the residual sequence/frequency-domain coefficients; if the signal is not a fast varying signal, directly quantizing and entropy encoding the residual sequence/frequency-domain coefficients.
 The multi-resolution analysis can use the frequency-domain wavelet transformation method or the frequency-domain MDCT transformation method. The frequency-domain wavelet analysis method includes: wavelet transforming the frequency-domain coefficients to obtain the time-frequency plane coefficients, and reorganizing said time-frequency plane coefficients according to a certain rule. The MDCT transformation method includes: MDCT transforming the frequency-domain coefficients to obtain the time-frequency plane coefficients, and reorganizing said time-frequency plane coefficients according to a certain rule. The reorganization method includes: organizing the time-frequency plane coefficients in the frequency direction, organizing the coefficients in each frequency band in the time direction, and then arranging the organized coefficients in the order of subwindow and scale factor band.
 Fig. 10 is a schematic drawing of embodiment one of the decoding device of the present invention. Said decoding device has a multi-resolution integration module 806 added on the basis of the decoding device as shown in Fig. 6. Said multi-resolution integration module 806 is located between the output of the inverse quantizer bank 803 and the input of the inverse frequency-domain linear prediction and vector quantization module 804 for performing a multi-resolution integration of the inverse quantization spectrum.
 In the encoder, the technique of multi-resolution filtering is applied to fast varying signals to increase the time resolution of the encoded fast varying signals. Accordingly, in the decoder, the multi-resolution integration module 806 is used to recover the frequency-domain coefficients of the fast varying signals as they were before the multi-resolution analysis. The multi-resolution integration module 806 comprises a coefficient reorganization module and a coefficient transformation module, wherein the coefficient transformation module may use a filter bank of the frequency-domain inverse wavelet transformation or a filter bank of the frequency-domain IMDCT transformation.
 With respect to the decoding method of the decoding device as shown in Fig. 10, the basic flow thereof is the same as that of the decoding method of the decoding device as shown in Fig. 6; the difference is that the former further includes the steps of, after obtaining the inverse quantization spectrum, performing a multi-resolution integration thereon, and then determining whether it is necessary to perform an inverse frequency-domain linear prediction vector quantization on the multi-resolution integrated inverse quantization spectrum.
 The method of multi-resolution integration is described below by taking the frequency-domain IMDCT transformation as an example. The method specifically includes: reorganizing the coefficients of the inverse quantization spectrum, and performing a number of IMDCT transformations on the groups of coefficients to obtain the inverse quantization spectrum as it was before the multi-resolution analysis. This process is described in detail using 128 IMDCT transformations (8 inputs and 16 outputs each). Firstly, the coefficients of the inverse quantization spectrum, arranged in the order of subwindow and scale factor band, are reorganized in the order of frequency, so that the 128 coefficients of each subwindow are grouped together in the order of frequency. Subsequently, the coefficients arranged in the order of subwindow are organized in the frequency direction with 8 in each group, the 8 coefficients in each group being arranged in time sequence, so that there are altogether 128 groups of coefficients in the frequency direction. A 16-point IMDCT transformation is performed on each group of coefficients, and the 16 coefficients output from each group's IMDCT transformation are added in an overlapping manner to obtain 8 frequency-domain data. This operation is performed 128 times from the low frequency direction to the high frequency direction to obtain 1024 frequency-domain coefficients.
 Fig. 11 is the schematic drawing of the second embodiment of the encoding device of the present invention. On the basis of Fig. 5, said embodiment has a sum-difference stereo (M/S) encoding module 57 added between the output of the frequency-domain linear prediction and vector quantization module 53 and the input of the quantization and entropy encoding module 54. The psychoacoustical analyzing module 51 outputs the masking threshold of the sum-difference sound channels to the quantization and entropy encoding module 54. With respect to multichannel signals, the psychoacoustical analyzing module 51 calculates not only the masking threshold of each single sound channel of the audio signals, but also the masking threshold of the sum-difference sound channels. The sum-difference stereo encoding module 57 can also be located between the quantizer bank and the encoder within the quantization and entropy encoding module 54.
 The sum-difference stereo encoding module 57 makes use of the correlation between the two sound channels of a sound channel pair to convert the frequency-domain coefficients/residual sequence of the left-right sound channels into the frequency-domain coefficients/residual sequence of the sum-difference sound channels, thereby reducing the code rate and improving the encoding efficiency. Hence, it is only suitable for multichannel signals of the same signal type; for mono signals or multichannel signals of different signal types, the sum-difference stereo encoding is not performed.
 The encoding method of the encoding device as shown in Fig. 11 is substantially the same as the encoding method of the encoding device as shown in Fig. 5; the difference is that the former further includes the following steps before quantizing and entropy encoding the residual sequence/frequency-domain coefficients: determining whether the audio signals are multichannel signals; if they are multichannel signals, determining whether the signal types of the left-right sound channels are the same; if the signal types are the same, determining whether the scale factor bands corresponding to the two sound channels meet the conditions for sum-difference stereo encoding; if they meet the conditions, performing a sum-difference stereo encoding on the residual sequence/frequency-domain coefficients to obtain the residual sequence/frequency-domain coefficients of the sum-difference sound channels; if they do not meet the conditions, the sum-difference stereo encoding is not performed. If the signals are mono signals or multichannel signals of different types, the frequency-domain coefficients are not processed.
 The sum-difference stereo encoding can be applied not only before the quantization, but also after the quantization and before the entropy encoding. That is, after quantizing the residual sequence/frequency-domain coefficients, it is determined whether the audio signals are multichannel signals; if they are, it is determined whether the signals of the left-right sound channels are of the same type; if the signal types are the same, it is determined whether the scale factor bands meet the encoding condition. If they meet the condition, a sum-difference stereo encoding is performed on the quantization spectrum to obtain the quantization spectrum of the sum-difference sound channels; if they do not meet the conditions, the sum-difference stereo encoding is not performed. If the signals are mono signals or multichannel signals of different types, the frequency-domain coefficients are not processed.
 There are many methods for determining whether a sum-difference stereo encoding can be performed on a scale factor band; the one used in the present invention is the KL transformation. The specific process of determination is as follows:
 Suppose that the spectrum coefficients of a scale factor band of the left sound channel are l(k), and the corresponding spectrum coefficients of the scale factor band of the right sound channel are r(k); the correlation matrix thereof is
$$C=\begin{pmatrix}C_{ll} & C_{lr}\\ C_{lr} & C_{rr}\end{pmatrix},$$ wherein $$C_{ll}=\frac{1}{N}\sum_{k=0}^{N-1} l(k)\,l(k);\quad C_{lr}=\frac{1}{N}\sum_{k=0}^{N-1} l(k)\,r(k);\quad C_{rr}=\frac{1}{N}\sum_{k=0}^{N-1} r(k)\,r(k),$$ and $N$ is the number of spectrum lines of the scale factor band. The KL transformation is performed on the correlation matrix $C$ to obtain $$RCR^{T}=\Lambda=\begin{pmatrix}\lambda_{ii} & 0\\ 0 & \lambda_{oo}\end{pmatrix},$$ wherein $$R=\begin{pmatrix}\cos a & -\sin a\\ \sin a & \cos a\end{pmatrix},\qquad a\in\left[-\frac{\pi}{2},\frac{\pi}{2}\right].$$ The rotation angle $a$ satisfies the equation $\tan(2a)=\frac{2C_{lr}}{C_{ll}-C_{rr}}$. When $a=\pm\pi/4$, this is exactly the sum-difference stereo encoding mode. Therefore, when the absolute value of the rotation angle $a$ deviates from $\pi/4$ only by a small amount, e.g. $3\pi/16<|a|<5\pi/16$, a sum-difference stereo encoding can be performed on the corresponding scale factor band. If the sum-difference stereo encoding is applied before the quantization, the residual sequence/frequency-domain coefficients of the left-right sound channels in the scale factor band are linearly transformed and replaced with the residual sequence/frequency-domain coefficients of the sum-difference sound channels:
$$\begin{bmatrix}M\\ S\end{bmatrix}=\frac{1}{2}\begin{bmatrix}1 & 1\\ 1 & -1\end{bmatrix}\begin{bmatrix}L\\ R\end{bmatrix},$$ wherein M denotes the residual sequence/frequency-domain coefficients of the sum sound channel, S denotes those of the difference sound channel, L denotes those of the left sound channel, and R denotes those of the right sound channel. If the sum-difference stereo encoding is applied after the quantization, the quantized residual sequence/frequency-domain coefficients of the left-right sound channels in the scale factor band are linearly transformed and replaced with the quantized residual sequence/frequency-domain coefficients of the sum-difference sound channels:
$$\begin{bmatrix}\hat{M}\\ \hat{S}\end{bmatrix}=\begin{bmatrix}1 & 0\\ 1 & -1\end{bmatrix}\begin{bmatrix}\hat{L}\\ \hat{R}\end{bmatrix},$$ wherein M̂ denotes the quantized residual sequence/frequency-domain coefficients of the sum sound channel, Ŝ denotes those of the difference sound channel, L̂ denotes those of the left sound channel, and R̂ denotes those of the right sound channel. Applying the sum-difference stereo encoding after the quantization effectively eliminates the correlation between the left-right sound channels; moreover, since the transformation operates on the quantized values, the encoding is lossless.
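The band-selection rule and the two placements of the sum-difference transform can be sketched together as follows (plain-list arithmetic for illustration; the 3π/16 and 5π/16 bounds follow the KL criterion described above, and the integer post-quantization mapping M = L, S = L − R is assumed as the lossless variant):

```python
import math

def ms_band_allowed(l, r, lo=3 * math.pi / 16, hi=5 * math.pi / 16):
    """KL-based decision for one scale factor band: compute the
    correlation entries, recover the rotation angle from
    tan(2a) = 2*C_lr / (C_ll - C_rr), and accept the band when |a|
    lies close to pi/4."""
    N = len(l)
    C_ll = sum(a * a for a in l) / N
    C_lr = sum(a * b for a, b in zip(l, r)) / N
    C_rr = sum(b * b for b in r) / N
    a = 0.5 * math.atan2(2 * C_lr, C_ll - C_rr)
    return lo < abs(a) < hi

def ms_encode(L, R):
    """Pre-quantization transform: M = (L + R)/2, S = (L - R)/2."""
    return ([(l + r) / 2.0 for l, r in zip(L, R)],
            [(l - r) / 2.0 for l, r in zip(L, R)])

def ms_encode_quantized(Lq, Rq):
    """Post-quantization (lossless) transform: M = L, S = L - R,
    which stays integer-valued and is exactly invertible."""
    return list(Lq), [l - r for l, r in zip(Lq, Rq)]

def ms_decode_quantized(Mq, Sq):
    """Inverse of the post-quantization transform: L = M, R = M - S."""
    return list(Mq), [m - s for m, s in zip(Mq, Sq)]
```

Identical channels rotate to exactly a = π/4 and are accepted; uncorrelated channels of equal energy give a = 0 and are rejected.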
 Fig. 12 is a schematic drawing of embodiment two of the decoding device of the present invention. On the basis of the decoding device of Fig. 6, said decoding device has a sum-difference stereo decoding module 807 added between the output of the inverse quantizer bank 803 and the input of the inverse frequency-domain linear prediction and vector quantization module 804 to receive the result of the signal type analysis and the sum-difference stereo control signal output from the bitstream demultiplexing module 801, and to transform the inverse quantization spectrum of the sum-difference sound channels into the inverse quantization spectrum of the left-right sound channels according to said control information.
 In the sum-difference stereo control signal, there is a flag bit indicating whether the present sound channel pair needs a sum-difference stereo decoding. If it does, there is also a flag bit on each scale factor band indicating whether the corresponding scale factor band needs to be sum-difference stereo decoded, and the sum-difference stereo decoding module 807 determines, on the basis of the flag bit of the scale factor band, whether it is necessary to perform the sum-difference stereo decoding on the inverse quantization spectrum in some of the scale factor bands. If the sum-difference stereo encoding was performed in the encoding device, then the sum-difference stereo decoding must be performed on the inverse quantization spectrum in the decoding device.
 The sum-difference stereo decoding module 807 can also be located between the output of the entropy decoding module 802 and the input of the inverse quantizer bank 803 to receive the sum-difference stereo control signal and the result of the signal type analysis output from the bitstream demultiplexing module 801.
 The decoding method of the decoding device as shown in Fig. 12 is substantially the same as the decoding method of the decoding device as shown in Fig. 6; the difference is that the former further includes the following steps: after the inverse quantization spectrum is obtained, if the result of the signal type analysis shows that the signal types are the same, it is determined according to the sum-difference stereo control signal whether it is necessary to perform a sum-difference stereo decoding on the inverse quantization spectrum. If it is necessary, it is determined, on the basis of the flag bit of each scale factor band, whether said scale factor band needs a sum-difference stereo decoding; if it does, the inverse quantization spectrum of the sum-difference sound channels in said scale factor band is transformed into the inverse quantization spectrum of the left-right sound channels before the subsequent processing. If the signal types are not the same, or it is unnecessary to perform the sum-difference stereo decoding, the inverse quantization spectrum is not processed and the subsequent processing is directly performed.
 The sum-difference stereo decoding can also be performed after the entropy decoding and before the inverse quantization. That is, after the quantized values of the spectrum are obtained, if the result of the signal type analysis shows that the signal types are the same, it is determined according to the sum-difference stereo control signal whether it is necessary to perform a sum-difference stereo decoding on the quantized values of the spectrum. If it is necessary, it is determined, on the basis of the flag bit of each scale factor band, whether said scale factor band needs a sum-difference stereo decoding; if it does, the quantized values of the spectrum of the sum-difference sound channels in said scale factor band are transformed into the quantized values of the spectrum of the left-right sound channels before the subsequent processing. If the signal types are not the same, or it is unnecessary to perform the sum-difference stereo decoding, the quantized values of the spectrum are not processed and the subsequent processing is directly performed.
 If the sum-difference stereo decoding is performed after the entropy decoding and before the inverse quantization, then the frequency-domain coefficients of the left-right sound channels in the scale factor band are obtained from the frequency-domain coefficients of the sum-difference sound channels through the equation
$$\begin{bmatrix}\hat{l}\\ \hat{r}\end{bmatrix}=\begin{bmatrix}1 & 0\\ 1 & -1\end{bmatrix}\begin{bmatrix}\hat{m}\\ \hat{s}\end{bmatrix},$$ wherein m̂ denotes the quantized frequency-domain coefficients of the sum sound channel, ŝ denotes those of the difference sound channel, l̂ denotes those of the left sound channel, and r̂ denotes those of the right sound channel. If the sum-difference stereo decoding is performed after the inverse quantization, then the inversely quantized frequency-domain coefficients of the left-right sound channels in the subband are obtained from the frequency-domain coefficients of the sum-difference sound channels through the matrix computation
$\left[\begin{array}{c}l\\ r\end{array}\right]=\left[\begin{array}{cc}1& 1\\ 1& 1\end{array}\right][\begin{array}{c}m\\ s\end{array}{]}_{,}$ wherein, m denotes the frequencydomain coefficients of the sum sound channel, s denotes the frequencydomain coefficients of the difference channel, l denotes the frequencydomain coefficients of the left sound channel, and r denotes the frequencydomain coefficients of the right sound channel.  Fig. 13 is a schematic drawing of the structure of the third embodiment of the encoding device of the present invention. On the basis of the encoding device as shown in Fig. 5, said embodiment has a frequency band spreading module 58 and a resampling module 59 added. The frequency band spreading module 58 is used for analyzing the originally input audio signal on the entire frequency band to extract the spectrum envelope of the high frequency portion and the parameters representing the correlation between the low and high frequency spectrum, and to output them as the frequency band spreading information to the bitstream multiplexing module 55; and the resampling module 59 is used for resampling the originally input audio signal to change the sampling rate thereof.
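The two decoding variants described above amount to 2×2 matrix transforms applied per scale factor band. A minimal sketch in Python/NumPy follows; the function names and array layout are illustrative, not from the patent, and the quantized-domain mapping assumes the sign convention in which the matrix is its own inverse:

```python
import numpy as np

# Variant 1: on quantized values (after entropy decoding, before inverse
# quantization): l_hat = m_hat, r_hat = m_hat - s_hat (lossless on integers).
def ms_decode_quantized(m_hat, s_hat):
    l_hat = np.asarray(m_hat).copy()
    r_hat = l_hat - np.asarray(s_hat)
    return l_hat, r_hat

# Variant 2: on the inversely quantized spectrum: l = m + s, r = m - s.
def ms_decode_dequantized(m, s):
    m, s = np.asarray(m), np.asarray(s)
    return m + s, m - s
```

Because the quantized-domain matrix is self-inverse, the encoder side can apply the same transform to go from left-right to sum-difference values.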
 The resampling includes upsampling and downsampling; it is described below using downsampling as an example. In this embodiment, the resampling module 59 comprises a low-pass filter and a downsampler, wherein the low-pass filter is used for limiting the frequency band of the audio signal and eliminating the aliasing that might be caused by downsampling. The input audio signal is downsampled after being low-pass filtered. Suppose the input audio signal is s(n) and said signal is output as v(n) after being filtered by the low-pass filter having an impulse response of h(n); then
$v(n)=\sum_{k=-\infty}^{\infty}h(k)s(n-k).$ Let x(m) be the sequence obtained by downsampling v(n) by a factor of M; then
$x(m)=v(Mm)=\sum_{k=-\infty}^{\infty}h(k)s(Mm-k).$ Thus the sampling rate of the resampled audio signal x(m) is reduced by a factor of M as compared to the sampling rate of the originally input audio signal s(n). The basic principle of frequency band spreading is that, for most audio signals, there is a strong correlation between the characteristics of the high frequency portion and those of the low frequency portion, so the high frequency portion of an audio signal can be effectively reconstructed from its low frequency portion and need not be transmitted. In order to ensure a correct reconstruction of the high frequency portion, only a small amount of frequency band spreading information is transmitted in the compressed audio code stream.
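The filtering-plus-decimation step above can be sketched as follows. The 2-tap averaging filter is a toy stand-in for a properly designed low-pass filter, and the names are illustrative:

```python
import numpy as np

# Low-pass filter, then keep every M-th sample:
# x(m) = v(M*m), where v(n) = sum_k h(k) s(n-k).
def downsample(s, h, M):
    v = np.convolve(s, h)   # v(n) = sum_k h(k) s(n-k)
    return v[::M]           # M-fold decimation

signal = np.arange(8, dtype=float)
taps = np.ones(2) / 2       # toy 2-tap moving-average "low-pass" filter
x = downsample(signal, taps, 2)
```

A practical resampler would use a longer filter whose cutoff is at most half the new Nyquist frequency, exactly for the anti-aliasing purpose the text describes.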
 The frequency band spreading module 58 comprises a parameter extracting module and a spectrum envelope extracting module. Signals are input to the parameter extracting module, which extracts the parameters representing the spectrum characteristics of the input signals at different time-frequency regions; then, in the spectrum envelope extracting module, the spectrum envelope of the high frequency portion of the signal is estimated at a certain time-frequency resolution. The time-frequency resolution of the spectrum envelope can be selected freely, so as to ensure that it is best suited to the characteristics of the current input signal. The parameters of the spectrum characteristics of the input signals and the spectrum envelope of the high frequency portion are used as the output of the frequency band spreading and are sent to the bitstream multiplexing module 55 for multiplexing.
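As a rough illustration of the envelope side of this module: the band edges, frame length, and fixed four-band resolution below are assumptions for the sketch, whereas the patent lets the time-frequency resolution vary with the signal.

```python
import numpy as np

# Estimate the high-frequency spectral envelope of one frame as the mean
# energy in each of a few subbands of the upper half of the spectrum.
def high_band_envelope(frame, num_bands=4):
    spec = np.abs(np.fft.rfft(frame))
    high = spec[len(spec) // 2:]              # "high frequency portion"
    bands = np.array_split(high, num_bands)   # fixed resolution for the sketch
    return np.array([np.mean(b ** 2) for b in bands])
```

The decoder would later shape a spectrum translated up from the low band so that its per-band energies match this envelope.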
 The encoding method based on the encoding device as shown in Fig. 13 is substantially the same as the encoding method based on the encoding device as shown in Fig. 5, the difference being that the former further includes the following steps: resampling the audio signal before analyzing the type thereof; and analyzing the input audio signal on the entire frequency band to extract the high frequency spectrum envelope and the parameters of the signal spectrum characteristics thereof as the control signal of the frequency band spreading, which are multiplexed together with the audio encoded signal and the side information to obtain the compressed audio code stream. The resampling includes the two steps of limiting the frequency band of the audio signal and performing a multiple downsampling on the band-limited audio signal.
 Fig. 14 is a schematic drawing of the structure of the third embodiment of the decoding device of the present invention. On the basis of the decoding device as shown in Fig. 6, said decoding device has a frequency band spreading module 808 added, which receives the frequency band spreading control information output from the bitstream demultiplexing module 801 and the time-domain audio signal of the low frequency channel output from the frequency-time mapping module 805, and which reconstructs the high frequency signal portion through spectrum shifting and high frequency adjustment to output the wide band audio signal.
 The decoding method based on the decoding device as shown in Fig. 14 is substantially the same as the decoding method based on the decoding device as shown in Fig. 6, the difference being that the former further includes the step of reconstructing the high frequency portion of the time-domain audio signal according to the frequency band spreading control information and the time-domain audio signal, thereby obtaining the wide band audio signal.
 Fig. 15 is a schematic drawing of the structure of the fourth embodiment of the encoding device of the present invention, which has a frequency band spreading module 58 and a resampling module 59 added on the basis of the encoding device as shown in Fig. 7. In this embodiment, the connection between said frequency band spreading module 58 and resampling module 59 and other modules, and the function and operation principle of these two modules are the same as those shown in Fig. 13, so they will not be elaborated herein.
 The encoding method based on the encoding device as shown in Fig. 15 is substantially the same as the encoding method based on the encoding device as shown in Fig. 7, and the difference is that the former further includes the following steps: resampling the audio signal before analyzing the type thereof; analyzing the input audio signal on the entire frequency band to extract the high frequency spectrum envelope and the parameters of the spectrum characteristics thereof; and multiplexing them together with the audio encoded signal and the side information to obtain the compressed audio code stream.
 Fig. 16 is a schematic drawing of the fourth embodiment of the decoding device of the present invention. On the basis of the decoding device as shown in Fig. 10, said decoding device has a frequency band spreading module 808 added. In this embodiment, the connection between said frequency band spreading module 808 and the other modules, and the function and operation principle thereof, are the same as those shown in Fig. 14, so they will not be elaborated herein.
 The decoding method based on the decoding device as shown in Fig. 16 is substantially the same as the decoding method based on the decoding device as shown in Fig. 10, the difference being that said decoding method further includes the step of reconstructing the high frequency portion of the audio signal according to the frequency band spreading control information and the time-domain audio signal, thereby obtaining a wide band audio signal.
 Fig. 17 is a schematic drawing of the structure of the fifth embodiment of the encoding device of the present invention. On the basis of the encoding device as shown in Fig. 7, said embodiment has a sum-difference stereo encoding module 57 added between the output of the multiresolution analyzing module 56 and the input of the quantization and entropy encoding module 54, or between the quantizer bank and the encoder in the quantization and entropy encoding module 54. In this embodiment, the function and operation principle of the sum-difference stereo encoding module 57 are the same as those shown in Fig. 11, so they will not be elaborated herein.
 The encoding method of the encoding device as shown in Fig. 17 is substantially the same as the encoding method of the encoding device as shown in Fig. 7, the difference being that the former further includes the step of determining whether the audio signals are multichannel signals after the multiresolution analysis of the residual sequence/frequency-domain coefficients. If they are multichannel signals, it is determined whether the signals of the left-right sound channels are of the same type; if the signal types are the same, it is determined whether the scale factor bands meet the encoding conditions. If they meet the conditions, a sum-difference stereo encoding is performed on the residual sequence/frequency-domain coefficients to obtain the residual sequence/frequency-domain coefficients of the sum-difference sound channels; if they do not meet the conditions, the sum-difference stereo encoding is not performed. If the signals are mono signals or multichannel signals of different types, the frequency-domain coefficients are not processed. The specific flow thereof has been described above, so it will not be elaborated again.
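The per-band decision and the encoding transform can be sketched as below. The decision uses the KL rotation angle criterion that the claims quote later (a band is sum-difference coded when the angle stays near π/4, within 3π/16 < |a| < 5π/16); the function names, the use of `arctan2` for the rotation angle, and the quantized-domain mapping M = L, S = L − R are assumptions of this sketch:

```python
import numpy as np

# Decide per scale factor band whether sum-difference encoding applies,
# from the KL rotation angle of the left/right correlation matrix:
# a = 0.5 * atan2(2*C_lr, C_ll - C_rr).
def band_uses_ms(l_band, r_band):
    c_ll = float(np.dot(l_band, l_band))
    c_rr = float(np.dot(r_band, r_band))
    c_lr = float(np.dot(l_band, r_band))
    a = 0.5 * np.arctan2(2.0 * c_lr, c_ll - c_rr)
    return 3 * np.pi / 16 < abs(a) < 5 * np.pi / 16

# Quantized-domain sum-difference mapping (lossless on integers).
def ms_encode(l, r):
    l = np.asarray(l)
    return l.copy(), l - np.asarray(r)
```

Strongly correlated channels give a rotation angle close to π/4 and pass the test; uncorrelated channels give an angle near 0 and are left as left-right.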
 Fig. 18 is a schematic drawing of the structure of the fifth embodiment of the decoding device of the present invention. On the basis of the decoding device as shown in Fig. 10, said decoding device has a sum-difference stereo decoding module 807 added between the output of the inverse quantizer bank 803 and the input of the multiresolution integration module 806, or between the output of the entropy decoding module 802 and the input of the inverse quantizer bank 803. In this embodiment, the function and operation principle of the sum-difference stereo decoding module 807 are the same as those shown in Fig. 12, so they will not be elaborated herein.
 The decoding method of the decoding device as shown in Fig. 18 is substantially the same as the decoding method of the decoding device as shown in Fig. 10, the difference being that the former further includes the following steps: after the inverse quantization spectrum is obtained, if the result of signal type analysis shows that the signal types are the same, it is determined according to the sum-difference stereo control signal whether a sum-difference stereo decoding needs to be performed on the inverse quantization spectrum. If it is necessary, it is determined, on the basis of the flag bit of each scale factor band, whether said scale factor band needs a sum-difference stereo decoding; if it does, the inverse quantization spectrum of the sum-difference sound channels in said scale factor band is transformed into the inverse quantization spectrum of the left-right sound channels before the subsequent processing. If the signal types are not the same, or if it is unnecessary to perform the sum-difference stereo decoding, the inverse quantization spectrum is not processed and the subsequent processing is performed directly. The specific flow thereof has been described above, so it will not be elaborated again.
 Fig. 19 is the schematic drawing of the sixth embodiment of the encoding device of the present invention. On the basis of Fig. 17, this embodiment has a frequency band spreading module 58 and a resampling module 59 added. In this embodiment, the connection between said frequency band spreading module 58 and resampling module 59 and other modules, and the functions and operation principles of these two modules are the same as those in Fig. 13, so they will not be elaborated herein.
 The encoding method based on the encoding device as shown in Fig. 19 is substantially the same as the encoding method based on the encoding device as shown in Fig. 17, and the difference is that the former further includes the following steps: resampling the audio signal before analyzing the type thereof; analyzing the input audio signal on the entire frequency band to extract the high frequency spectrum envelope and the parameters of the spectrum characteristics thereof; and multiplexing them together with the audio encoded signal and the side information to obtain the compressed audio code stream.
 Fig. 20 is a schematic drawing of the sixth embodiment of the decoding device of the present invention. On the basis of the decoding device as shown in Fig. 18, said decoding device has a frequency band spreading module 808 added. In this embodiment, the connection between said frequency band spreading module 808 and the other modules, and the function and operation principle thereof, are the same as those shown in Fig. 14, so they will not be elaborated herein.
 The decoding method based on the decoding device as shown in Fig. 20 is substantially the same as the decoding method based on the decoding device as shown in Fig. 18, the difference being that said decoding method further includes the step of reconstructing the high frequency portion of the audio signal according to the frequency band spreading control information and the time-domain audio signal, thereby obtaining wide band audio signals.
 Fig. 21 is a schematic drawing of the seventh embodiment of the encoding device of the present invention. On the basis of Fig. 11, said embodiment has a frequency band spreading module 58 and a resampling module 59 added. In this embodiment, the connection between said frequency band spreading module 58 and resampling module 59 and the other modules, and the functions and operation principles of said two modules, are the same as those in Fig. 13, so they will not be elaborated herein.
 The encoding method of the encoding device as shown in Fig. 21 is substantially the same as the encoding method of the encoding device as shown in Fig. 11, and the difference is that said encoding method further includes the steps of resampling the audio signal before analyzing the type thereof; analyzing the input audio signal on the entire frequency band to extract the high frequency spectrum envelope and the parameters of the spectrum characteristics thereof; and multiplexing them together with the audio encoded signal and the side information to obtain the compressed audio code stream.
 Fig. 22 is a schematic drawing of the seventh embodiment of the decoding device of the present invention. On the basis of the decoding device as shown in Fig. 12, said decoding device has a frequency band spreading module 808 added. In this embodiment, the connection between said frequency band spreading module 808 and the other modules, and the function and operation principle thereof, are the same as those shown in Fig. 14, so they will not be elaborated herein.
 The decoding method based on the decoding device as shown in Fig. 22 is substantially the same as the decoding method based on the decoding device as shown in Fig. 12, the difference being that said decoding method further includes the step of reconstructing the high frequency portion of the audio signal according to the frequency band spreading control information and the time-domain audio signal, thereby obtaining wide band audio signals.
 The seven embodiments of the encoding device as described above may also include a gain control module which receives the audio signals output from the signal type analyzing module, controls the dynamic range of the fast varying type signals, and eliminates the pre-echo in audio processing. The output thereof is connected to the time-frequency mapping module 52 and the psychoacoustical analyzing module 51; meanwhile, the amount of gain adjustment is output to the bitstream multiplexing module 55.
 According to the signal type of the audio signal, the gain control module processes only the fast varying type signals, while the slowly varying signals are output directly without being processed. For a fast varying type signal, the gain control module adjusts the time-domain energy envelope of the signal to increase the gain value of the signal before the fast varying point, so that the amplitudes of the time-domain signal before and after the fast varying point are close to each other; the time-domain signal whose energy envelope has been adjusted is then output to the time-frequency mapping module 52. Meanwhile, the amount of gain adjustment is output to the bitstream multiplexing module 55.
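A simplified sketch of that envelope adjustment follows. The transient index is assumed to be supplied by the signal type analysis, and matching the two segments with a single scalar gain is an illustration rather than the patent's exact scheme:

```python
import numpy as np

# Amplify the segment before the fast varying (transient) point so its
# peak amplitude matches the segment after it; return the adjusted
# signal and the amount of gain adjustment to be sent in the bitstream.
def gain_control(x, attack_idx):
    x = np.asarray(x, dtype=float)
    pre, post = x[:attack_idx], x[attack_idx:]
    peak_pre = np.max(np.abs(pre)) if pre.size else 0.0
    if peak_pre == 0.0:
        return x.copy(), 1.0       # nothing to boost
    g = np.max(np.abs(post)) / peak_pre
    y = x.copy()
    y[:attack_idx] *= g
    return y, g
```

Because the low-amplitude segment is boosted before the transform, the quantization noise added there is later scaled back down by the same factor at the decoder, which is what suppresses the pre-echo.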
 The encoding method based on said encoding device is substantially the same as the encoding method based on the above described encoding device, and the difference lies in that the former further includes the step of performing a gain control on the signal whose signal type has been analyzed.
 The seven embodiments of the decoding device as described above may also include an inverse gain control module which is located after the output of the frequency-time mapping module 805 and which receives the result of signal type analysis and the information on the amount of gain adjustment output from the bitstream demultiplexing module 801, thereby adjusting the gain of the time-domain signal and controlling the pre-echo. After receiving the reconstructed time-domain signal output from the frequency-time mapping module 805, the inverse gain control module processes the fast varying signals but leaves the slowly varying signals unprocessed. For a signal of a fast varying type, the inverse gain control module adjusts the energy envelope of the reconstructed time-domain signal according to the information on the amount of gain adjustment, reduces the amplitude of the signal before the fast varying point, and restores the energy envelope to its original state of being low in the front and high in the back. Thus the amplitude of the quantization noise before the fast varying point is reduced along with the amplitude of the signal, thereby controlling the pre-echo.
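The decoder-side adjustment simply divides the transmitted gain back out of the pre-transient segment. As before, the transient index and the single scalar gain are illustrative assumptions:

```python
import numpy as np

# Undo the encoder-side amplification before the transient point using
# the amount of gain adjustment carried in the bitstream; quantization
# noise in that segment shrinks by the same factor.
def inverse_gain_control(y, attack_idx, g):
    x = np.asarray(y, dtype=float).copy()
    if g != 0.0:
        x[:attack_idx] /= g   # restore the "low in front" envelope
    return x
```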
 The decoding method based on said decoding device is substantially the same as the decoding method based on the above described decoding device, and the difference lies in that the former further includes the step of performing an inverse gain control on the reconstructed timedomain signals.
 Finally, it has to be noted that the above-mentioned embodiments illustrate rather than limit the technical solutions of the invention. While the invention has been described in conjunction with preferred embodiments, those skilled in the art shall understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without deviating from the spirit and scope thereof. Accordingly, it is intended to embrace all such modifications and equivalent substitutions as fall within the scope of the appended claims.
Claims (15)
 An enhanced audio encoding device, comprising a psychoacoustical analyzing module, a time-frequency mapping module, a quantization and entropy encoding module, and a bitstream multiplexing module, characterized in that said device further comprises a signal type analyzing module and a frequency-domain linear prediction and vector quantization module; wherein
the signal type analyzing module is configured to analyze the signal type of the input audio signal and output the audio signal to the psychoacoustical analyzing module and the timefrequency mapping module, and to output the result of signal type analysis to the bitstream multiplexing module at the same time;
the psychoacoustical analyzing module is configured to calculate a masking threshold and a signal-to-masking ratio of the audio signal whose signal type has been analyzed, and to output them to said quantization and entropy encoding module; the time-frequency mapping module is configured to convert the time-domain audio signal into frequency-domain coefficients;
the frequency-domain linear prediction and vector quantization module is configured to perform a linear prediction on the frequency-domain coefficients, convert the produced prediction coefficients into line spectrum pair frequency coefficients, perform a multistage vector quantization on the line spectrum pair frequency coefficients, and then output the prediction residual sequence of the frequency-domain coefficients to the quantization and entropy encoding module and the side information to the bitstream multiplexing module;
the quantization and entropy encoding module is configured to perform quantization and entropy encoding on the residual sequence/frequency-domain coefficients under the control of the signal-to-masking ratio output from the psychoacoustical analyzing module, and to output them to the bitstream multiplexing module; and
the bitstream multiplexing module is configured to multiplex the received data to form the audio encoding code stream.  The enhanced audio encoding device according to claim 1, characterized in that the frequency-domain linear prediction and vector quantization module consists of a linear prediction analyzer, a linear prediction filter, a transformer, and a vector quantizer; wherein
the linear prediction analyzer is used for performing a predictive analysis on the frequency-domain coefficients to obtain the prediction gain and the prediction coefficients, and for outputting the frequency-domain coefficients that meet a certain condition to the linear prediction filter, while the frequency-domain coefficients that do not meet the condition are output directly to said quantization and entropy encoding module;
the linear prediction filter is used for filtering the frequency-domain coefficients to obtain the residual sequence of the frequency-domain coefficients, and for outputting the residual sequence to the quantization and entropy encoding module and the prediction coefficients to the transformer; and
the transformer is used for transforming the prediction coefficients into line spectrum pair frequency coefficients; the vector quantizer is used for performing a multistage vector quantization on the line spectrum pair frequency coefficients, and the quantized signals are transmitted to the bitstream multiplexing module.  The enhanced audio encoding device according to claim 1, further comprising a sum-difference stereo encoding module located between the output of the frequency-domain linear prediction and vector quantization module or the multiresolution analyzing module and the input of the quantization and entropy encoding module, or between the quantizer bank and the encoder in the quantization and entropy encoding module, which is used for transforming the frequency-domain coefficients/residual sequence of the left-right sound channels into the frequency-domain coefficients/residual sequence of the sum-difference sound channels.
 The enhanced audio encoding device according to any one of claims 1-3, further comprising a resampling module and a frequency band spreading module; wherein
the resampling module is used for resampling the input audio signal to change the sampling rate thereof; said resampling module comprises a low-pass filter and a downsampler, wherein the low-pass filter is used for limiting the frequency band of the audio signal, and the downsampler is used for downsampling the signal to reduce the sampling rate thereof; and
the frequency band spreading module is used for analyzing the original input audio signal on the entire frequency band to extract the spectrum envelope of the high frequency portion and the parameters representing the correlation between the low and high frequency portions, and for outputting them to the bitstream multiplexing module; said frequency band spreading module comprises a parameter extracting module and a spectrum envelope extracting module; said parameter extracting module is used for extracting the parameters representing the spectrum characteristics of the input signal at different time-frequency regions, and said spectrum envelope extracting module is used for estimating the spectrum envelope of the high frequency portion of the signal at a certain time-frequency resolution and then outputting the parameters of the spectrum characteristics of the input signal and the spectrum envelope of the high frequency portion to the bitstream multiplexing module.  An enhanced audio encoding method, comprising the following steps: step 1: analyzing the type of the input audio signal and using the result of the signal type analysis as a part of the signal multiplexing; step 2: calculating the signal-to-masking ratio of the signal whose signal type has been analyzed; step 3: time-frequency mapping the type-analyzed signal to obtain the frequency-domain coefficients of the audio signal; step 4: performing a standard linear prediction analysis on the frequency-domain coefficients to obtain the prediction gain and the prediction coefficients; determining whether the prediction gain exceeds a predetermined threshold, and if it does, performing a frequency-domain linear prediction error filtering on the frequency-domain coefficients based on the prediction coefficients to obtain the residual sequence, transforming the prediction coefficients into line spectrum pair frequency coefficients, and performing a multistage vector quantization on said line spectrum pair frequency coefficients to
obtain the side information; if the prediction gain does not exceed the predetermined threshold, proceeding to step 5 without processing the frequency-domain coefficients; step 5: quantizing and entropy encoding the residual sequence/frequency-domain coefficients; step 6: multiplexing the side information and the encoded audio signal to obtain the compressed audio code stream.
 The enhanced audio encoding method according to claim 5, characterized in that the quantization in step 5 is a scalar quantization which comprises nonlinearly companding the frequency-domain coefficients in all the scale factor bands; using the scale factor of each subband to quantize the frequency-domain coefficients of said subband to obtain the quantization spectrum represented by integers; selecting the first scale factor in each frame of the signal as the common scale factor; and differentiating the rest of the scale factors from their respective previous scale factors;
the entropy encoding comprises entropy encoding the quantization spectrum and the differentiated scale factors to obtain the sequence numbers of the code book, the encoded values of the scale factors, and the losslessly encoded quantization spectrum; and entropy encoding the sequence numbers of the code book to obtain the encoded values thereof.  The enhanced audio encoding method according to claim 5 or 6, characterized in that said step 5 further comprises quantizing the residual sequence/frequency-domain coefficients; determining whether the audio signals are multichannel signals, and if they are, determining whether the signals of the left-right sound channels are of the same type; if the signal types are the same, determining whether the scale factor bands corresponding to the two sound channels meet the conditions of sum-difference stereo encoding, and if they do, performing a sum-difference stereo encoding on the residual sequence/frequency-domain coefficients in said scale factor bands of the two sound channels to obtain the residual sequence/frequency-domain coefficients of the sum-difference sound channels; if they do not meet the conditions, not performing the sum-difference stereo encoding on the residual sequence/frequency-domain coefficients in said scale factor bands; if the signals are mono signals or multichannel signals of different types, not processing the residual sequence/frequency-domain coefficients; and entropy encoding the residual sequence/frequency-domain coefficients; wherein the method for determining whether a scale factor band meets the encoding condition is the KL transformation; specifically, the correlation matrix of the spectrum coefficients of the scale factor bands of the left-right sound channels is calculated and a KL transformation is performed on the correlation matrix; if the absolute value of the rotation angle a deviates from π/4 by only a small amount, e.g.
3π/16 < a < 5π/16, a sum-difference stereo encoding can be performed on the corresponding scale factor bands; said sum-difference stereo encoding is
$\left[\begin{array}{c}\hat{M}\\ \hat{S}\end{array}\right]=\left[\begin{array}{cc}1& 0\\ 1& -1\end{array}\right]\left[\begin{array}{c}\hat{L}\\ \hat{R}\end{array}\right],$ wherein M̂ denotes the quantized frequency-domain coefficients of the sum sound channel, Ŝ denotes the quantized frequency-domain coefficients of the difference sound channel, L̂ denotes the quantized frequency-domain coefficients of the left sound channel, and R̂ denotes the quantized frequency-domain coefficients of the right sound channel.  The enhanced audio encoding method according to any one of claims 5-7, characterized in that before said step 1 there is a step of resampling the input audio signal, which specifically includes limiting the frequency band of the audio signal and performing a multiple downsampling on the band-limited audio signal; and after said step 6 there is a step of analyzing, on the entire frequency band, the original input audio signal before the resampling to extract the high frequency spectrum envelope and the parameters of the signal spectrum characteristics thereof, and multiplexing them together with the audio encoded signal and the side information to obtain the compressed audio code stream.
 An enhanced audio decoding device, comprising a bitstream demultiplexing module, an entropy decoding module, an inverse quantizerbank, a frequencytime mapping module,characterized in that said device further comprises an inverse frequencydomain linear prediction and vector quantization module;
the bitstream demultiplexing module is configured to demultiplex the compressed audio data stream and output the corresponding data signal and control signal to the entropy decoding module and the inverse frequencydomain linear prediction and vector quantization module;
the entropy decodingmodule is configured to decode said signals, recover the quantized values of the spectrum so as to output to the inverse quantizer bank;
the inverse quantizer bank is configured to reconstruct the inverse quantization spectrum and output it to the inverse frequencydomain linear prediction and vector quantization module;
the inverse frequencydomain linear prediction and vector quantization module is configured to perform inverse linear prediction filtering on the inverse quantization spectrum to obtain the spectrumbeforeprediction and output it to the frequencytime mapping module; and
the frequencytime mapping module is configured to perform a frequencytime mapping on the spectrum coefficients to output the timedomain audio signal.  The enhanced audio decoding device according to claim 9, characterized in that the inverse frequencydomain linear prediction and vector quantization module comprises an inverse vector quantizer, an inverse transformer, and an inverse linear prediction filter ; wherein the inverse vector quantizer is used for inversely quantizing the code word indexes to obtain the line spectrum pair frequency coefficients, the inverse transformer is used for inversely transforming the line spectrum pair frequency coefficients into prediction coefficients, and the inverse linear prediction filter is used for inversely filtering the inverse quantization spectrum based on the prediction coefficients to obtain the spectrumbeforeprediction.
The enhanced audio decoding device according to claim 9 or 10, further comprising a sum-difference stereo decoding module located between the output of the inverse quantizer bank and the input of the multi-resolution integration module or the inverse frequency-domain linear prediction and vector quantization module, or between the output of the entropy decoding module and the input of the inverse quantizer bank, to receive the result of signal type analysis and the sum-difference stereo control signal output from the bitstream demultiplexing module, and to transform the inverse quantization spectrum of the sum-difference sound channels into the inverse quantization spectrum of the left-right sound channels based on said control information.
An enhanced audio decoding method, comprising the following steps:
step 1: demultiplexing the compressed audio data stream to obtain the data information and the control information;
step 2: entropy decoding said information to obtain the quantized values of the spectrum;
step 3: inversely quantizing the quantized values of the spectrum to obtain the inverse quantization spectrum;
step 4: determining whether the control information indicates that the inverse quantization spectrum has to undergo the inverse frequency-domain linear prediction vector quantization; if it does, performing the inverse vector quantization to obtain the prediction coefficients, and performing a linear prediction synthesis on the inverse quantization spectrum according to the prediction coefficients to obtain the spectrum-before-prediction; if it does not, proceeding to step 5 without processing the inverse quantization spectrum;
step 5: performing a frequency-time mapping on the spectrum-before-prediction/inverse quantization spectrum to obtain the time-domain audio signal of the low frequency band.
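The five steps of claim 12 can be lined up as a toy pipeline. This is a heavily stubbed sketch, not the claimed method: the frame layout, the `step_size` uniform dequantizer, and the trivial "entropy decoding" are all stand-ins invented for illustration, and the step-5 frequency-time mapping is omitted.

```python
# Toy walk-through of decoding steps 1-4; every stage is a stub.

def decode_frame(frame):
    data, control = frame["data"], frame["control"]           # step 1: demultiplex
    quantized = list(data)                                    # step 2: entropy decode (stub)
    spectrum = [q * control["step_size"] for q in quantized]  # step 3: inverse quantize (stub)
    if control.get("fdlp"):                                   # step 4: optional inverse FDLP
        a = control["coeffs"]
        out = []
        for n, e in enumerate(spectrum):
            pred = sum(c * out[n - 1 - k]
                       for k, c in enumerate(a) if n - 1 - k >= 0)
            out.append(e + pred)                              # linear prediction synthesis
        spectrum = out
    return spectrum  # step 5 (frequency-time mapping) omitted in this sketch
```

The point of the sketch is the branch in step 4: the control information decides whether the inverse quantization spectrum is passed through the synthesis filter or handed to step 5 untouched.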
The enhanced audio decoding method according to claim 12, characterized in that said inverse vector quantization step further comprises obtaining from the control information the code word indexes resulting from the vector quantization of the prediction coefficients; and obtaining the quantized line spectrum pair frequency coefficients according to the code word indexes and calculating the prediction coefficients therefrom.
The enhanced audio decoding method according to claim 12, characterized in that said step 5 further comprises performing an inverse modified discrete cosine transformation on the inverse quantization spectrum to obtain the transformed time-domain signals; performing a window adding processing on the transformed time-domain signals in the time domain; and superposing said window-added time-domain signals to obtain the time-domain audio signals; wherein the window function in said window adding processing is:

w(N+k) = cos((π/2)·((k+0.5)/N − 0.94·sin(2π(k+0.5)/N)/(2π))), wherein k = 0, ..., N−1;

w(k) represents the k-th coefficient of the window function, w(k) = w(2N−1−k), and N represents the number of samples of the encoded frame.
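The window of claim 14 can be built directly from the two relations above. A small sketch, assuming the operator lost in the extracted formula between (k+0.5)/N and the 0.94 term is a minus sign (the function name is illustrative):

```python
import math

def decoder_window(N):
    """Build the 2N-point window of claim 14: the second half is
    w(N+k) = cos((pi/2)*((k+0.5)/N - 0.94*sin(2*pi*(k+0.5)/N)/(2*pi)))
    for k = 0..N-1, and the first half follows from w(k) = w(2N-1-k)."""
    w = [0.0] * (2 * N)
    for k in range(N):
        arg = (k + 0.5) / N - 0.94 * math.sin(2 * math.pi * (k + 0.5) / N) / (2 * math.pi)
        w[N + k] = math.cos(math.pi / 2 * arg)
    for k in range(N):
        w[k] = w[2 * N - 1 - k]  # symmetry relation from the claim
    return w
```

Under this reading the window rises from near 0 to near 1 over the first N samples and falls back symmetrically, as an overlap-add analysis/synthesis window should.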
The enhanced audio decoding method according to any one of claims 12-14, characterized in that between said steps 2 and 3, there are also the steps that, if the result of signal type analysis shows that the signal types are the same, it is determined whether it is necessary to perform a sum-difference stereo decoding on the quantized values of the spectrum according to the sum-difference stereo control signal; if it is necessary, it is determined, on the basis of the flag bit of each scale factor band, whether said scale factor band needs a sum-difference stereo decoding; if it does, the quantized values of the spectrum of the sum-difference sound channels in said scale factor band are transformed into the quantized values of the spectrum of the left-right sound channels, and the method proceeds to step 3; if the signal types are not the same or it is unnecessary to perform the sum-difference stereo decoding, the quantized values of the spectrum are not processed and the method proceeds to step 3;
wherein the sum-difference stereo decoding is

$$\begin{bmatrix}\hat{l}\\ \hat{r}\end{bmatrix}=\begin{bmatrix}1&0\\ 1&1\end{bmatrix}\begin{bmatrix}\hat{m}\\ \hat{s}\end{bmatrix},$$

wherein m̂ denotes the quantized value of the spectrum of the quantized sum sound channel, ŝ denotes the quantized value of the spectrum of the quantized difference sound channel, l̂ denotes the quantized value of the spectrum of the quantized left sound channel, and r̂ denotes the quantized value of the spectrum of the quantized right sound channel.
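Applied per spectral line, the reconstruction is a fixed 2×2 matrix multiply. A minimal sketch applying the matrix exactly as it appears in the claim, i.e. l̂ = m̂ and r̂ = m̂ + ŝ (the function name is illustrative):

```python
def ms_decode(m_hat, s_hat):
    """Apply the claim's 2x2 reconstruction matrix [[1, 0], [1, 1]]
    to each pair of sum/difference spectral values: l = m, r = m + s."""
    l_hat = list(m_hat)
    r_hat = [m + s for m, s in zip(m_hat, s_hat)]
    return l_hat, r_hat
```

Because the matrix is constant and invertible, the encoder-side transform is undone exactly on the quantized values, before inverse quantization proceeds in step 3.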
Priority Applications (2)
Application Number  Priority Date  Filing Date  Title 

CN200410030945  20040401  
PCT/CN2005/000441 WO2005096274A1 (en)  20040401  20050401  An enhanced audio encoding/decoding device and method 
Publications (1)
Publication Number  Publication Date 

EP1852851A1 (en)  20071107 
Family
ID=35064018
Family Applications (1)
Application Number  Title  Priority Date  Filing Date 

EP20050738242 Withdrawn EP1852851A1 (en)  20040401  20050401  An enhanced audio encoding/decoding device and method 
Country Status (2)
Country  Link 

EP (1)  EP1852851A1 (en) 
WO (1)  WO2005096274A1 (en) 
Cited By (5)
Publication number  Priority date  Publication date  Assignee  Title 

WO2012110476A1 (en) *  20110214  20120823  FraunhoferGesellschaft zur Förderung der angewandten Forschung e.V.  Linear prediction based coding scheme using spectral domain noise shaping 
US8731917B2 (en)  20070302  20140520  Telefonaktiebolaget Lm Ericsson (Publ)  Methods and arrangements in a telecommunications network 
US9311926B2 (en)  20101018  20160412  Samsung Electronics Co., Ltd.  Apparatus and method for determining weighting function having for associating linear predictive coding (LPC) coefficients with line spectral frequency coefficients and immittance spectral frequency coefficients 
US9401152B2 (en)  20120518  20160726  Dolby Laboratories Licensing Corporation  System for maintaining reversible dynamic range control information associated with parametric audio coders 
RU2616863C2 (en) *  20100311  20170418  Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.  Signal processor, window provider, encoded media signal, method for processing signal and method for providing window 
Families Citing this family (5)
Publication number  Priority date  Publication date  Assignee  Title 

US8027242B2 (en) *  20051021  20110927  Qualcomm Incorporated  Signal coding and decoding based on spectral dynamics 
US8392176B2 (en) *  20060410  20130305  Qualcomm Incorporated  Processing of excitation in audio coding and decoding 
EP2538405B1 (en)  20061110  20150708  Panasonic Intellectual Property Corporation of America  CELPcoded speech parameter decoding method and apparatus 
US8428957B2 (en)  20070824  20130423  Qualcomm Incorporated  Spectral noise shaping in audio coding based on spectral dynamics in frequency subbands 
ES2624419T3 (en)  20130121  20170714  Dolby Laboratories Licensing Corporation  System and method for optimizing the loudness and dynamic range by different playback devices 
Family Cites Families (3)
Publication number  Priority date  Publication date  Assignee  Title 

KR960012475B1 (en) *  19940118  19960920  배순훈  Digital audio coder of channel bit 
EP0720316B1 (en) *  19941230  19991208  Daewoo Electronics Co., Ltd  Adaptive digital audio encoding apparatus and a bit allocation method thereof 
CN1154084C (en) *  20020605  20040616  北京阜国数字技术有限公司  Audio coding/decoding technology based on pseudo wavelet filtering 
NonPatent Citations (1)
Title 

See references of WO2005096274A1 * 
Cited By (23)
Publication number  Priority date  Publication date  Assignee  Title 

US9076453B2 (en)  20070302  20150707  Telefonaktiebolaget Lm Ericsson (Publ)  Methods and arrangements in a telecommunications network 
US8731917B2 (en)  20070302  20140520  Telefonaktiebolaget Lm Ericsson (Publ)  Methods and arrangements in a telecommunications network 
RU2616863C2 (en) *  20100311  20170418  Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.  Signal processor, window provider, encoded media signal, method for processing signal and method for providing window 
US9773507B2 (en)  20101018  20170926  Samsung Electronics Co., Ltd.  Apparatus and method for determining weighting function having for associating linear predictive coding (LPC) coefficients with line spectral frequency coefficients and immittance spectral frequency coefficients 
US9311926B2 (en)  20101018  20160412  Samsung Electronics Co., Ltd.  Apparatus and method for determining weighting function having for associating linear predictive coding (LPC) coefficients with line spectral frequency coefficients and immittance spectral frequency coefficients 
WO2012110476A1 (en) *  20110214  20120823  FraunhoferGesellschaft zur Förderung der angewandten Forschung e.V.  Linear prediction based coding scheme using spectral domain noise shaping 
US9047859B2 (en)  20110214  20150602  FraunhoferGesellschaft Zur Foerderung Der Angewandten Forschung E.V.  Apparatus and method for encoding and decoding an audio signal using an aligned lookahead portion 
US9037457B2 (en)  20110214  20150519  FraunhoferGesellschaft Zur Foerderung Der Angewandten Forschung E.V.  Audio codec supporting timedomain and frequencydomain coding modes 
US9153236B2 (en)  20110214  20151006  FraunhoferGesellschaft Zur Foerderung Der Angewandten Forschung E.V.  Audio codec using noise synthesis during inactive phases 
CN103477387B (en) *  20110214  20151125  Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.  Linear prediction based coding scheme using spectral domain noise shaping 
RU2575993C2 (en) *  20110214  20160227  ФраунхоферГезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф.  Linear predictionbased coding scheme using spectral domain noise shaping 
US8825496B2 (en)  20110214  20140902  FraunhoferGesellschaft Zur Foerderung Der Angewandten Forschung E.V.  Noise generation in audio codecs 
US9384739B2 (en)  20110214  20160705  FraunhoferGesellschaft Zur Foerderung Der Angewandten Forschung E.V.  Apparatus and method for error concealment in lowdelay unified speech and audio coding 
JP2014510306A (en) *  20110214  20140424  Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.  Linear prediction based coding scheme using spectral domain noise shaping 
US9536530B2 (en)  20110214  20170103  FraunhoferGesellschaft Zur Foerderung Der Angewandten Forschung E.V.  Information signal representation using lapped transform 
US9583110B2 (en)  20110214  20170228  FraunhoferGesellschaft Zur Foerderung Der Angewandten Forschung E.V.  Apparatus and method for processing a decoded audio signal in a spectral domain 
US9595262B2 (en)  20110214  20170314  FraunhoferGesellschaft Zur Foerderung Der Angewandten Forschung E.V.  Linear prediction based coding scheme using spectral domain noise shaping 
US9595263B2 (en)  20110214  20170314  FraunhoferGesellschaft Zur Foerderung Der Angewandten Forschung E.V.  Encoding and decoding of pulse positions of tracks of an audio signal 
US9620129B2 (en)  20110214  20170411  FraunhoferGesellschaft Zur Foerderung Der Angewandten Forschung E.V.  Apparatus and method for coding a portion of an audio signal using a transient detection and a quality result 
CN103477387A (en) *  20110214  20131225  Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.  Linear prediction based coding scheme using spectral domain noise shaping 
US9721578B2 (en)  20120518  20170801  Dolby Laboratories Licensing Corporation  System for maintaining reversible dynamic range control information associated with parametric audio coders 
US9401152B2 (en)  20120518  20160726  Dolby Laboratories Licensing Corporation  System for maintaining reversible dynamic range control information associated with parametric audio coders 
US9881629B2 (en)  20120518  20180130  Dolby Laboratories Licensing Corporation  System for maintaining reversible dynamic range control information associated with parametric audio coders 
Also Published As
Publication number  Publication date  Type 

WO2005096274A1 (en)  20051013  application 
Similar Documents
Publication  Publication Date  Title 

US7299190B2 (en)  Quantization and inverse quantization for audio  
US6351730B2 (en)  Lowcomplexity, lowdelay, scalable and embedded speech and audio coding with adaptive frame loss concealment  
US5809459A (en)  Method and apparatus for speech excitation waveform coding using multiple error waveforms  
US6721700B1 (en)  Audio coding method and apparatus  
US7469206B2 (en)  Methods for improving high frequency reconstruction  
US20050216262A1 (en)  Lossless multichannel audio codec  
US6011824A (en)  Signalreproduction method and apparatus  
US20030088400A1 (en)  Encoding device, decoding device and audio data distribution system  
US6826526B1 (en)  Audio signal coding method, decoding method, audio signal coding apparatus, and decoding apparatus where first vector quantization is performed on a signal and second vector quantization is performed on an error component resulting from the first vector quantization  
US6104996A (en)  Audio coding with loworder adaptive prediction of transients  
US20060004566A1 (en)  Lowbitrate encoding/decoding method and system  
US20060074642A1 (en)  Apparatus and methods for multichannel digital audio coding  
US20110004479A1 (en)  Harmonic transposition  
US20050267763A1 (en)  Multichannel audio extension  
US6904404B1 (en)  Multistage inverse quantization having the plurality of frequency bands  
US7275036B2 (en)  Apparatus and method for coding a timediscrete audio signal to obtain coded audio data and for decoding coded audio data  
US20030233236A1 (en)  Audio coding system using characteristics of a decoded signal to adapt synthesized spectral components  
US20080133223A1 (en)  Method and apparatus to extract important frequency component of audio signal and method and apparatus to encode and/or decode audio signal using the same  
US20110035212A1 (en)  Transform coding of speech and audio signals  
US20100023336A1 (en)  Compression of audio scalefactors by twodimensional transformation  
US20090271204A1 (en)  Audio Compression  
US20070063877A1 (en)  Scalable compressed audio bit stream and codec using a hierarchical filterbank and multichannel joint coding  
US20050159941A1 (en)  Method and apparatus for audio compression  
US20070016404A1 (en)  Method and apparatus to extract important spectral component from audio signal and low bitrate audio signal coding and/or decoding method and apparatus using the same  
JP2004004530A (en)  Encoding apparatus, decoding apparatus and its method 
Legal Events
Date  Code  Title  Description 

17P  Request for examination filed 
Effective date: 20070817 

AK  Designated contracting states: 
Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU MC NL PL PT RO SE SI SK TR 

RIN1  Inventor (correction) 
Inventor name: DENG, HAO Inventor name: EHRET, ANDREAS Inventor name: HENN, FREDRIK Inventor name: HOERICH, HOLGER Inventor name: MARTIN, DIETZ Inventor name: PAN, XINGDE Inventor name: REN, WEIMIN Inventor name: SCHUG, MICHAEL Inventor name: WANG, LEI Inventor name: ZHU, XIAOMING 


18W  Withdrawn 
Effective date: 20090323 