CN1677491A - Enhanced audio coding/decoding device and method - Google Patents

Enhanced audio coding/decoding device and method

Info

Publication number
CN1677491A
CN1677491A (application numbers CNA2004100461540A / CN200410046154A)
Authority
CN
China
Prior art keywords
frequency
module
coefficient
signal
spectrum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2004100461540A
Other languages
Chinese (zh)
Inventor
潘兴德
安德斯·叶瑞特
朱晓明
麦可·舒克
任为民
王磊
豪格·何瑞施
邓昊
佛里德理克·海恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING FUGUO DIGITAL TECHN Co Ltd
Coding Technology Ltd
GONGYU DIGITAL TECHNOLOGY Co Ltd BEIJING
Original Assignee
BEIJING FUGUO DIGITAL TECHN Co Ltd
Coding Technology Ltd
GONGYU DIGITAL TECHNOLOGY Co Ltd BEIJING
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by BEIJING FUGUO DIGITAL TECHN Co Ltd, Coding Technology Ltd, GONGYU DIGITAL TECHNOLOGY Co Ltd BEIJING filed Critical BEIJING FUGUO DIGITAL TECHN Co Ltd
Priority to CNA2004100461540A
Publication of CN1677491A
Legal status: Pending

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

An enhanced audio encoding device consists of a psychoacoustic analysis module, a time-frequency mapping module, a frequency-domain linear prediction and vector quantization module, a quantization and entropy coding module, and a bitstream multiplexing module. The psychoacoustic analysis module calculates a masking threshold for the input signal and determines the signal type. After the time-frequency mapping module converts the input time-domain audio signal into frequency-domain coefficients, the frequency-domain linear prediction and vector quantization module performs linear prediction and multi-stage vector quantization, outputting a residual sequence to the quantization and entropy coding module for processing while passing the side information to the bitstream multiplexing module. The bitstream multiplexing module multiplexes the side information and the coded signals to form the coded audio bitstream. The invention is applicable to high-fidelity compression coding of audio signals in a wide range of sampling-rate and channel configurations: it supports sampling rates from 8 kHz to 192 kHz, all practical channel configurations, and a wide range of target bit rates.

Description

Enhanced audio coding and decoding device and method
Technical field
The present invention relates to the field of audio coding and decoding, and in particular to an enhanced audio encoding/decoding device and method based on a perceptual model.
Background technology
To store and transmit high-fidelity digital audio, the signal must be compressed by audio coding. The goal of coding an audio signal is to achieve a transparent representation with as few bits as possible, i.e. the decoded output should be almost indistinguishable from the original input.
In the early 1980s the appearance of the compact disc demonstrated the advantages of representing audio digitally: high fidelity, wide dynamic range and strong robustness. These advantages, however, come at the cost of a very high data rate. For example, CD-quality stereo requires a sampling rate of 44.1 kHz with 16-bit uniform quantization per sample, giving an uncompressed data rate of about 1.41 Mb/s. Such a high data rate makes transmission and storage inconvenient, especially in multimedia and wireless applications constrained by bandwidth and cost. New network and wireless multimedia digital audio systems must therefore reduce the data rate without damaging audio quality. To address this problem, many compression techniques have been proposed that achieve very high compression ratios while retaining high fidelity, typical examples being the MPEG-1/-2/-4 technologies of ISO/IEC, the AC-2/AC-3 technologies of Dolby, the ATRAC/MiniDisc/SDDS technologies of Sony, and the PAC/EPAC/MPAC technologies of Lucent Technologies. MPEG-2 AAC and Dolby AC-3 are described in more detail below.
MPEG-1 and MPEG-2 BC are high-quality coding techniques aimed mainly at mono and stereo audio signals. With the growing demand for multichannel audio coding of higher quality at lower bit rates, MPEG-2 BC, which emphasizes backward compatibility with MPEG-1, cannot deliver high-quality five-channel coding below 540 kbps. To address this shortcoming, MPEG-2 AAC was proposed; it can code a five-channel signal with good quality at 320 kbps.
Fig. 1 shows the block diagram of an MPEG-2 AAC encoder, comprising a gain controller 101, a filter bank 102, a temporal noise shaping (TNS) module 103, an intensity/coupling module 104, a psychoacoustic model, a second-order backward adaptive predictor 105, a mid/side (M/S) stereo module 106, a bit-allocation and quantization coding module 107, and a bitstream multiplexing module 108. The bit-allocation and quantization coding module 107 further comprises a rate/distortion controller, a scale-factor module, a non-uniform quantizer and an entropy coding module.
The filter bank 102 uses the modified discrete cosine transform (MDCT) with signal-adaptive resolution: a 2048-point MDCT for stationary signals and a 256-point MDCT for transient signals. For a 48 kHz-sampled signal this gives a maximum frequency resolution of 23 Hz and a maximum time resolution of 2.6 ms. The filter bank 102 can use both sine and Kaiser-Bessel windows: the sine window is used when the harmonic spacing of the input signal is below 140 Hz, the Kaiser-Bessel window when strong components in the input are spaced more than 220 Hz apart.
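As an illustration of the transform described above, the following is a minimal pure-Python sketch of an MDCT/IMDCT pair with a sine window. It is a direct O(N²) evaluation of the definition for clarity; a real codec would use an FFT-based fast algorithm and frame lengths of 2048 or 256 rather than the toy sizes used here.

```python
import math

def sine_window(two_n):
    """Sine window of length 2N; satisfies the Princen-Bradley condition
    w[t]^2 + w[t+N]^2 = 1 required for perfect reconstruction."""
    return [math.sin(math.pi / two_n * (t + 0.5)) for t in range(two_n)]

def mdct(frame):
    """Forward MDCT: a 2N-sample windowed frame -> N spectral coefficients."""
    two_n = len(frame)
    n = two_n // 2
    n0 = (n + 1) / 2.0
    return [sum(frame[t] * math.cos(math.pi / n * (t + n0) * (k + 0.5))
                for t in range(two_n))
            for k in range(n)]

def imdct(spec):
    """Inverse MDCT: N coefficients -> 2N time samples, which must be
    windowed and overlap-added with the neighbouring frame (TDAC)."""
    n = len(spec)
    n0 = (n + 1) / 2.0
    return [2.0 / n * sum(spec[k] * math.cos(math.pi / n * (t + n0) * (k + 0.5))
                          for k in range(n))
            for t in range(2 * n)]
```

A single IMDCT frame contains time-domain aliasing; only the overlap-add of the windowed outputs of two successive 50%-overlapping frames cancels it and recovers the input exactly.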
After passing through the gain controller 101, the audio signal enters the filter bank 102 and is filtered according to the signal type; the TNS module 103 then processes the spectral coefficients output by the filter bank 102. Temporal noise shaping performs linear prediction over the spectral coefficients in the frequency domain and uses this analysis to control the temporal shape of the quantization noise, thereby controlling pre-echo.
The intensity/coupling module 104 performs stereo coding of signal intensity. For high-frequency signals (above about 2 kHz), the perceived direction depends on the variation of the relative signal intensity (the signal envelope) rather than on the signal waveform; a constant-envelope signal has no influence on perceived direction. This property, together with the correlation between channels, allows several channels to be merged into a common channel for coding, which is the basis of the intensity/coupling technique.
The second-order backward adaptive predictor 105 removes redundancy from stationary signals and improves coding efficiency. The mid/side (M/S) module 106 operates on channel pairs, i.e. two channels such as the left/right front channels or the left/right surround channels in a stereo or multichannel signal, exploiting the correlation within each pair to reduce the bit rate and improve coding efficiency. The bit-allocation and quantization coding module 107 is realized as a nested loop in which the non-uniform quantizer performs lossy coding and the entropy coding module performs lossless coding, removing both irrelevancy and redundancy. The nested loop consists of an inner loop and an outer loop: the inner loop adjusts the step size of the non-uniform quantizer until the available bits are used up, while the outer loop estimates the coding quality of the signal from the ratio of quantization noise to masking threshold. Finally the coded signals are combined into the output audio bitstream by the bitstream multiplexing module 108.
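The mid/side transform performed by a module like 106 can be sketched in a few lines. The helper names and the 1/2 normalization are illustrative conventions, not the exact AAC syntax:

```python
def ms_encode(left, right):
    """Mid/side transform for a channel pair: transmit M=(L+R)/2 and
    S=(L-R)/2 instead of L and R. When the channels are correlated,
    most of the energy concentrates in M and S codes cheaply."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Inverse transform used by the decoder: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

The transform is exactly invertible, so any bit-rate gain comes purely from the changed energy distribution between the two coded channels.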
In the sampling-rate-scalable configuration, the input signal is first split by a four-band polyphase quadrature filter bank (PQF) into four equal-width bands; each band is transformed by a 256-point MDCT, giving 1024 spectral coefficients in total. The gain controller 101 is applied in each band, and the decoder can ignore the high-frequency PQF bands to obtain a lower-sampling-rate signal.
Fig. 2 shows the block diagram of the corresponding MPEG-2 AAC decoder. It comprises a bitstream demultiplexing module 201, a lossless decoding module 202, an inverse quantizer 203, a scale-factor module 204, a mid/side (M/S) stereo module 205, a prediction module 206, an intensity/coupling module 207, a TNS module 208, a filter bank 209 and a gain control module 210. The coded audio bitstream is demultiplexed by the bitstream demultiplexing module 201 into the corresponding data and control streams. After lossless decoding in module 202, the integer representation of the scale factors and the quantized values of the signal spectrum are obtained. The inverse quantizer 203 is a bank of non-uniform quantizers realized by a companding function, which converts the integer quantized values into the reconstructed spectrum. Since the scale-factor module in the encoder differentially codes each scale factor against the previous one and Huffman-codes the differences, the scale-factor module 204 in the decoder Huffman-decodes the differences and restores the actual scale factors. The M/S module 205 converts mid/side channels back into left/right channels under the control of the side information. Because the encoder uses the second-order backward adaptive predictor 105 to remove redundancy from stationary signals, the decoder performs the corresponding prediction decoding in the prediction module 206. The intensity/coupling module 207 performs intensity/coupling decoding under the control of the side information and passes the result to the TNS module 208 for TNS decoding; finally the filter bank 209 performs synthesis filtering using the inverse modified discrete cosine transform (IMDCT).
In the sampling-rate-scalable configuration, the gain control module 210 can ignore the high-frequency PQF bands to obtain a lower-sampling-rate signal.
MPEG-2 AAC achieves very high coding quality at high bit rates, but its quality degrades at low or very low bit rates. In addition, the technique involves many coding/decoding modules and a high implementation complexity, which hampers real-time implementation.
Fig. 3 shows the structure of an encoder using Dolby AC-3, comprising a transient signal detection module 301, an MDCT filter bank 302, a spectral envelope/exponent coding module 303, a mantissa coding module 304, a forward-backward adaptive perceptual model 305, a parametric bit-allocation module 306 and a bitstream multiplexing module 307.
The transient detection module 301 classifies the audio signal as stationary or transient, while the signal-adaptive MDCT filter bank 302 maps the time-domain data to the frequency domain: a long 512-point window is applied to stationary signals and a pair of short windows to transient signals.
The spectral envelope/exponent coding module 303 codes the exponent part of the signal in one of three modes, D15, D25 and D45, chosen according to the bit-rate and frequency-resolution requirements. AC-3 codes the spectral envelope differentially along frequency: since at most ±2 increments are needed, each increment representing a 6 dB level change, the first (DC) term is coded as an absolute value and the following exponents are coded differentially. In D15 exponent coding each exponent needs about 2.33 bits, as three differentials are grouped and coded into one 7-bit word; the D15 mode thus provides fine frequency resolution at the expense of time resolution. Fine frequency resolution is only needed for fairly stationary signals, whose spectrum remains relatively constant over many blocks; hence, for stationary signals, D15 envelopes are transmitted only occasionally, normally once per six audio blocks (one frame). When the signal spectrum is unstable, the spectral estimate must be updated frequently and is coded with less frequency resolution, usually in the D25 or D45 mode. The D25 mode provides intermediate frequency and time resolution, coding a differential for every second frequency coefficient, so each exponent needs about 1.15 bits; it is appropriate when the spectrum is stable over two to three blocks and then changes abruptly. The D45 mode codes a differential for every fourth frequency coefficient, so each exponent needs about 0.58 bits; it provides very high time resolution with low frequency resolution and is therefore generally applied to transient signals.
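The packing arithmetic behind the D15 figure of roughly 2.33 bits per exponent can be shown in a short sketch: each differential lies in {-2..+2}, i.e. five symbols, and three symbols fit in one 7-bit word because 5³ = 125 ≤ 128. This is an illustrative sketch of the idea, not the bit-exact AC-3 syntax:

```python
def encode_exponents_d15(exps):
    """First exponent absolute, the rest as differences in {-2..+2}
    (each step ~6 dB); three mapped differences are packed per 7-bit word."""
    diffs = []
    for prev, cur in zip(exps, exps[1:]):
        d = cur - prev
        if not -2 <= d <= 2:
            raise ValueError("D15 requires |difference| <= 2")
        diffs.append(d + 2)                 # map -2..+2 -> 0..4
    while len(diffs) % 3:                   # pad with the 'no change' symbol
        diffs.append(2)
    words = [diffs[i] * 25 + diffs[i + 1] * 5 + diffs[i + 2]
             for i in range(0, len(diffs), 3)]
    return exps[0], words

def decode_exponents_d15(first, words, count):
    """Unpack each 7-bit word into three differences and integrate."""
    exps = [first]
    for w in words:
        for d in (w // 25, (w // 5) % 5, w % 5):
            exps.append(exps[-1] + d - 2)
    return exps[:count]
```

Each word stays below 128 (7 bits), so 3 exponents cost 7 bits, i.e. 7/3 ≈ 2.33 bits per exponent, matching the figure quoted above.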
The forward-backward adaptive perceptual model 305 estimates the masking threshold of each frame. The forward-adaptive part runs only in the encoder: under the bit-rate constraint it estimates a set of optimal perceptual-model parameters by an iterative loop and passes them to the backward-adaptive part, which estimates the masking threshold of each frame. The backward-adaptive part runs in both the encoder and the decoder.
The parametric bit-allocation module 306 analyzes the spectral envelope of the audio signal against the masking criterion to determine the number of bits allocated to each mantissa. The module 306 uses a global bit pool shared by all channels: during coding in the mantissa coding module 304, bits are drawn from the pool and allocated to the channels in a loop, and the mantissa quantization is adjusted to the number of bits available. To increase compression, the AC-3 encoder also applies high-frequency coupling: the high-frequency part of the coupled signal is divided into 18 subbands according to the critical bandwidths of the human ear, and certain channels are selected to be coupled from a chosen subband upward. Finally the bitstream multiplexing module 307 forms the AC-3 output bitstream.
Fig. 4 shows the flow of Dolby AC-3 decoding. The input bitstream produced by the AC-3 encoder is first frame-synchronized and checked for errors; if a data error is detected, error concealment or muting is applied. The bitstream is then unpacked into main information and side information, after which exponent decoding is performed. Exponent decoding requires two pieces of side information: the number of packed exponents, and the exponent strategy used (the D15, D25 or D45 mode). With the decoded exponents and the bit-allocation side information, bit allocation is then computed, indicating the number of bits used by each packed mantissa and yielding a set of bit-allocation pointers, one per coded mantissa. Each bit-allocation pointer indicates the quantizer used for the mantissa and the number of bits it occupies in the bitstream. Each coded mantissa value is then dequantized; mantissas occupying zero bits are restored as zero or, under control of the dither flag, replaced by a random dither value. Decoupling follows: decoupling reconstructs the high-frequency part (exponents and mantissas) of each coupled channel from the common coupling channel and the coupling coordinates. If a subband was matrixed in 2/0 mode at the encoder, the sum/difference channel values of that subband must be converted back into left/right channel values in the decoder. The bitstream also carries a dynamic range control value for each audio block; dynamic range compression modifies the magnitude of the coefficients (exponents and mantissas) according to this value. The frequency coefficients are then inverse-transformed into time-domain samples, which are windowed and overlap-added with adjacent blocks to reconstruct the PCM audio signal. If the number of decoded output channels is smaller than the number of channels in the coded bitstream, a downmix is applied before the final PCM output.
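The final downmix step mentioned above can be sketched as follows. The -3 dB centre and surround mix levels are assumed defaults for illustration; in AC-3 the actual mix levels are signalled in the bitstream, and the LFE channel is typically omitted from the downmix:

```python
import math

def downmix_5_1_to_stereo(l, r, c, ls, rs, clev=None, slev=None):
    """Fold the five main channels of a 5.1 frame down to stereo when the
    decoder outputs fewer channels than were coded. `clev`/`slev` are the
    centre and surround mix coefficients (assumed -3 dB defaults here)."""
    if clev is None:
        clev = 1.0 / math.sqrt(2.0)
    if slev is None:
        slev = 1.0 / math.sqrt(2.0)
    lo = [li + clev * ci + slev * si for li, ci, si in zip(l, c, ls)]
    ro = [ri + clev * ci + slev * si for ri, ci, si in zip(r, c, rs)]
    return lo, ro
```

With silent centre and surround channels the downmix reduces to a pass-through of the front left/right pair, which is a useful sanity check.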
Dolby AC-3 coding is aimed primarily at high-bit-rate multichannel surround signals; when the bit rate of a 5.1-channel signal drops below 384 kbps its coding quality degrades, and its coding efficiency for mono and stereo material is also comparatively low.
In summary, existing coding techniques cannot deliver good coding quality across the whole range from very low and low to high bit rates and for mono and stereo signals alike, and their implementations are rather complex.
Summary of the invention
The technical problem addressed by the present invention is to provide an enhanced audio encoding/decoding device and method that overcomes the low coding efficiency and poor quality of the prior art for low-bit-rate audio signals.
The enhanced audio encoding device of the present invention comprises a psychoacoustic analysis module, a time-frequency mapping module, a frequency-domain linear prediction and vector quantization module, a quantization and entropy coding module, and a bitstream multiplexing module. The psychoacoustic analysis module calculates the masking threshold and signal-to-mask ratio (SMR) of the input audio signal, determines the signal type, and passes the results to the quantization and entropy coding module. The time-frequency mapping module transforms the input time-domain audio signal into frequency-domain coefficients. The frequency-domain linear prediction and vector quantization module performs linear prediction and multi-stage vector quantization on the frequency coefficients, outputs the residual sequence to the quantization and entropy coding module, and outputs the side information to the bitstream multiplexing module. The quantization and entropy coding module quantizes and entropy-codes the frequency coefficients or residual sequence under control of the SMR output by the psychoacoustic analysis module, and passes the result to the bitstream multiplexing module. The bitstream multiplexing module multiplexes the received data to form the coded audio bitstream.
The enhanced audio decoding device of the present invention comprises a bitstream demultiplexing module, an entropy decoding module, an inverse quantizer bank, an inverse frequency-domain linear prediction and vector quantization module, and a frequency-time mapping module. The bitstream demultiplexing module demultiplexes the compressed audio data stream and outputs the corresponding data and control signals to the entropy decoding module and the inverse frequency-domain linear prediction and vector quantization module. The entropy decoding module decodes these signals, recovers the quantized spectrum and outputs it to the inverse quantizer bank. The inverse quantizer bank reconstructs the inverse-quantized spectrum and outputs it to the inverse frequency-domain linear prediction and vector quantization module, which performs inverse quantization and inverse linear prediction filtering to obtain the spectrum before prediction, and outputs it to the frequency-time mapping module. The frequency-time mapping module maps the spectral coefficients back to the time domain to obtain the low-band time-domain audio signal.
The present invention is applicable to high-fidelity compression coding of audio signals in a wide range of sampling-rate and channel configurations: it supports sampling rates from 8 kHz to 192 kHz, all practical channel configurations, and a very wide range of target bit rates.
Description of drawings
Fig. 1 is the block diagram of the MPEG-2 AAC encoder;
Fig. 2 is the block diagram of the MPEG-2 AAC decoder;
Fig. 3 is the structure of an encoder using Dolby AC-3;
Fig. 4 is the decoding flow of Dolby AC-3;
Fig. 5 is the structure of the audio encoding device of the present invention;
Fig. 6 is the structure of the audio decoding device of the present invention;
Fig. 7 is the structure of embodiment one of the encoding device of the present invention;
Fig. 8 is the filter structure of the wavelet transform with the Haar wavelet basis;
Fig. 9 is the time-frequency tiling obtained by the wavelet transform with the Haar wavelet basis;
Fig. 10 is the structure of embodiment one of the decoding device of the present invention;
Fig. 11 is the structure of embodiment two of the encoding device of the present invention;
Fig. 12 is the structure of embodiment two of the decoding device of the present invention;
Fig. 13 is the structure of embodiment three of the encoding device of the present invention;
Fig. 14 is the structure of embodiment three of the decoding device of the present invention.
Embodiment
Figs. 1 to 4 show prior-art encoder and decoder structures; they were introduced in the background section and are not repeated here.
Note that, for convenience and clarity, the following embodiments describe the encoding and decoding devices in corresponding pairs; this does not imply that encoding and decoding devices must correspond one to one.
As shown in Fig. 5, the audio encoding device provided by the invention comprises a psychoacoustic analysis module 501, a time-frequency mapping module 502, a frequency-domain linear prediction and vector quantization module 503, a quantization and entropy coding module 504, and a bitstream multiplexing module 505. The psychoacoustic analysis module 501 calculates the masking threshold and signal-to-mask ratio of the audio signal and determines the signal type. The time-frequency mapping module 502 transforms the input time-domain audio signal into frequency-domain coefficients. The frequency-domain linear prediction and vector quantization module 503 performs linear prediction and multi-stage vector quantization on the frequency coefficients, outputs the residual sequence to the quantization and entropy coding module 504 and the side information to the bitstream multiplexing module 505. The quantization and entropy coding module 504 quantizes and entropy-codes the residual coefficients under control of the SMR output by the psychoacoustic analysis module 501 and passes the result to the bitstream multiplexing module 505, which multiplexes the received data into the coded audio bitstream.
The digital audio signal is fed into both the psychoacoustic analysis module 501 and the time-frequency mapping module 502. On the one hand, the psychoacoustic analysis module 501 calculates the masking threshold and signal-to-mask ratio of the current frame, classifies the frame as a fast-varying or slowly-varying signal, and sends the SMR as a control signal to the quantization and entropy coding module 504; on the other hand, the time-frequency mapping module 502 transforms the time-domain audio signal into frequency-domain coefficients. These coefficients are sent to the frequency-domain linear prediction and vector quantization module 503: if the prediction gain of the frequency coefficients satisfies the given condition, the coefficients are linear-prediction filtered, the resulting prediction coefficients are converted into line spectral frequencies (LSF), the codeword indices of the codebooks at each stage are found by a search under an optimal distortion measure, and the codeword indices are sent to the bitstream multiplexing module 505 as side information, while the residual sequence obtained from the prediction analysis is output to the quantization and entropy coding module 504. The residual sequence (or the unprocessed frequency coefficients) is quantized and entropy-coded in module 504 under control of the SMR output by the psychoacoustic analysis module 501. The coded data and side information are fed into the bitstream multiplexing module 505 and multiplexed to form the enhanced audio bitstream.
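The multi-stage codebook search mentioned above can be sketched as follows, using squared error as the (assumed) distortion measure and tiny hypothetical codebooks; the patent does not fix the codebook contents or the exact search criterion:

```python
def nearest(codebook, vec):
    """Index of the codeword minimising squared-error distortion."""
    def dist(cw):
        return sum((a - b) ** 2 for a, b in zip(cw, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def msvq_encode(vec, codebooks):
    """Multi-stage VQ: each stage quantises the residual left by the
    previous stages and contributes one codeword index."""
    indices, residual = [], list(vec)
    for cb in codebooks:
        idx = nearest(cb, residual)
        indices.append(idx)
        residual = [x - c for x, c in zip(residual, cb[idx])]
    return indices

def msvq_decode(indices, codebooks):
    """The decoder sums the selected codewords of all stages."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out
```

Splitting one large codebook into several small stages is what keeps both the search cost and the codebook storage manageable, at the price of a slightly suboptimal overall quantiser.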
Each module of the above audio encoding device is explained in detail below.
In the present invention, the psychoacoustic analysis module 501 mainly calculates the masking threshold, the perceptual entropy and the signal-to-mask ratio of the input audio signal, and analyzes the signal type. The perceptual entropy computed by the psychoacoustic analysis module 501 is used to dynamically estimate the number of bits the current frame requires for transparent coding, so that the bit allocation across frames can be adjusted. The psychoacoustic analysis module 501 outputs the SMR of each subband to the quantization and entropy coding module 504 to control it.
The time-frequency mapping module 502 converts the audio signal from a time-domain signal into frequency-domain coefficients. It consists of a filter bank, which may specifically be a discrete Fourier transform (DFT) filter bank, a discrete cosine transform (DCT) filter bank, a modified discrete cosine transform (MDCT) filter bank, a cosine-modulated filter bank, a wavelet transform filter bank, etc.
The frequency coefficients output by the time-frequency mapping module 502 are sent to the frequency-domain linear prediction and vector quantization module 503 for linear prediction and vector quantization. Module 503 consists of a linear prediction analyzer, a linear prediction filter, a converter and a vector quantizer. The frequency coefficients are fed into the linear prediction analyzer to obtain the prediction gain and prediction coefficients; coefficients satisfying the given condition are filtered in the linear prediction filter to yield the residual sequence. The residual sequence is output directly to the quantization and entropy coding module 504, while the prediction coefficients are converted into line spectral frequencies (LSF) by the converter and multi-stage vector quantized in the vector quantizer; the quantized signal is sent to the bitstream multiplexing module 505.
Applying frequency-domain linear prediction to the audio signal effectively suppresses pre-echo and yields a larger coding gain. For a real signal x(t), its squared Hilbert envelope e(t) can be expressed as

e(t) = F⁻¹{ ∫ C(ξ) · C*(ξ − f) dξ },

where C(f) is the single-sided spectrum corresponding to the positive-frequency components of x(t); that is, the Hilbert envelope of the signal is related to the autocorrelation function of its spectrum. On the other hand, the power spectral density of a signal is related to the autocorrelation of its time-domain waveform by PSD(f) = F{ ∫ x(τ) · x*(τ − t) dτ }. The squared Hilbert envelope in the time domain and the power spectral density in the frequency domain are therefore dual to each other. Consequently, for a bandpass portion of the signal within a given frequency range, if its Hilbert envelope remains constant, the autocorrelation of adjacent spectral values also remains constant; this means that the sequence of spectral coefficients is stationary along frequency, so the spectral values can be processed by predictive coding and represented efficiently by one set of common prediction coefficients.
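The stationarity-along-frequency argument above is what licenses running ordinary linear prediction over the spectral coefficients. The patent does not prescribe the analyzer's algorithm; the following sketch uses the standard Levinson-Durbin recursion on the autocorrelation of the coefficient sequence, one common choice:

```python
def levinson_durbin(r, order):
    """Solve the normal equations for prediction coefficients from
    autocorrelation values r[0..order]. Returns (a[1..order], gain),
    where gain = r[0] / final prediction-error energy."""
    a = [0.0] * (order + 1)
    e = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / e                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        a = new_a
        e *= (1.0 - k * k)
    return a[1:], r[0] / e

def fdlp_residual(spec, coeffs):
    """Apply the prediction-error filter A(z) = 1 + sum a_k z^-k to the
    spectral-coefficient sequence; the residual is what gets coded."""
    order = len(coeffs)
    res = []
    for n, x in enumerate(spec):
        pred = sum(coeffs[k] * spec[n - 1 - k]
                   for k in range(order) if n - 1 - k >= 0)
        res.append(x + pred)
    return res
```

A high prediction gain signals that the coefficient sequence is indeed well modelled by the predictor, which is exactly the condition module 503 tests before switching the filtering on.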
The quantization and entropy coding module 504 further comprises a non-linear quantizer bank and a coder, where the quantizer may be a scalar quantizer or a vector quantizer. Vector quantizers fall into two broad classes: memoryless and with memory. In a memoryless vector quantizer each input vector is quantized independently of the preceding vectors; a vector quantizer with memory takes the previous vectors into account when quantizing a vector, i.e. it exploits the correlation between vectors. The main memoryless vector quantizers are the full-search, tree-search, multi-stage, gain/shape and mean-removed vector quantizers; the main vector quantizers with memory are the predictive vector quantizer and the finite-state vector quantizer.
If scalar quantizers are adopted, the bank of non-linear quantizers further comprises M subband quantizers. Each subband quantizer relies mainly on a scale factor: all frequency coefficients in the M scale-factor bands are first non-linearly compressed, the frequency coefficients of each subband are then quantized using that subband's scale factor, and the resulting integer-valued quantized spectrum is output to the coder. The first scale factor of each frame is output to the bitstream multiplexing module 505 as the common scale factor, while every other scale factor is differentially coded against its predecessor before being output to the coder.
The scale factors in the above steps are not fixed values; they are adjusted according to the bit allocation strategy. The present invention provides a bit allocation strategy that minimizes the overall perceptual distortion, as follows:

First, each subband quantizer is initialized: the quantized values of the spectral coefficients in all subbands are set to 0. At this point the quantization noise of each subband equals its energy, the noise-to-mask ratio NMR of each subband equals its signal-to-mask ratio SMR, the number of bits consumed by quantization is 0, and the number of remaining bits B_l equals the target number of bits B.

Next, the subband with the largest noise-to-mask ratio NMR is searched for. If the largest NMR is less than or equal to 1, the scale factors are left unchanged, the allocation result is output, and the bit allocation process ends. Otherwise, the scale factor of the corresponding subband quantizer is decreased by one unit, and the number of additional bits ΔB_i(Q_i) required by that subband is computed. If the number of remaining bits satisfies B_l ≥ ΔB_i(Q_i), the modification of the scale factor is confirmed, ΔB_i(Q_i) is subtracted from B_l, the NMR of that subband is recomputed, the subband with the largest NMR is searched for again, and the subsequent steps are repeated. If B_l < ΔB_i(Q_i), the modification is cancelled, the previous scale factor and remaining bit count are retained, the allocation result is output, and the bit allocation process ends.
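The loop above can be sketched as a greedy allocation. This is a minimal illustration, not the patent's implementation: it assumes a fixed bit cost per scale-factor step in place of the true per-subband cost ΔB_i(Q_i), and assumes each step halves (≈3 dB) the subband noise power.

```python
def allocate_bits(smr, target_bits, delta_bits=2.0):
    """Greedy perceptual bit allocation sketch.

    smr: signal-to-mask ratio of each subband (linear scale)
    target_bits: total bit budget B
    delta_bits: assumed fixed cost of one scale-factor step, standing
                in for the per-subband cost delta_B_i(Q_i) of the text
    Returns the number of scale-factor steps applied to each subband.
    """
    n = len(smr)
    steps = [0] * n
    # All coefficients start quantized to 0, so noise = energy and
    # NMR = SMR for every subband.
    nmr = list(smr)
    remaining = target_bits
    while True:
        i = max(range(n), key=lambda b: nmr[b])
        if nmr[i] <= 1.0:           # all noise below the masking threshold
            break
        if remaining < delta_bits:  # budget exhausted: cancel and stop
            break
        steps[i] += 1               # one scale-factor step for subband i
        remaining -= delta_bits
        nmr[i] /= 2.0               # assume each step halves noise power
    return steps
```

Bits go to whichever subband currently has the worst noise-to-mask ratio, which is exactly the "search the maximum-NMR subband" rule of the text.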
If vector quantizers are adopted, the frequency coefficients are grouped into a number of M-dimensional vectors and fed to the bank of non-linear quantizers. Each M-dimensional vector is spectrally flattened according to a flattening factor, i.e. the dynamic range of the spectrum is reduced; the vector quantizer then searches the codebook, according to a subjective perceptual distance measure criterion, for the codeword with the minimum distance to the vector being quantized, and passes the corresponding codeword index to the coder. The flattening factor is adjusted according to the bit allocation strategy of the vector quantization, and the bit allocation of the vector quantization is in turn controlled by the relative perceptual importance of the subbands.
After the above quantization, entropy coding is applied to further remove the statistical redundancy of the quantized coefficients and the side information. Entropy coding is a source coding technique whose basic idea is to assign shorter codewords to symbols with higher probability of occurrence and longer codewords to symbols with lower probability, so that the average codeword length is minimized. According to Shannon's noiseless coding theorem, if the N source symbols of a message are independent, then with a suitable variable-length code the average codeword length n̄ satisfies

H(x)/log2(D) ≤ n̄ < H(x)/log2(D) + 1/N,

where H(x) is the entropy of the source and x is the symbol variable. Since the entropy H(x) is the lower bound of the average code length, the formula shows that the average codeword length closely approaches its lower bound H(x); for this reason this variable-length coding technique is also called "entropy coding". The main entropy coding methods include Huffman coding, arithmetic coding and run-length coding, and the entropy coding in the present invention may adopt any of these methods.
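The quoted bound can be checked with a small Huffman construction (binary code, D = 2, symbol-by-symbol coding). For the dyadic source used below, the average codeword length meets the entropy lower bound exactly; the implementation is a generic sketch, not part of the patent.

```python
import heapq
import math

def huffman_lengths(probs):
    """Return the Huffman codeword length of each symbol."""
    # Heap entries: (probability, tiebreak counter, symbol indices)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # each merge adds one bit to these symbols
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]                        # a dyadic source
lengths = huffman_lengths(probs)                         # [1, 2, 3, 3]
entropy = -sum(p * math.log2(p) for p in probs)          # H(x) = 1.75 bits
avg_len = sum(p * l for p, l in zip(probs, lengths))     # n_bar = 1.75 bits
```

For non-dyadic probabilities the average length exceeds H(x) but, per the theorem, by less than one bit per symbol.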
The quantized spectrum output by the scalar quantizers and the differentially coded scale factors are entropy coded in the coder, yielding the codebook numbers, the scale factor coded values and the losslessly coded quantized spectrum; the codebook numbers are then entropy coded in turn, yielding the codebook number coded values. The scale factor coded values, the codebook number coded values and the lossless coded values of the quantized spectrum are then output to the bitstream multiplexing module 505.

The codeword indices obtained from the vector quantizers are entropy coded in the coder with a one- or multi-dimensional entropy code, yielding the coded values of the codeword indices, which are then output to the bitstream multiplexing module 505.

The bitstream multiplexing module 505 receives the side information output by the frequency-domain linear prediction and vector quantization module 503 together with the bitstream output by the quantization and entropy coding module 504, comprising either the common scale factor, the scale factor coded values, the codebook number coded values and the losslessly coded quantized spectrum, or the coded values of the codeword indices; it multiplexes them to obtain the compressed audio data stream.
The encoding method based on the above encoder specifically comprises: computing the masking threshold of the input audio signal and analyzing the type of the input signal; applying a time-frequency mapping to the input audio signal to obtain the frequency coefficients of the audio signal; performing a standard linear prediction analysis on the frequency coefficients to obtain the prediction gain and prediction coefficients; determining whether the prediction gain exceeds a preset threshold and, if it does, applying frequency-domain linear prediction error filtering to the frequency coefficients according to the prediction coefficients to obtain the prediction residual sequence of the frequency coefficients, converting the prediction coefficients into line spectral frequency (LSF) coefficients, applying multi-stage vector quantization to the LSF coefficients to obtain the side information, and quantizing and entropy coding the residual sequence; if the prediction gain does not exceed the preset threshold, quantizing and entropy coding the frequency coefficients directly; and multiplexing the side information with the coded audio signal to obtain the compressed audio bitstream.
There are many ways to perform a time-frequency transform on a time-domain audio signal, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the modified discrete cosine transform (MDCT), cosine-modulated filter banks and the wavelet transform. The time-frequency mapping process is illustrated below taking the MDCT and cosine-modulated filtering as examples.

When the MDCT is used for the time-frequency transform, the time-domain signal consisting of the M samples of the previous frame and the M samples of the current frame is first selected; a window is applied to these 2M samples, and the windowed signal is then MDCT-transformed to obtain M frequency coefficients.
The impulse response of the MDCT analysis filter is:

h_k(n) = w(n)·sqrt(2/M)·cos[ (2n+M+1)(2k+1)π / (4M) ],

and the MDCT is then:

X(k) = Σ_{n=0}^{2M-1} x(n)·h_k(n), 0 ≤ k ≤ M-1,

where w(n) is the window function, x(n) is the input time-domain signal of the MDCT, and X(k) is the output frequency-domain signal of the MDCT.
To satisfy the perfect reconstruction condition, the window function w(n) of the MDCT must satisfy the following two conditions:

w(2M-1-n) = w(n) and w²(n) + w²(n+M) = 1.

In practice the sine window can be selected as the window function. Alternatively, the above restriction on the window function can be relaxed by using a biorthogonal transform with specific analysis and synthesis filters.
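As a sketch, the MDCT above with the sine window can be written directly from the analysis-filter formula (a direct O(M²) evaluation for illustration; practical coders use a fast algorithm):

```python
import numpy as np

def mdct(frame_pair, M):
    """MDCT of 2M windowed time samples, following the formulas above.

    frame_pair: the M samples of the previous frame followed by the
    M samples of the current frame. Returns M frequency coefficients.
    """
    n = np.arange(2 * M)
    # Sine window: satisfies w(2M-1-n) = w(n) and w^2(n) + w^2(n+M) = 1
    w = np.sin(np.pi * (n + 0.5) / (2 * M))
    k = np.arange(M).reshape(-1, 1)
    # h_k(n) = w(n) * sqrt(2/M) * cos[(2n+M+1)(2k+1)pi / (4M)]
    h = w * np.sqrt(2.0 / M) * np.cos((2 * n + M + 1) * (2 * k + 1) * np.pi / (4 * M))
    return h @ frame_pair       # X(k) = sum_n x(n) h_k(n)
```

Successive calls overlap by M samples (previous frame + current frame), which together with the window conditions gives perfect reconstruction after the inverse transform and overlap-add.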
When cosine-modulated filtering is used for the time-frequency transform, the time-domain signal consisting of the M samples of the previous frame and the M samples of the current frame is likewise first selected; a window is applied to these 2M samples, and the windowed signal is then cosine-modulation transformed to obtain M frequency coefficients.

The impulse responses of the traditional cosine-modulated filtering technique are:

h_k(n) = 2·p_a(n)·cos( (π/M)(k+0.5)(n - D/2) + θ_k ), n = 0, 1, ..., N_h - 1,

f_k(n) = 2·p_s(n)·cos( (π/M)(k+0.5)(n - D/2) - θ_k ), n = 0, 1, ..., N_f - 1,

where 0 ≤ k ≤ M-1, 0 ≤ n ≤ 2KM-1, K is an integer greater than zero, and θ_k = (-1)^k·π/4.

Suppose the impulse response length of the analysis window (analysis prototype filter) p_a(n) of the M-subband cosine-modulated filter bank is N_a, and that of the synthesis window (synthesis prototype filter) p_s(n) is N_s. When the analysis and synthesis windows are equal, i.e. p_a(n) = p_s(n) and N_a = N_s, the cosine-modulated filter bank given by the two formulas above is an orthogonal filter bank, and the matrices H and F ([H]_{n,k} = h_k(n), [F]_{n,k} = f_k(n)) are orthogonal transform matrices. To obtain a linear-phase filter bank, the window is further required to be symmetric: p_a(2KM-1-n) = p_a(n). To guarantee perfect reconstruction for both the orthogonal and the biorthogonal systems, the window function must satisfy certain additional conditions; see "Multirate Systems and Filter Banks", P. P. Vaidyanathan, Prentice Hall, Englewood Cliffs, NJ, 1993 for details.
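The two impulse-response formulas can be instantiated as follows. The sine prototype window and the delay D = 2KM - 1 are illustrative assumptions (the text leaves D and the prototype unspecified beyond the symmetry p_a(2KM-1-n) = p_a(n)); with a symmetric prototype and equal windows, f_k is the time reversal of h_k, which the test checks.

```python
import numpy as np

def cmfb_filters(M=4, K=2):
    """Analysis/synthesis impulse responses h_k(n), f_k(n) of the
    cosine-modulated filter bank formulas above."""
    N = 2 * K * M                       # impulse response length
    D = N - 1                           # assumed overall delay
    n = np.arange(N)
    p = np.sin(np.pi * (n + 0.5) / N)   # symmetric prototype, p(N-1-n) = p(n)
    k = np.arange(M).reshape(-1, 1)
    theta = ((-1.0) ** k) * np.pi / 4   # theta_k = (-1)^k * pi/4
    phase = np.pi / M * (k + 0.5) * (n - D / 2)
    h = 2 * p * np.cos(phase + theta)   # analysis filters
    f = 2 * p * np.cos(phase - theta)   # synthesis filters
    return h, f
```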
The computation of the masking threshold and signal-to-mask ratio of the input audio signal and the signal type analysis comprise the following steps:

Step one: map the audio signal from the time domain to the frequency domain. The fast Fourier transform with a Hanning window can be used to convert the time-domain data into frequency coefficients X[k]. Expressing X[k] in terms of magnitude r[k] and phase φ[k] as X[k] = r[k]·e^{jφ[k]}, the energy e[b] of each subband is the sum of the energies of all spectral lines in that subband:

e[b] = Σ_{k=k_l}^{k_h} r²[k],

where k_l and k_h denote the lower and upper boundaries of subband b, respectively.
Step two: determine the tonal and non-tonal components of the audio signal. The tonality of the signal is estimated by inter-frame prediction of each spectral line: the Euclidean distance between the predicted and actual values of each spectral line is mapped to an unpredictability measure. Highly predictable spectral components are considered strongly tonal, while spectral components of low predictability are considered noise-like.
The magnitude r_pred and phase φ_pred of the predicted value are given by:

r_pred[k] = r_{t-1}[k] + (r_{t-1}[k] - r_{t-2}[k])

φ_pred[k] = φ_{t-1}[k] + (φ_{t-1}[k] - φ_{t-2}[k]),

where the subscript t denotes the coefficients of the current frame, t-1 those of the previous frame, and t-2 those of the frame before that.

The unpredictability measure c[k] is then computed as:

c[k] = dist(X[k], X_pred[k]) / (r[k] + |r_pred[k]|),

where the Euclidean distance dist(X[k], X_pred[k]) is computed as:

dist(X[k], X_pred[k]) = |X[k] - X_pred[k]|
= sqrt( (r[k]·cos(φ[k]) - r_pred[k]·cos(φ_pred[k]))² + (r[k]·sin(φ[k]) - r_pred[k]·sin(φ_pred[k]))² ).
The unpredictability c[b] of each subband is therefore the sum of the unpredictabilities of all spectral lines in that subband weighted by their energies:

c[b] = Σ_{k=k_l}^{k_h} c[k]·r²[k].

The subband energy e[b] and unpredictability c[b] are each convolved with a spreading function to obtain the spread subband energy e_s[b] and the spread subband unpredictability c_s[b], where the spreading function of masker i on subband b is denoted s[i, b]. To remove the influence of the spreading function on the energy variation, the spread unpredictability c_s[b] is normalized; the normalized result is c̃_s[b] = c_s[b]/e_s[b]. Likewise, to remove the influence of the spreading function on the subband energy, the normalized spread energy ẽ_s[b] is defined as ẽ_s[b] = e_s[b]/n[b], where the normalization factor n[b] is n[b] = Σ_{i=1}^{b_max} s[i, b], and b_max is the number of subbands into which the frame is divided.
From the normalized spread unpredictability c̃_s[b], the tonality t[b] of the subband can be computed:

t[b] = -0.299 - 0.43·ln(c̃_s[b]), with 0 ≤ t[b] ≤ 1.

When t[b] = 1, the subband signal is a pure tone; when t[b] = 0, it is white noise.
Step three: compute the required signal-to-noise ratio (SNR) of each subband. With the noise-masking-tone (NMT) value of all subbands set to 6 dB and the tone-masking-noise (TMN) value set to 18 dB, the signal-to-noise ratio required to keep the noise imperceptible is, for each subband, SNR[b] = 18·t[b] + 6·(1 - t[b]).
Step four: compute the masking threshold of each subband and the perceptual entropy of the signal, and perform the signal type analysis. From the normalized signal energy of each subband obtained in the preceding steps and the required SNR, the noise energy threshold n[b] of each subband is n[b] = ẽ_s[b]·10^{-SNR[b]/10}.

To avoid pre-echo, the noise energy threshold n[b] of the current frame is compared with the noise energy threshold n_prev[b] of the previous frame, and the masking threshold of the signal is taken as n[b] = min(n[b], 2·n_prev[b]); this guarantees that the masking threshold does not deviate when a high-energy attack occurs near the end of the analysis window.

Further, taking the static masking threshold qsthr[b] into account, the final masking threshold of the signal is chosen as the larger of the static masking threshold and the masking threshold computed above, i.e. n[b] = max(n[b], qsthr[b]). The perceptual entropy is then computed as

pe = -Σ_{b=0}^{b_max} cbwidth_b·log10( n[b]/(e[b]+1) ),

where cbwidth_b is the number of spectral lines contained in each subband.
Whether the perceptual entropy of a frame exceeds a specified threshold PE_SWITCH is then determined: if it does, the frame is of the fast-varying type; otherwise it is of the slowly-varying type. As can be seen from the above, the perceptual entropy pe is essentially a logarithmically weighted sum of the subband signal-to-noise ratios; it also represents, in the ideal case, the minimum number of bits needed for perceptually lossless coding, and has no direct relation to the signal type. However, because pre-echo control is applied when computing the noise energy threshold n[b], the final threshold is also constrained by the noise energy threshold of the previous frame. When an attack occurs, the noise energy threshold tends to rise from n_prev[b] to n[b] as the signal energy increases, but the pre-echo control limits this rise and thereby indirectly raises the perceptual entropy. In this sense the perceptual entropy can be used for signal type analysis.
Step five: compute the signal-to-mask ratio (SMR) of each subband. The SMR of each subband is SMR[b] = 10·log10( e[b]/n[b] ).
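Steps two through five can be sketched for one frame as follows. This is an illustrative simplification: the spreading-function convolution, the pre-echo control and the static threshold are omitted, so the normalized unpredictability is approximated by the per-band value itself, and all names are hypothetical.

```python
import numpy as np

def tonality_and_smr(r, ph, r1, ph1, r2, ph2, bands):
    """One-frame sketch of steps two to five.

    r/ph: magnitude and phase of the current frame; r1/ph1, r2/ph2:
    the previous two frames. bands: list of (k_l, k_h) subband edges.
    """
    eps = 1e-12
    # Step two: inter-frame prediction and unpredictability c[k]
    r_pred = r1 + (r1 - r2)
    ph_pred = ph1 + (ph1 - ph2)
    dist = np.hypot(r * np.cos(ph) - r_pred * np.cos(ph_pred),
                    r * np.sin(ph) - r_pred * np.sin(ph_pred))
    c = dist / (r + np.abs(r_pred) + eps)
    smr = []
    for k_l, k_h in bands:
        e_b = np.sum(r[k_l:k_h + 1] ** 2)                   # subband energy e[b]
        c_b = np.sum(c[k_l:k_h + 1] * r[k_l:k_h + 1] ** 2)  # weighted c[b]
        t_b = np.clip(-0.299 - 0.43 * np.log(c_b / (e_b + eps) + eps), 0.0, 1.0)
        snr_db = 18 * t_b + 6 * (1 - t_b)    # step three: required SNR[b]
        n_b = e_b * 10 ** (-snr_db / 10)     # step four: noise threshold n[b]
        smr.append(10 * np.log10((e_b + eps) / (n_b + eps)))  # step five
    return smr
```

For a perfectly predictable (pure-tone) band the unpredictability vanishes, t[b] clips to 1, and the SMR comes out at the tone-masking-noise value of 18 dB.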
After the frequency coefficients have been obtained, linear prediction and vector quantization are applied to them. First a standard linear prediction analysis is performed on the frequency coefficients, comprising computation of the autocorrelation matrix and recursive execution of the Levinson-Durbin algorithm to obtain the prediction gain and prediction coefficients. Whether the computed prediction gain exceeds a preset threshold is then determined: if it does, frequency-domain linear prediction error filtering is applied to the frequency coefficients according to the prediction coefficients; otherwise the frequency coefficients are left unprocessed and the next step, quantization and entropy coding of the frequency coefficients, is carried out.

Linear prediction can be divided into forward prediction and backward prediction: forward prediction predicts the current value from values before a given instant, while backward prediction predicts the current value from values after that instant. Taking forward prediction as an example, the transfer function of the linear prediction error filter is

A(z) = 1 - Σ_{i=1}^{p} a_i·z^{-i},

where a_i are the prediction coefficients and p is the prediction order. The frequency coefficients X(k) output by the time-frequency transform yield, after filtering, the prediction error E(k), also called the residual sequence; the two satisfy

E(k) = X(k)·A(z) = X(k) - Σ_{i=1}^{p} a_i·X(k-i).

Thus, after frequency-domain linear prediction filtering, the frequency coefficients X(k) of the time-frequency transform output can be represented by the residual sequence E(k) and a set of prediction coefficients a_i. This set of prediction coefficients a_i is then converted into line spectral frequency (LSF) coefficients, which are multi-stage vector quantized: the vector quantization selects an optimal distortion measure criterion (such as the nearest-neighbor criterion) and searches out the codeword indices of the codebooks of each stage, thereby determining the codewords corresponding to the prediction coefficients; the codeword indices are output as side information. Meanwhile, the residual sequence E(k) is quantized and entropy coded. From the principle of linear prediction analysis coding, the dynamic range of the residual sequence of the spectral coefficients is smaller than that of the original spectral coefficients, so fewer bits can be allocated during quantization, or, for the same number of bits, an improved coding gain can be obtained.
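The analysis side (autocorrelation, Levinson-Durbin recursion, error filtering along frequency) can be sketched as follows; the prediction order and the geometric test signal are arbitrary illustrations.

```python
import numpy as np

def levinson_durbin(r, p):
    """Levinson-Durbin recursion on an autocorrelation sequence r.
    Returns the coefficients a_i of A(z) = 1 - sum_i a_i z^-i and the
    prediction gain r[0]/E_p."""
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, p + 1):
        acc = np.dot(a[:i], r[i:0:-1])   # r[i] + sum_j a[j] r[i-j]
        k = -acc / err                   # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= (1 - k * k)
    return -a[1:], r[0] / err

def fdlp_residual(X, p=4):
    """Frequency-domain linear prediction: treat the spectral
    coefficients X(k) as a sequence along frequency, fit a predictor,
    and return the residual E(k) = X(k) - sum_i a_i X(k-i)."""
    r = np.array([np.dot(X[:len(X) - i], X[i:]) for i in range(p + 1)])
    a, gain = levinson_durbin(r, p)
    E = X.astype(float).copy()
    for i in range(1, p + 1):
        E[i:] -= a[i - 1] * X[:-i]
    return E, a, gain
```

On a smooth (highly correlated) spectral envelope the residual has a much smaller dynamic range than the coefficients themselves, which is the source of the coding gain described above.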
After the signal-to-mask ratios of the subbands have been obtained, the frequency coefficients or the residual sequence are quantized and entropy coded according to the signal-to-mask ratios, where the quantization may be scalar quantization or vector quantization.

Scalar quantization comprises the following steps: non-linearly companding the frequency coefficients in all scale-factor bands; quantizing the frequency coefficients of each subband with the scale factor of that subband to obtain the integer-valued quantized spectrum; selecting the first scale factor of each frame as the common scale factor; and differentially coding every other scale factor against its predecessor.
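A sketch of these steps follows. The 3/4-power companding law and the 2^(sf/4) step size are assumptions borrowed from common perceptual coders; the text specifies only a non-linear companding driven by scale factors.

```python
import numpy as np

def scalar_quantize(coeffs, scale_factors, bands):
    """Sketch of the scalar quantization steps above.

    coeffs: frequency coefficients; scale_factors: one per band;
    bands: list of (k_l, k_h) scale-factor band edges.
    Returns the integer quantized spectrum, the common scale factor,
    and the differentially coded remaining scale factors.
    """
    q = np.zeros(len(coeffs), dtype=int)
    for sf, (k_l, k_h) in zip(scale_factors, bands):
        x = coeffs[k_l:k_h + 1]
        companded = np.sign(x) * np.abs(x) ** 0.75          # non-linear companding
        q[k_l:k_h + 1] = np.round(companded / 2 ** (sf / 4)).astype(int)
    common = scale_factors[0]          # first scale factor of the frame
    diffs = np.diff(scale_factors)     # differential coding of the rest
    return q, common, diffs
```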
Vector quantization comprises the following steps: grouping the frequency coefficients into a number of multi-dimensional vector signals; spectrally flattening each M-dimensional vector according to the flattening factor; and searching the codebook, according to a subjective perceptual distance measure criterion, for the codeword with the minimum distance to the vector being quantized, obtaining its codeword index.
The entropy coding step comprises: entropy coding the quantized spectrum and the differentially coded scale factors to obtain the codebook numbers, the scale factor coded values and the losslessly coded quantized spectrum, and then entropy coding the codebook numbers to obtain the codebook number coded values.

Or: applying a one- or multi-dimensional entropy code to the codeword indices to obtain the coded values of the codeword indices.

The above entropy coding may adopt any of the existing methods such as Huffman coding, arithmetic coding or run-length coding.

After the quantization and entropy coding, the coded audio signal is obtained; this signal is multiplexed with the common scale factor and the side information to obtain the compressed audio bitstream.
Fig. 6 is a structural diagram of the audio decoding apparatus of the present invention. The audio decoding apparatus comprises a bitstream demultiplexing module 601, an entropy decoding module 602, an inverse quantizer bank 603, an inverse frequency-domain linear prediction and vector quantization module 604, and a frequency-time mapping module 605. After the compressed audio data stream is demultiplexed by the bitstream demultiplexing module 601, the corresponding data signals and control signals are obtained and output to the entropy decoding module 602 and the inverse frequency-domain linear prediction and vector quantization module 604. The data and control signals are decoded in the entropy decoding module 602, recovering the quantized values of the spectrum. These quantized values are reconstructed in the inverse quantizer bank 603 to obtain the inversely quantized spectrum, which is output to the inverse frequency-domain linear prediction and vector quantization module 604 for inverse quantization and inverse linear prediction filtering, yielding the spectrum before prediction; this is output to the frequency-time mapping module 605, where the spectral coefficients undergo frequency-time mapping to produce the time-domain audio signal.
The bitstream demultiplexing module 601 decomposes the compressed audio data stream to obtain the corresponding data and control signals, providing the decoding information for the other modules. After demultiplexing, the signals output to the entropy decoding module 602 comprise the common scale factor, the scale factor coded values, the codebook number coded values and the losslessly coded quantized spectrum, or the coded values of the codeword indices; the signal output to the inverse linear prediction and vector quantization module 604 is the inverse frequency-domain linear prediction and vector quantization control information.
If scalar quantizers are adopted in the quantization and entropy coding module 504 of the encoding apparatus, then in the decoding apparatus the entropy decoding module 602 receives the common scale factor, the scale factor coded values, the codebook number coded values and the lossless coded values of the quantized spectrum output by the bitstream demultiplexing module 601; it performs codebook number decoding, spectral coefficient decoding and scale factor decoding, reconstructs the quantized spectrum, and outputs the integer representations of the scale factors and the quantized values of the spectrum to the inverse quantizer bank 603. The decoding methods adopted by the entropy decoding module 602 correspond to the entropy coding methods of the encoding apparatus, such as Huffman decoding, arithmetic decoding or run-length decoding.

After the inverse quantizer bank 603 receives the quantized values of the spectrum and the integer representations of the scale factors, it inversely quantizes the integer quantized values of the spectrum, reconstructs the unscaled spectral values (the inversely quantized spectrum), and outputs the inversely quantized spectrum to the inverse frequency-domain linear prediction and vector quantization module 604. The inverse quantizer bank 603 may be a bank of uniform quantizers, or a bank of non-uniform quantizers realized by companding functions. If the quantizer bank in the encoding apparatus is a scalar quantizer, then in the decoding apparatus the inverse quantizer bank 603 likewise adopts scalar inverse quantizers. In a scalar inverse quantizer, the quantized values of the spectrum are first non-linearly expanded, and the scale factors are then used to obtain all spectral coefficients in the corresponding scale-factor bands (the inversely quantized spectrum).
If vector quantizers are adopted in the quantization and entropy coding module 504, then in the decoding apparatus the entropy decoding module 602 receives the coded values of the codeword indices output by the bitstream demultiplexing module 601 and decodes them with the entropy decoding method corresponding to the entropy coding method used at encoding, obtaining the corresponding codeword indices.

The codeword indices are output to the inverse quantizer bank 603, which looks up the codebook to obtain the quantized values (the inversely quantized spectrum) and outputs them to the frequency-time mapping module 605. In this case the inverse quantizer bank 603 adopts inverse vector quantizers.
In the encoder, frequency-domain linear prediction and vector quantization are used to suppress pre-echo and obtain a larger coding gain. Correspondingly, in the decoder the inverse frequency-domain linear prediction and vector quantization module 604 comprises an inverse vector quantizer, an inverse converter and an inverse linear prediction filter: the inverse vector quantizer inversely quantizes the codeword indices to obtain the line spectral frequency (LSF) coefficients; the inverse converter converts the LSF coefficients back into prediction coefficients; and the inverse linear prediction filter performs linear prediction synthesis on the inversely quantized spectrum according to the prediction coefficients, obtains the spectrum before prediction, and outputs it to the frequency-time mapping module 605.
The inversely quantized spectrum, or the spectrum before prediction, is mapped by the frequency-time mapping module 605 to obtain the low-band time-domain audio signal. The frequency-time mapping module 605 may adopt an inverse discrete cosine transform (IDCT) filter bank, an inverse discrete Fourier transform (IDFT) filter bank, an inverse modified discrete cosine transform (IMDCT) filter bank, an inverse wavelet transform filter bank or a cosine-modulated filter bank.
The decoding method based on the above decoder comprises: demultiplexing the compressed audio bitstream to obtain the data information and control information; entropy decoding this information to obtain the quantized values of the spectrum; inversely quantizing the quantized values of the spectrum to obtain the inversely quantized spectrum; determining whether the control information indicates that the inversely quantized spectrum requires inverse frequency-domain linear prediction and vector quantization and, if so, performing inverse vector quantization to obtain the prediction coefficients, applying inverse linear prediction filtering to the inversely quantized spectrum according to the prediction coefficients to obtain the spectrum before prediction, and applying frequency-time mapping to the spectrum before prediction to obtain the low-band time-domain audio signal; if the control information indicates that the inversely quantized spectrum does not require inverse frequency-domain linear prediction and vector quantization, applying frequency-time mapping directly to the inversely quantized spectrum to obtain the low-band time-domain audio signal.
If the demultiplexed information comprises the codebook number coded values, the common scale factor, the scale factor coded values and the losslessly coded quantized spectrum, indicating that the spectral coefficients were quantized with scalar quantization in the encoding apparatus, then the entropy decoding step comprises: decoding the codebook number coded values to obtain the codebook numbers of all scale-factor bands; decoding the quantized coefficients of all scale-factor bands according to the codebooks corresponding to the codebook numbers; and decoding the scale factors of all scale-factor bands to reconstruct the quantized spectrum. The entropy decoding methods adopted in this process correspond to the entropy coding methods of the encoding, such as run-length decoding, Huffman decoding or arithmetic decoding.

The entropy decoding process is described below taking as an example run-length decoding of the codebook numbers, Huffman decoding of the quantized coefficients and Huffman decoding of the scale factors.
First the codebook numbers of all scale-factor bands are obtained by run-length decoding. Each decoded codebook number is an integer in a certain interval, for example [0, 11]; only codebook numbers in this valid range, i.e. between 0 and 11, correspond to spectral coefficient Huffman codebooks. An all-zero subband can be assigned a designated codebook number, typically 0.

After the codebook number of each scale-factor band has been decoded, the quantized coefficients of all scale-factor bands are decoded with the spectral coefficient Huffman codebooks corresponding to those codebook numbers. If the codebook number of a scale-factor band is within the valid range, in this embodiment between 1 and 11, it corresponds to a spectral coefficient codebook; that codebook is used to decode the codeword indices of the quantized coefficients of the scale-factor band from the quantized spectrum, and the quantized coefficients are then unpacked from the codeword indices. If the codebook number of a scale-factor band is not between 1 and 11, it corresponds to no spectral coefficient codebook; the quantized coefficients of that scale-factor band need not be decoded and are simply all set to zero.

The scale factors are used to reconstruct the spectral values from the inversely quantized spectral coefficients. If the codebook number of a scale-factor band is within the valid range, each codebook number corresponds to a scale factor. When decoding the scale factors, the bits occupied by the first scale factor are read first; each remaining scale factor is then Huffman decoded to obtain the difference between it and the preceding scale factor, and the scale factor itself is obtained by adding this difference to the value of the preceding scale factor. If the quantized coefficients of the current subband are all zero, the scale factor of that subband need not be decoded.
After the above entropy decoding process, the quantized values of the spectrum and the integer representations of the scale factors are obtained; the quantized values of the spectrum are then inversely quantized to obtain the inversely quantized spectrum. The inverse quantization comprises: non-linearly expanding the quantized values of the spectrum; and obtaining all spectral coefficients in the corresponding scale-factor bands (the inversely quantized spectrum) according to the scale factors.
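The scale-factor reconstruction and inverse quantization just described can be sketched as follows. The 4/3-power expansion and the 2^(sf/4) step size mirror a common companding law and are assumptions for illustration; the text specifies only a non-linear expansion followed by scale-factor scaling.

```python
import numpy as np

def decode_scale_factors(common, diffs):
    """The first (common) scale factor is read directly; each later one
    is the previous value plus its decoded difference."""
    sfs = [common]
    for d in diffs:
        sfs.append(sfs[-1] + d)
    return sfs

def dequantize(q, sfs, bands):
    """Non-linear expansion plus scale-factor scaling of the integer
    quantized spectrum q over the (k_l, k_h) scale-factor bands."""
    x = np.zeros(len(q), dtype=float)
    for sf, (k_l, k_h) in zip(sfs, bands):
        v = np.asarray(q[k_l:k_h + 1], dtype=float) * 2 ** (sf / 4)
        x[k_l:k_h + 1] = np.sign(v) * np.abs(v) ** (4 / 3)   # expansion
    return x
```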
If the demultiplexed information contains coded values of codeword indices, this shows that the encoding device quantized the spectral coefficients with vector quantization. The entropy decoding step then comprises: decoding the coded values of the codeword indices with the entropy decoding method corresponding to the entropy coding method used in the encoding device, obtaining the codeword indices; the codeword indices are then inversely quantized to obtain the inverse-quantized spectrum.
Inverse frequency-domain linear prediction and vector quantization are applied to the inverse-quantized spectrum. First, the control information is checked to determine whether this signal frame was processed by frequency-domain linear prediction and vector quantization; if so, the codeword indices of the vector-quantized prediction coefficients are obtained from the control information, the quantized line spectral frequency (LSF) coefficients are obtained from the codeword indices, and the prediction coefficients are computed from them; linear prediction synthesis is then applied to the inverse-quantized spectrum to obtain the pre-prediction spectrum.
The transfer function A(z) used by the linear prediction error filtering is:

A(z) = 1 − Σ_{i=1}^{p} a_i z^{−i}

where a_i are the prediction coefficients and p is the prediction order. The residual sequence E(k) and the pre-prediction spectrum X(k) therefore satisfy:

X(k) = E(k) · 1/A(z), i.e. X(k) = E(k) + Σ_{i=1}^{p} a_i X(k−i).
Thus, passing the residual sequence E(k) and the computed prediction coefficients a_i through the frequency-domain linear prediction synthesis process yields the pre-prediction spectrum X(k), which is then passed to the frequency-time mapping.
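The synthesis recursion X(k) = E(k) + Σ a_i X(k−i), run along the frequency axis, can be sketched directly; the function name is illustrative and the coefficients are assumed already recovered from the LSF codeword indices:

```python
def fdlp_synthesis(residual, a):
    """All-pole synthesis 1/A(z) applied along frequency:
    X(k) = E(k) + sum_{i=1..p} a[i-1] * X(k - i), with X(k) = 0 for k < 0."""
    p = len(a)
    X = []
    for k, e in enumerate(residual):
        acc = e
        for i in range(1, p + 1):
            if k - i >= 0:
                acc += a[i - 1] * X[k - i]
        X.append(acc)
    return X
```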
If the control information shows that this signal frame was not processed by frequency-domain linear prediction and vector quantization, no inverse frequency-domain linear prediction and vector quantization is performed, and the inverse-quantized spectrum is passed directly to the frequency-time mapping.
The frequency-time mapping applied to the inverse-quantized spectrum corresponds to the time-frequency mapping used in the encoding method, and may be realized by methods such as the inverse discrete cosine transform (IDCT), the inverse discrete Fourier transform (IDFT), the inverse modified discrete cosine transform (IMDCT), or the inverse wavelet transform.
The frequency-time mapping process is illustrated below with the inverse modified discrete cosine transform (IMDCT). It comprises three steps: the IMDCT transform, time-domain windowing, and time-domain overlap-add.
First the pre-prediction spectrum or the inverse-quantized spectrum is IMDCT-transformed to obtain the time-domain signal x_{i,n}. The IMDCT is:

x_{i,n} = (2/N) Σ_{k=0}^{N/2−1} spec[i][k] · cos( (2π/N)(n + n_0)(k + 1/2) )

where n is the sample index with 0 ≤ n < N, N is the number of time-domain samples and equals 2048, n_0 = (N/2 + 1)/2, i is the frame number, and k is the spectral coefficient index.
Second, the time-domain signal obtained from the IMDCT is windowed in the time domain. To satisfy the perfect reconstruction condition, the window function w(n) must satisfy two conditions: w(2M−1−n) = w(n) and w²(n) + w²(n+M) = 1.
Typical window functions include the sine window and the Kaiser-Bessel window. The present invention adopts a fixed window function:

w(N+k) = cos( (π/2) · ( (k+0.5)/N − 0.94·sin(2π(k+0.5)/N)/(2π) ) ), k = 0, …, N−1

where w(k) denotes the k-th window coefficient, with w(k) = w(2N−1−k), and N is the number of samples per coded frame, N = 1024. In addition, a biorthogonal transform with dedicated analysis and synthesis filters may be used to relax the above constraints on the window function.
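A minimal sketch of building this fixed window (names illustrative): the second half follows the cosine formula above, and the first half is filled in by the stated symmetry w(k) = w(2N−1−k):

```python
import math

def fixed_window(N=1024):
    """Build the 2N-point window: samples N..2N-1 from the cosine
    formula, samples 0..N-1 by the symmetry w(k) = w(2N-1-k)."""
    w = [0.0] * (2 * N)
    for k in range(N):
        w[N + k] = math.cos(math.pi / 2 * ((k + 0.5) / N
                   - 0.94 * math.sin(2 * math.pi / N * (k + 0.5)) / (2 * math.pi)))
    for k in range(N):
        w[k] = w[2 * N - 1 - k]
    return w
```

The window rises from near zero to near one over the first half and falls symmetrically over the second, as expected of an overlap-add window.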
Finally, the windowed time-domain signal is overlap-added to obtain the time-domain audio signal. Specifically: the first N/2 samples of the signal obtained after the windowing operation are overlap-added with the last N/2 samples of the previous frame's signal to produce the N/2 output time-domain audio samples, i.e.

timeSam_{i,n} = preSam_{i,n} + preSam_{i−1, n+N/2}

where i is the frame number, n is the sample index with 0 ≤ n < N/2, and N = 2048.
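The IMDCT and overlap-add steps can be sketched as a direct, unoptimized transcription of the formulas above (a small N is used for testing rather than the N = 2048 of the embodiment; windowing between the two steps is omitted for brevity):

```python
import math

def imdct(spec):
    """IMDCT per the formula above: N time samples from N/2 spectral
    coefficients, with n0 = (N/2 + 1) / 2."""
    N = 2 * len(spec)
    n0 = (N / 2 + 1) / 2
    return [(2.0 / N) * sum(c * math.cos(2 * math.pi / N * (n + n0) * (k + 0.5))
                            for k, c in enumerate(spec))
            for n in range(N)]

def overlap_add(curr_windowed, prev_windowed):
    """First N/2 windowed samples of the current frame plus the last N/2
    windowed samples of the previous frame give N/2 output samples."""
    M = len(curr_windowed) // 2
    return [curr_windowed[n] + prev_windowed[n + M] for n in range(M)]
```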
After the processing of the above steps, the compressed audio data stream yields the time-domain audio signal of the low-frequency band.
Fig. 7 shows the structure of an embodiment of the encoding device of the present invention. On the basis of Fig. 5, this embodiment adds a multiresolution analysis module 506 between the output of the frequency-domain linear prediction and vector quantization module 503 and the input of the quantization and entropy coding module 504.
For fast-varying signals, in order to effectively overcome the pre-echo phenomenon produced during encoding and to improve coding quality, the encoding device of the present invention raises the temporal resolution with which fast-varying signals are coded via the multiresolution analysis module 506. The residual sequence or frequency coefficients output by the frequency-domain linear prediction and vector quantization module 503 are input to the multiresolution analysis module 506; for a fast-varying signal, a frequency-domain wavelet transform or a frequency-domain short modified discrete cosine transform (MDCT) is applied to obtain a multiresolution representation of the frequency/residual coefficients, which is output to the quantization and entropy coding module 504. For a slowly varying signal, the signal is not processed and is output directly to the quantization and entropy coding module 504.
The multiresolution analysis module 506 reorganizes the input frequency-domain data in the time-frequency domain, trading frequency resolution for higher temporal resolution of the frequency-domain data, thereby adapting automatically to the time-frequency characteristics of fast-varying signals and suppressing pre-echo. The form of the filter bank in the time-frequency mapping module 502 need not be adjusted at any time. The multiresolution analysis module 506 comprises a frequency coefficient transform module and a regrouping module: the frequency coefficient transform module transforms the frequency coefficients into time-frequency plane coefficients, and the regrouping module regroups the time-frequency plane coefficients according to a given rule. The frequency coefficient transform module may employ a frequency-domain wavelet transform filter bank, a frequency-domain MDCT transform filter bank, etc.
The working process of the multiresolution analysis module 506 is described below, taking the frequency-domain wavelet transform and the frequency-domain MDCT transform as examples.
1) Frequency-domain wavelet transform
Suppose the time series is x(i), i = 0, 1, …, 2M−1, and the frequency coefficients obtained after time-frequency mapping are X(k), k = 0, 1, …, M−1. The wavelet basis of the frequency-domain wavelet or wavelet packet transform may be fixed or adaptive.
Taking the simplest case, the wavelet transform with the Haar wavelet basis, as an example, the specific method of multiresolution analysis of the frequency coefficients is described below.
The scale (low-pass) coefficients of the Haar wavelet basis are (1/√2, 1/√2) and the wavelet (high-pass) coefficients are (1/√2, −1/√2). Fig. 8 shows the filter structure for the wavelet transform with the Haar basis, where H_0 denotes the low-pass filter with coefficients (1/√2, 1/√2), H_1 denotes the high-pass filter with coefficients (1/√2, −1/√2), and "↓2" denotes downsampling by a factor of 2. The low- and mid-frequency part X_1(k), k = 0, …, k_1, of the MDCT coefficients is not wavelet transformed; the high-frequency part of the MDCT coefficients undergoes the Haar wavelet transform, yielding the coefficients X_2(k), X_3(k), X_4(k), X_5(k), X_6(k) and X_7(k) in different time-frequency intervals, with the corresponding time-frequency plane division shown in Fig. 9. By selecting other wavelet bases, and correspondingly other wavelet transform structures, other similar time-frequency plane divisions can be obtained. The time-frequency plane used for signal analysis can therefore be divided arbitrarily as required, satisfying analysis requirements of different time and frequency resolutions.
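One Haar analysis step on the high-band coefficients can be sketched as below; applying it recursively to the low-pass output yields tree structures like the division of Fig. 9 (the split point k_1 and the recursion depth are left out as embodiment-specific):

```python
import math

def haar_step(coeffs):
    """One Haar analysis step: low-pass (scale) and high-pass (wavelet)
    filtering with coefficients (1/sqrt(2), +/-1/sqrt(2)), followed by
    downsampling by 2."""
    s = 1.0 / math.sqrt(2.0)
    low = [s * (coeffs[2 * i] + coeffs[2 * i + 1]) for i in range(len(coeffs) // 2)]
    high = [s * (coeffs[2 * i] - coeffs[2 * i + 1]) for i in range(len(coeffs) // 2)]
    return low, high
```

Because the Haar basis is orthonormal, the step preserves the energy of the coefficients it splits.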
The regrouping module regroups the above time-frequency plane coefficients according to a given rule, for example: the time-frequency plane coefficients may first be organized in the frequency direction, the coefficients within each frequency band organized in the time direction, and the organized coefficients then arranged in order of sub-window and scale factor band.
2) Frequency-domain MDCT transform
Let the frequency-domain data input to the frequency-domain MDCT transform filter bank be X(k), k = 0, 1, …, N−1. M-point MDCT transforms are applied successively to these N points of frequency-domain data, lowering the frequency resolution of the time-frequency-domain data while correspondingly raising its temporal resolution. By using frequency-domain MDCT transforms of different lengths over different frequency ranges, time-frequency plane divisions with different time and frequency resolutions can be obtained.
The regrouping module regroups the above time-frequency plane coefficients according to a given rule, for example: the time-frequency plane coefficients may first be organized in the frequency direction, the coefficients within each frequency band organized in the time direction, and the organized coefficients then arranged in order of sub-window and scale factor band.
The encoding method based on the encoding device shown in Fig. 7 follows essentially the same procedure as the encoding method based on the encoding device shown in Fig. 5, the difference being the following added step: before the residual sequence/frequency coefficients are quantized and entropy coded, if the signal is fast-varying, multiresolution analysis is applied to the residual sequence/frequency coefficients; if not, the residual sequence/frequency coefficients are quantized and entropy coded directly.
The multiresolution analysis may employ the frequency-domain wavelet transform technique or the frequency-domain MDCT transform technique. The frequency-domain wavelet method comprises: applying a wavelet transform to the frequency coefficients to obtain time-frequency plane coefficients, and regrouping these time-frequency plane coefficients according to a given rule. The MDCT method comprises: applying an MDCT transform to the frequency coefficients to obtain time-frequency plane coefficients, and regrouping these time-frequency plane coefficients according to a given rule. The regrouping may comprise: organizing the time-frequency plane coefficients in the frequency direction, organizing the coefficients within each frequency band in the time direction, and then arranging the organized coefficients in order of sub-window and scale factor band.
Fig. 10 shows the structure of embodiment one of the decoding device of the present invention. On the basis of the decoding device shown in Fig. 6, this decoding device adds a multiresolution synthesis module 606 between the output of the inverse quantizer bank 603 and the input of the inverse frequency-domain linear prediction and vector quantization module 604, used to perform multiresolution synthesis on the inverse-quantized spectrum.
In the encoder, the multiresolution filtering technique is applied to fast-varying signals to raise the temporal resolution of their frequency-domain data. Correspondingly, in the decoder, the multiresolution synthesis module 606 is needed to recover the frequency coefficients of fast-varying signals as they were before multiresolution analysis. The multiresolution synthesis module 606 comprises a coefficient regrouping module and a coefficient transform module, where the coefficient transform module may employ a frequency-domain inverse wavelet transform filter bank or a frequency-domain IMDCT transform filter bank.
The decoding method based on the decoding device shown in Fig. 10 follows essentially the same procedure as the decoding method based on the decoding device shown in Fig. 6, the difference being the following added step: after the inverse-quantized spectrum has been obtained, multiresolution synthesis is applied to it, and only then is it determined whether inverse frequency-domain linear prediction and vector quantization must be applied to the synthesized inverse-quantized spectrum.
Multiresolution synthesis is illustrated below with the frequency-domain short IMDCT transform. It comprises: regrouping the inverse-quantized spectral coefficients; applying multiple IMDCT transforms to the coefficients to obtain the inverse-quantized spectrum as it was before multiresolution analysis. The process is described in detail with 128 IMDCT transforms (8 inputs, 16 outputs each). First, the inverse-quantized spectral coefficients are arranged in order of sub-window and scale factor band, and then regrouped in frequency order, so that the 128 coefficients of each sub-window are organized together in frequency order. Next, the coefficients arranged by sub-window are organized in the frequency direction in groups of 8, each group of 8 coefficients arranged chronologically, giving 128 groups of coefficients along the frequency direction. A 16-point IMDCT is applied to each group, and the 16 coefficients output by each group's IMDCT are overlap-added to yield 8 frequency-domain data. Performing this operation 128 times, from low frequency towards high frequency, yields 1024 frequency coefficients.
Fig. 11 shows a second embodiment of the encoding device of the present invention. On the basis of Fig. 5, this embodiment adds a sum/difference (M/S) stereo coding module 507 between the output of the frequency-domain linear prediction and vector quantization module 503 and the input of the quantization and entropy coding module 505, and the psychoacoustic analysis module 501 outputs the masking thresholds of the sum and difference channels to the quantization and entropy coding module 505. For a multichannel signal, the psychoacoustic analysis module 501 computes not only the monophonic masking threshold of the audio signal but also the masking thresholds of the sum and difference channels. The sum/difference stereo coding module 507 may also be placed between the quantizer bank and the coder within the quantization and entropy coding module 505.
The sum/difference stereo coding module 507 exploits the correlation between the two channels of a channel pair, converting the frequency coefficients/residual sequences of the left and right channels into those of the sum and difference channels so as to improve coding efficiency and the stereo image; it therefore applies only to channel-pair signals of consistent signal type. For a monophonic signal, or for a channel pair of inconsistent signal types, no sum/difference stereo coding is performed.
The encoding method based on the encoding device shown in Fig. 11 is essentially the same as that based on the encoding device shown in Fig. 5, the difference being the following added steps: before the residual sequence/frequency coefficients are quantized and entropy coded, it is determined whether the audio signal is a multichannel signal; if so, it is determined whether the signal types of the left- and right-channel signals are consistent; if they are consistent, it is determined for each pair of corresponding scale factor bands whether the sum/difference stereo coding condition is satisfied; if satisfied, sum/difference stereo coding is applied to the residual sequence/frequency coefficients to obtain the residual sequence/frequency coefficients of the sum and difference channels; if not satisfied, no sum/difference stereo coding is performed. For a monophonic signal, or a multichannel signal of inconsistent signal types, the frequency coefficients are not processed.
Sum/difference stereo coding may be applied not only before quantization but also after quantization and before entropy coding, that is: after the residual sequence/frequency coefficients have been quantized, it is determined whether the audio signal is a multichannel signal; if so, whether the signal types of the left- and right-channel signals are consistent; if consistent, whether the sum/difference stereo coding condition is satisfied between corresponding scale factor bands of the two channels; if satisfied, sum/difference stereo coding is applied to the quantized spectra of the two channels in that scale factor band to obtain the quantized spectra of the sum and difference channels; if not satisfied, no sum/difference stereo coding is performed. For a monophonic signal, or a multichannel signal of inconsistent signal types, the frequency coefficients are not processed.
There are many methods of determining whether a scale factor band can be sum/difference stereo coded; the method adopted by the present invention uses the Karhunen-Loeve transform. The determination proceeds as follows:
Let the spectral coefficients of a left-channel scale factor band be l(k) and those of the corresponding right-channel scale factor band be r(k). Their correlation matrix C is

C = [ C_ll  C_lr ; C_lr  C_rr ]

where C_ll = (1/N) Σ_{k=0}^{N−1} l(k)·l(k), C_lr = (1/N) Σ_{k=0}^{N−1} l(k)·r(k), C_rr = (1/N) Σ_{k=0}^{N−1} r(k)·r(k), and N is the number of spectral lines in the scale factor band.
Applying the Karhunen-Loeve transform to the correlation matrix C gives

R C R^T = Λ = [ λ_ii  0 ; 0  λ_ee ]

where R = [ cos α  −sin α ; sin α  cos α ] and α ∈ [−π/2, π/2].
The rotation angle α satisfies tan(2α) = 2C_lr / (C_ll − C_rr); the case α = ±π/4 is exactly the sum/difference stereo mode. Therefore, when the absolute value of the rotation angle α deviates little from π/4, for instance 3π/16 < |α| < 5π/16, the corresponding scale factor band can be sum/difference stereo coded.
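The whole band-wise decision can be sketched as follows, using atan2 to solve tan(2α) = 2C_lr/(C_ll − C_rr) and the 3π/16 < |α| < 5π/16 window quoted above (function name illustrative):

```python
import math

def ms_decision(l, r):
    """Decide M/S coding for one scale factor band: build the 2x2
    correlation matrix, obtain the KLT rotation angle alpha from
    tan(2*alpha) = 2*Clr / (Cll - Crr), and allow M/S when |alpha|
    lies close to pi/4, i.e. in (3*pi/16, 5*pi/16)."""
    n = len(l)
    cll = sum(x * x for x in l) / n
    crr = sum(x * x for x in r) / n
    clr = sum(x * y for x, y in zip(l, r)) / n
    alpha = 0.5 * math.atan2(2.0 * clr, cll - crr)
    return 3 * math.pi / 16 < abs(alpha) < 5 * math.pi / 16
```

Identical (or sign-inverted) channels give α = ±π/4 and pass the test; uncorrelated channels of equal energy give α = 0 and fail it.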
If sum/difference stereo coding is applied before quantization, the residual sequences/frequency coefficients of the left and right channels in the scale factor band are replaced by those of the sum and difference channels through the linear transform

[ M ; S ] = (1/2) · [ 1  1 ; 1  −1 ] · [ L ; R ]

where M denotes the sum-channel residual sequence/frequency coefficients; S the difference-channel residual sequence/frequency coefficients; L the left-channel residual sequence/frequency coefficients; and R the right-channel residual sequence/frequency coefficients.
If sum/difference stereo coding is applied after quantization, the quantized residual sequences/frequency coefficients of the left and right channels in the scale factor band are replaced by those of the sum and difference channels through the linear transform

[ M̂ ; Ŝ ] = [ 1  0 ; 1  −1 ] · [ L̂ ; R̂ ]

where M̂ denotes the quantized sum-channel residual sequence/frequency coefficients; Ŝ the quantized difference-channel residual sequence/frequency coefficients; L̂ the quantized left-channel residual sequence/frequency coefficients; and R̂ the quantized right-channel residual sequence/frequency coefficients.
Placing sum/difference stereo coding after quantization not only effectively removes the correlation between the left and right channels but, since it is performed after quantization, also permits lossless coding.
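A sketch of why the post-quantization variant is lossless: with integer quantized values, M = L and S = L − R are exactly invertible (function names illustrative):

```python
def ms_encode_quantized(lq, rq):
    """Post-quantization M/S per coefficient: M = L, S = L - R.
    Integer arithmetic, hence exactly invertible (lossless)."""
    return lq, lq - rq

def ms_decode_quantized(mq, sq):
    """Inverse at the decoder: L = M, R = M - S."""
    return mq, mq - sq
```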
Fig. 12 shows embodiment two of the decoding device of the present invention. On the basis of the decoding device shown in Fig. 6, this decoding device adds a sum/difference stereo decoding module 607 between the output of the inverse quantizer bank 603 and the input of the inverse frequency-domain linear prediction and vector quantization module 604; it receives the signal type analysis result and the sum/difference stereo control signal output by the bitstream demultiplexing module 601, and is used to convert, according to this control information, the inverse-quantized spectra of the sum and difference channels into the inverse-quantized spectra of the left and right channels.
In the sum/difference stereo control signal, one flag bit indicates whether the current channel pair requires sum/difference stereo decoding; if it does, each scale factor band also carries a flag bit indicating whether that scale factor band requires sum/difference stereo decoding. The sum/difference stereo decoding module 607 determines from the flag bit of each scale factor band whether sum/difference stereo decoding must be applied to the inverse-quantized spectrum. If sum/difference stereo coding was performed in the encoding device, the corresponding sum/difference stereo decoding operation must be performed on the inverse-quantized spectrum in the decoding device.
The sum/difference stereo decoding module 607 may also be placed between the output of the entropy decoding module 602 and the input of the inverse quantizer bank 603, receiving the sum/difference stereo control signal and the signal type analysis result output by the bitstream demultiplexing module.
The decoding method based on the decoding device shown in Fig. 12 is essentially the same as that based on the decoding device shown in Fig. 6, the difference being the following added steps: after the inverse-quantized spectrum is obtained, if the signal type analysis result shows consistent signal types, the sum/difference stereo control signal is used to determine whether the inverse-quantized spectrum requires sum/difference stereo decoding; if so, the flag bit of each scale factor band is used to determine whether that band requires sum/difference stereo decoding, and if it does, the inverse-quantized spectra of the sum and difference channels in that band are converted into the inverse-quantized spectra of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum/difference stereo decoding is required, the inverse-quantized spectrum is passed to subsequent processing unchanged.
Sum/difference stereo decoding may also be performed after entropy decoding and before inverse quantization, that is: after the quantized spectral values are obtained, if the signal type analysis result shows consistent signal types, the sum/difference stereo control signal is used to determine whether the quantized spectral values require sum/difference stereo decoding; if so, the flag bit of each scale factor band is used to determine whether that band requires sum/difference stereo decoding, and if it does, the quantized spectral values of the sum and difference channels in that band are converted into the quantized spectral values of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum/difference stereo decoding is required, the quantized spectral values are passed to subsequent processing unchanged. If sum/difference stereo decoding is performed after entropy decoding and before inverse quantization, the quantized frequency coefficients of the left and right channels in the scale factor band are obtained from those of the sum and difference channels by the matrix operation

[ l̂ ; r̂ ] = [ 1  0 ; 1  −1 ] · [ m̂ ; ŝ ]

where m̂ denotes the quantized sum-channel frequency coefficients; ŝ the quantized difference-channel frequency coefficients; l̂ the quantized left-channel frequency coefficients; and r̂ the quantized right-channel frequency coefficients.
If sum/difference stereo decoding is performed after inverse quantization, the inverse-quantized frequency coefficients of the left and right channels in the subband are obtained from those of the sum and difference channels by the matrix operation

[ l ; r ] = [ 1  1 ; 1  −1 ] · [ m ; s ]

where m denotes the sum-channel frequency coefficients; s the difference-channel frequency coefficients; l the left-channel frequency coefficients; and r the right-channel frequency coefficients.
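The pre-quantization encoder matrix (1/2)·[1 1; 1 −1] and this decoder matrix [1 1; 1 −1] form an exact round trip, which the sketch below illustrates per coefficient (function names illustrative):

```python
def ms_encode(l, r):
    """Pre-quantization M/S at the encoder: M = (L + R) / 2, S = (L - R) / 2."""
    return (l + r) / 2.0, (l - r) / 2.0

def ms_decode(m, s):
    """Post-inverse-quantization decoding: L = M + S, R = M - S."""
    return m + s, m - s
```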
Fig. 13 shows the structure of the third embodiment of the encoding device of the present invention. On the basis of the encoding device shown in Fig. 7, this embodiment adds the sum/difference stereo coding module 507, placed either between the output of the multiresolution analysis module 506 and the input of the quantization and entropy coding module 504, or between the quantizer bank and the coder within the quantization and entropy coding module 504. The function and working principle of the sum/difference stereo coding module 507 in this embodiment are the same as in Fig. 11 and are not repeated here.
The encoding method based on the encoding device shown in Fig. 13 is essentially the same as that based on the encoding device shown in Fig. 7, the difference being the following added steps: after multiresolution analysis of the residual sequence/frequency coefficients, it is determined whether the audio signal is a multichannel signal; if so, whether the signal types of the left- and right-channel signals are consistent; if consistent, whether the scale factor band satisfies the coding condition; if satisfied, sum/difference stereo coding is applied to the residual sequence/frequency coefficients to obtain the residual sequence/frequency coefficients of the sum and difference channels; if not satisfied, no sum/difference stereo coding is performed. For a monophonic signal, or a multichannel signal of inconsistent signal types, the frequency coefficients are not processed. The detailed flow has been described above and is not repeated here.
Fig. 14 shows the structure of embodiment three of the decoding device of the present invention. On the basis of the decoding device shown in Fig. 10, this decoding device adds the sum/difference stereo decoding module 607, placed either between the output of the inverse quantizer bank 603 and the input of the multiresolution synthesis module 606, or between the output of the entropy decoding module 602 and the input of the inverse quantizer bank 603. The function and working principle of the sum/difference stereo decoding module 607 in this embodiment are the same as in Fig. 12 and are not repeated here.
The decoding method based on the decoding device shown in Fig. 14 is essentially the same as that based on the decoding device shown in Fig. 10, the difference being the following added steps: after the inverse-quantized spectrum is obtained, if the signal type analysis result shows consistent signal types, the sum/difference stereo control signal is used to determine whether the inverse-quantized spectrum requires sum/difference stereo decoding; if so, the flag bit of each scale factor band is used to determine whether that band requires sum/difference stereo decoding, and if it does, the inverse-quantized spectra of the sum and difference channels in that band are converted into the inverse-quantized spectra of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum/difference stereo decoding is required, the inverse-quantized spectrum is passed to subsequent processing unchanged. The detailed flow has been described above and is not repeated here.
Finally, it should be noted that the above embodiments merely illustrate, and do not limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent substitutions may be made to the technical solution of the present invention without departing from its spirit and scope, and all such modifications shall be encompassed within the scope of the claims of the present invention.

Claims (13)

1. An enhanced audio encoding device, comprising a psychoacoustic analysis module, a time-frequency mapping module, a quantization and entropy coding module and a bitstream multiplexing module; characterized in that it further comprises a frequency-domain linear prediction and vector quantization module;
said psychoacoustic analysis module is used to calculate the masking threshold and the signal-to-mask ratio of the input audio signal and to determine the signal type, and outputs to said quantization and entropy coding module;
said time-frequency mapping module is used to transform the input time-domain audio signal into frequency coefficients;
said frequency-domain linear prediction and vector quantization module is used to perform linear prediction and multi-stage vector quantization on the frequency coefficients, outputting the residual sequence to said quantization and entropy coding module while outputting the side information to said bitstream multiplexing module;
said quantization and entropy coding module is used to quantize and entropy code the frequency coefficients/residual sequence under the control of the signal-to-mask ratio output by said psychoacoustic analysis module, and outputs the result to said bitstream multiplexing module;
said bitstream multiplexing module is used to multiplex the received data to form an audio coding bitstream.
2. The enhanced audio encoding device according to claim 1, characterized in that said frequency-domain linear prediction and vector quantization module is composed of a linear prediction analyzer, a linear prediction filter, a converter and a vector quantizer;
said linear prediction analyzer is used to perform prediction analysis on the frequency coefficients to obtain a prediction gain and prediction coefficients, outputting frequency coefficients that satisfy a given condition to said linear prediction filter, and outputting frequency coefficients that do not satisfy the condition directly to said quantization and entropy coding module;
said linear prediction filter is used to filter the frequency coefficients to obtain the linear prediction residual sequence of the frequency coefficients, outputting the residual sequence to said quantization and entropy coding module and the prediction coefficients to the converter;
Described converter is used for converting predictive coefficient to the line spectrum pair coefficient of frequency;
Described vector quantizer is used for the line spectrum pair coefficient of frequency is carried out multi-stage vector quantization, and the signal after the quantification is sent to described bit stream Multiplexing module.
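The multi-stage vector quantization named in claim 2 can be illustrated with a short sketch: each stage quantizes the residual left by the previous stage, so the codeword indices of all stages together describe the vector. The k-means trainer, codebook sizes, and vector dimension below are illustrative assumptions, not part of the claims; a real codec would use codebooks trained offline on line spectrum pair coefficients.

```python
import numpy as np

def train_codebook(vectors, size, iters=20, seed=0):
    """Toy k-means trainer, a stand-in for an offline-trained LSP codebook."""
    rng = np.random.default_rng(seed)
    cb = vectors[rng.choice(len(vectors), size, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest codeword, then move codewords to cluster means.
        idx = np.argmin(((vectors[:, None] - cb[None]) ** 2).sum(-1), axis=1)
        for j in range(size):
            if np.any(idx == j):
                cb[j] = vectors[idx == j].mean(axis=0)
    return cb

def msvq_encode(x, codebooks):
    """Multi-stage VQ: stage i quantizes the residual left by stages 0..i-1."""
    indices, residual = [], x.copy()
    for cb in codebooks:
        i = int(np.argmin(((cb - residual) ** 2).sum(-1)))
        indices.append(i)
        residual = residual - cb[i]
    return indices

def msvq_decode(indices, codebooks):
    """Reconstruction is the sum of the selected codewords of all stages."""
    return sum(cb[i] for i, cb in zip(indices, codebooks))
```

With a second-stage codebook trained on the first stage's residuals, the average reconstruction error can only go down as stages are added, which is the point of the multi-stage structure.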
3. The enhanced audio encoding device according to claim 1 or 2, further comprising a sum/difference stereo coding module, located between the output of the frequency-domain linear prediction and vector quantization module (or of the multi-resolution analysis module) and the input of the quantization and entropy coding module, or between the quantizer bank and the encoder inside the quantization and entropy coding module, and configured to convert the frequency coefficients/residual sequences of the left and right channels into frequency coefficients/residual sequences of the sum and difference channels.
4, a kind of enhancing audio coding method is characterized in that, may further comprise the steps:
The letter of step 1, calculating input audio signal is covered ratio, and analyzes the type of input signal;
Step 2, input audio signal is carried out time-frequency mapping, obtain the frequency coefficient of sound signal;
Step 3, frequency coefficient is carried out the linear prediction analysis of standard, obtain prediction gain and predictive coefficient; Judge whether prediction gain surpasses preset threshold, if surpass, then according to predictive coefficient frequency coefficient is carried out the filtering of frequency-domain linear prediction error, obtains the prediction residual sequence of frequency coefficient; Predictive coefficient is changed into the line spectrum pair coefficient of frequency, and the line spectrum pair coefficient of frequency is carried out multi-stage vector quantization handle, obtain side information; If prediction gain does not surpass preset threshold, then frequency coefficient is not handled, go to step 4;
Step 4, residual sequence/frequency coefficient is quantized and entropy coding;
Step 5, carry out the sound signal behind side information and the coding multiplexing, obtain the compressed audio code stream.
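The prediction-gain decision of step 3 can be sketched as follows. Linear prediction is run along the frequency axis; the residual is transmitted only when the prediction gain (spectral energy over residual energy) clears a threshold. The predictor order, the threshold value 2.0, and the ridge term in the solver are illustrative assumptions, not values taken from the claims.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC: solve the normal equations R a = r for the predictor."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    # Small ridge term keeps the solve stable for near-singular autocorrelations.
    return np.linalg.solve(R + 1e-9 * np.eye(order), r[1:])

def fd_lp_residual(coeffs, order=4, gain_threshold=2.0):
    """Frequency-domain LP of step 3: predict each spectral coefficient from its
    `order` lower-frequency neighbours; keep the residual only when the
    prediction gain exceeds the (illustrative) threshold."""
    a = lpc(coeffs, order)
    pred = np.zeros_like(coeffs)
    for n in range(order, len(coeffs)):
        pred[n] = np.dot(a, coeffs[n - order:n][::-1])
    residual = coeffs - pred
    gain = np.sum(coeffs ** 2) / max(np.sum(residual ** 2), 1e-12)
    if gain > gain_threshold:
        return residual, a, True       # transmit residual + side information
    return coeffs, None, False         # gain too small: spectrum passes through
```

A smooth, tonal-looking spectrum is highly predictable along frequency and triggers the residual path; a noise-like spectrum yields a gain near 1 and is passed through unchanged, matching the two branches of step 3.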
5. The enhanced audio encoding method according to claim 4, wherein the quantization in step 4 is scalar quantization, specifically comprising: applying non-linear companding to the frequency coefficients in all scale factor bands; quantizing the frequency coefficients of each subband with the scale factor of that subband to obtain a quantized spectrum in integer representation; selecting the first scale factor of each frame as the common scale factor; and differentially coding every other scale factor against the scale factor preceding it;
The entropy coding comprises: entropy-coding the quantized spectrum and the differentially coded scale factors to obtain the codebook indices, the scale factor code values, and the losslessly coded quantized spectrum; and entropy-coding the codebook indices to obtain the codebook index code values.
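A minimal sketch of claim 5's scalar quantization and differential scale factor coding. The claim only states "non-linear companding"; the 3/4-power law, the 2^(sf/4) step size, and the 0.4054 rounding offset below are AAC-style assumptions used purely for illustration.

```python
import numpy as np

def quantize_band(coeffs, scalefactor):
    """Non-linear companding quantizer (3/4-power law and step 2^(sf/4) are
    assumed, AAC-style; the claim does not fix these constants)."""
    step = 2.0 ** (scalefactor / 4.0)
    return (np.sign(coeffs) * np.floor((np.abs(coeffs) / step) ** 0.75 + 0.4054)).astype(int)

def dequantize_band(q, scalefactor):
    """Inverse companding: expand the integer spectrum back to coefficient scale."""
    step = 2.0 ** (scalefactor / 4.0)
    return np.sign(q) * (np.abs(q) ** (4.0 / 3.0)) * step

def diff_code_scalefactors(sfs):
    """First scale factor of the frame is the common scale factor; every other
    one is coded as a difference from its predecessor, as in claim 5."""
    return sfs[0], [sfs[i] - sfs[i - 1] for i in range(1, len(sfs))]

def diff_decode_scalefactors(common, diffs):
    sfs = [common]
    for d in diffs:
        sfs.append(sfs[-1] + d)
    return sfs
```

Differential coding works because neighbouring scale factors are usually close, so the differences cluster around zero and entropy-code cheaply.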
6. The enhanced audio encoding method according to claim 4 or 5, wherein step 4 further comprises: quantizing the residual sequence/frequency coefficients; determining whether the audio signal is a multi-channel signal; if it is, determining whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, determining whether the corresponding scale factor bands of the two channels satisfy the sum/difference stereo coding condition; if the condition is satisfied, applying sum/difference stereo coding to the residual sequence/frequency coefficients in that scale factor band to obtain the residual sequence/frequency coefficients of the sum and difference channels; if it is not satisfied, applying no sum/difference stereo coding to the residual sequence/frequency coefficients in that scale factor band; if the signal is a mono signal, or a multi-channel signal with inconsistent signal types, leaving the residual sequence/frequency coefficients unprocessed; and entropy-coding the residual sequence/frequency coefficients; wherein
The method for determining whether a scale factor band satisfies the coding condition is the Karhunen-Loève transform, specifically: calculate the correlation matrix of the spectral coefficients of the left and right channel scale factor band; apply the Karhunen-Loève transform to the correlation matrix; if the absolute value of the rotation angle α deviates only slightly from π/4, e.g. 3π/16 < |α| < 5π/16, the corresponding scale factor band may be sum/difference stereo coded; the sum/difference stereo coding is:

[M̂; Ŝ] = [1 0; 1 −1] · [L̂; R̂], i.e. M̂ = L̂ and Ŝ = L̂ − R̂,

wherein M̂ denotes the quantized sum-channel frequency coefficients, Ŝ the quantized difference-channel frequency coefficients, L̂ the quantized left-channel frequency coefficients, and R̂ the quantized right-channel frequency coefficients.
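The KLT decision of claim 6 reduces, for a 2x2 correlation matrix, to a closed-form rotation angle. The sketch below computes that angle per band and applies the claim's sum/difference matrix (M = L, S = L − R) only when |α| falls in (3π/16, 5π/16); everything else (signal lengths, test data) is illustrative.

```python
import numpy as np

def klt_rotation_angle(left, right):
    """Rotation angle of the principal axis (KLT) of the 2x2 correlation
    matrix [[c_ll, c_lr], [c_lr, c_rr]] of the band's L/R coefficients."""
    c_ll = np.dot(left, left)
    c_rr = np.dot(right, right)
    c_lr = np.dot(left, right)
    return 0.5 * np.arctan2(2.0 * c_lr, c_ll - c_rr)

def maybe_ms_encode(left, right):
    """Apply the claim's transform M = L, S = L - R (matrix [1 0; 1 -1])
    only when |alpha| is near pi/4, per the 3pi/16 < |alpha| < 5pi/16 test."""
    alpha = klt_rotation_angle(left, right)
    if 3 * np.pi / 16 < abs(alpha) < 5 * np.pi / 16:
        return left, left - right, True
    return left, right, False
```

Strongly correlated channels of similar energy put the principal axis near the diagonal (α ≈ π/4), where the difference channel is nearly empty and M/S coding pays off; weakly correlated channels leave α near 0 or π/2 and the band is coded L/R.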
7. An enhanced audio decoding device comprising a bitstream demultiplexing module, an entropy decoding module, an inverse quantizer bank, and a frequency-time mapping module, and further comprising an inverse frequency-domain linear prediction and vector quantization module;
The bitstream demultiplexing module is configured to demultiplex the compressed audio data stream and to output the corresponding audio data signals and control signals to the entropy decoding module and the inverse frequency-domain linear prediction and vector quantization module;
The entropy decoding module is configured to decode the above signals, recovering the quantized spectrum values, and to output them to the inverse quantizer bank;
The inverse quantizer bank is configured to reconstruct the inversely quantized spectrum and to output it to the inverse frequency-domain linear prediction and vector quantization module;
The inverse frequency-domain linear prediction and vector quantization module is configured to apply inverse quantization and inverse linear prediction filtering to the inversely quantized spectrum to obtain the pre-prediction spectrum, and to output it to the frequency-time mapping module;
The frequency-time mapping module is configured to apply frequency-time mapping to the spectral coefficients to obtain the low-band time-domain audio signal.
8. The enhanced audio decoding device according to claim 7, wherein the inverse frequency-domain linear prediction and vector quantization module comprises an inverse vector quantizer, an inverse converter, and an inverse linear prediction filter; the inverse vector quantizer is configured to inversely quantize the codeword indices to obtain the line spectrum pair frequency coefficients; the inverse converter is configured to convert the line spectrum pair frequency coefficients back into prediction coefficients; and the inverse linear prediction filter is configured to inversely filter the inversely quantized spectrum according to the prediction coefficients to obtain the pre-prediction spectrum.
9. The enhanced audio decoding device according to claim 7 or 8, further comprising a sum/difference stereo decoding module, located between the output of the inverse quantizer bank and the input of the multi-resolution synthesis module or of the inverse frequency-domain linear prediction and vector quantization module, or between the output of the entropy decoding module and the input of the inverse quantizer bank, the module receiving the signal type analysis result and the sum/difference stereo control signal output by the bitstream demultiplexing module, and being configured to convert the inversely quantized spectra of the sum and difference channels into the inversely quantized spectra of the left and right channels according to the above control information.
10. An enhanced audio decoding method, comprising the following steps:
Step 1: demultiplex the compressed audio data stream to obtain data information and control information;
Step 2: entropy-decode the above information to obtain the quantized spectrum values;
Step 3: inversely quantize the quantized spectrum values to obtain the inversely quantized spectrum;
Step 4: determine whether the control information indicates that the inversely quantized spectrum requires inverse frequency-domain linear prediction and vector quantization; if so, apply inverse vector quantization to obtain the prediction coefficients, and apply linear prediction synthesis to the inversely quantized spectrum according to the prediction coefficients to obtain the pre-prediction spectrum; if the control information indicates that no inverse frequency-domain linear prediction and vector quantization is required, leave the inversely quantized spectrum unprocessed and go to step 5;
Step 5: apply frequency-time mapping to the pre-prediction spectrum/inversely quantized spectrum to obtain the time-domain audio signal.
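The synthesis filtering of step 4 is the exact inverse of the encoder's analysis filtering along the frequency axis. The sketch below shows both directions so the round trip can be checked; the predictor order and coefficient values are illustrative assumptions.

```python
import numpy as np

def fd_lp_analysis(coeffs, a):
    """Encoder-side analysis filter: residual r[n] = x[n] - sum_k a[k] * x[n-1-k]."""
    order = len(a)
    r = coeffs.copy()
    for n in range(order, len(coeffs)):
        r[n] = coeffs[n] - np.dot(a, coeffs[n - order:n][::-1])
    return r

def fd_lp_synthesis(residual, a):
    """Decoder-side synthesis filter of step 4: x[n] = r[n] + sum_k a[k] * x[n-1-k],
    recovering the pre-prediction spectrum sample by sample."""
    order = len(residual) and len(a)
    x = residual.copy()
    for n in range(order, len(x)):
        x[n] = residual[n] + np.dot(a, x[n - order:n][::-1])
    return x
```

Because the synthesis recursion uses the already-reconstructed coefficients, feeding it the analysis residual with the same predictor reproduces the original spectrum exactly (up to floating-point error).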
11. The enhanced audio decoding method according to claim 10, wherein the inverse vector quantization of step 4 further comprises: obtaining from the control information the codeword indices of the vector-quantized prediction coefficients; then obtaining the quantized line spectrum pair frequency coefficients from the codeword indices, and calculating the prediction coefficients from them.
12. The enhanced audio decoding method according to claim 10, wherein step 5 further comprises: applying the inverse modified discrete cosine transform (IMDCT) to the inversely quantized spectrum to obtain the transformed time-domain signal; windowing the transformed time-domain signal in the time domain; and overlap-adding the windowed time-domain signals to obtain the time-domain audio signal; wherein the window function in the windowing is:
w(N+k) = cos(π/2 · ((k+0.5)/N − 0.94 · sin(2π/N · (k+0.5))/(2π))), where π is the circular constant and k = 0...N−1; w(k) denotes the k-th coefficient of the window function, with w(k) = w(2N−1−k); and N denotes the number of samples in a coded frame.
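The window of claim 12 can be generated directly from the formula: the second half comes from the cosine expression and the first half from the stated symmetry. A property worth noting (it follows from the symmetry, not from anything extra assumed here) is that the two halves satisfy w(k)² + w(N+k)² = 1, the Princen-Bradley condition required for perfect reconstruction in MDCT overlap-add.

```python
import math

def make_window(N):
    """Window of claim 12: second half w(N+k) from the formula for k = 0..N-1,
    first half from the symmetry w(k) = w(2N-1-k)."""
    w = [0.0] * (2 * N)
    for k in range(N):
        w[N + k] = math.cos(math.pi / 2 * ((k + 0.5) / N
                   - 0.94 * math.sin(2 * math.pi / N * (k + 0.5)) / (2 * math.pi)))
    for k in range(N):
        w[k] = w[2 * N - 1 - k]
    return w
```

The window rises from near 0 at the frame edges to near 1 at the centre; the 0.94·sin term flattens it relative to a plain sine window, trading a wider main lobe for lower side lobes.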
13. The enhanced audio decoding method according to any one of claims 10 to 12, further comprising, between step 2 and step 3: if the signal type analysis result indicates consistent signal types, determining from the sum/difference stereo control signal whether the quantized spectrum values require sum/difference stereo decoding; if required, determining from the flag bit of each scale factor band whether that scale factor band requires sum/difference stereo decoding, and if so, converting the quantized spectrum values of the sum and difference channels in that scale factor band into the quantized spectrum values of the left and right channels, then going to step 3; if the signal types are inconsistent, or no sum/difference stereo decoding is required, leaving the quantized spectrum values unprocessed and going to step 3; wherein the sum/difference stereo decoding is:

[l̂; r̂] = [1 0; 1 −1] · [m̂; ŝ], i.e. l̂ = m̂ and r̂ = m̂ − ŝ,

wherein m̂ denotes the quantized sum-channel spectrum values, ŝ the quantized difference-channel spectrum values, l̂ the quantized left-channel spectrum values, and r̂ the quantized right-channel spectrum values.
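The sum/difference matrix [1 0; 1 −1] used by claims 6 and 13 is its own inverse, so the same two lines of arithmetic serve both the encoder and the decoder, and integer quantized spectra round-trip losslessly. A minimal sketch:

```python
import numpy as np

def ms_encode(l, r):
    """Claim 6: [M; S] = [1 0; 1 -1] [L; R], i.e. M = L, S = L - R."""
    return l, l - r

def ms_decode(m, s):
    """Claim 13: [l; r] = [1 0; 1 -1] [m; s], i.e. l = m, r = m - s.
    Same matrix: the transform is involutory."""
    return m, m - s
```

Because r = m − s = l − (l − r), the decode exactly undoes the encode in integer arithmetic, with no rounding loss on the quantized values.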
CNA2004100461540A 2004-04-01 2004-06-02 Intensified audio-frequency coding-decoding device and method Pending CN1677491A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2004100461540A CN1677491A (en) 2004-04-01 2004-06-02 Intensified audio-frequency coding-decoding device and method

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN200410030943.5 2004-04-01
CN200410030943 2004-04-01
CNA2004100461540A CN1677491A (en) 2004-04-01 2004-06-02 Intensified audio-frequency coding-decoding device and method

Publications (1)

Publication Number Publication Date
CN1677491A true CN1677491A (en) 2005-10-05

Family

ID=35049969

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2004100461540A Pending CN1677491A (en) 2004-04-01 2004-06-02 Intensified audio-frequency coding-decoding device and method

Country Status (1)

Country Link
CN (1) CN1677491A (en)

Cited By (65)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101536085B (en) * 2006-10-24 2012-01-25 弗劳恩霍夫应用研究促进协会 Apparatus and method for generating an ambient signal from an audio signal, apparatus and method for deriving a multi-channel audio signal from an audio signal
US8712765B2 (en) 2006-11-10 2014-04-29 Panasonic Corporation Parameter decoding apparatus and parameter decoding method
CN102682774B (en) * 2006-11-10 2014-10-08 松下电器(美国)知识产权公司 Parameter encoding device and parameter decoding method
CN101267223B (en) * 2007-03-16 2012-02-15 索尼株式会社 Bass enhancing method, signal processing device, and audio reproducing system
US8938387B2 (en) 2008-01-04 2015-01-20 Dolby Laboratories Licensing Corporation Audio encoder and decoder
US8924201B2 (en) 2008-01-04 2014-12-30 Dolby International Ab Audio encoder and decoder
CN103065637A (en) * 2008-01-04 2013-04-24 杜比国际公司 Audio encoder and decoder
CN101939781B (en) * 2008-01-04 2013-01-23 杜比国际公司 Audio encoder and decoder
CN103065637B (en) * 2008-01-04 2015-02-04 杜比国际公司 Audio encoder and decoder
CN101667170A (en) * 2008-09-05 2010-03-10 索尼株式会社 Computation apparatus and method, quantization apparatus and method, audio encoding apparatus and method, and program
CN101740033B (en) * 2008-11-24 2011-12-28 华为技术有限公司 Audio coding method and audio coder
US11322161B2 (en) 2009-03-17 2022-05-03 Dolby International Ab Audio encoder with selectable L/R or M/S coding
CN105225667B (en) * 2009-03-17 2019-04-05 杜比国际公司 Encoder system, decoder system, coding method and coding/decoding method
CN105225667A (en) * 2009-03-17 2016-01-06 杜比国际公司 Encoder system, decoder system, coding method and coding/decoding method
US11315576B2 (en) 2009-03-17 2022-04-26 Dolby International Ab Selectable linear predictive or transform coding modes with advanced stereo coding
US11133013B2 (en) 2009-03-17 2021-09-28 Dolby International Ab Audio encoder with selectable L/R or M/S coding
US11017785B2 (en) 2009-03-17 2021-05-25 Dolby International Ab Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding
US10297259B2 (en) 2009-03-17 2019-05-21 Dolby International Ab Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding
CN102652337A (en) * 2009-12-10 2012-08-29 三星电子株式会社 Device and method for acoustic communication
CN102652337B (en) * 2009-12-10 2014-02-19 三星电子株式会社 Device and method for acoustic communication
CN105791873A (en) * 2010-07-19 2016-07-20 Sk电信有限公司 Video Encoding Method
CN109935236B (en) * 2013-04-05 2023-05-30 杜比国际公司 Audio encoder and decoder
US11676622B2 (en) 2013-04-05 2023-06-13 Dolby International Ab Method, apparatus and systems for audio decoding and encoding
CN109935236A (en) * 2013-04-05 2019-06-25 杜比国际公司 Audio coder and decoder
US11250862B2 (en) 2013-07-22 2022-02-15 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for decoding or encoding an audio signal using energy information values for a reconstruction band
US11769513B2 (en) 2013-07-22 2023-09-26 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for decoding or encoding an audio signal using energy information values for a reconstruction band
US10276183B2 (en) 2013-07-22 2019-04-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for decoding or encoding an audio signal using energy information values for a reconstruction band
US10147430B2 (en) 2013-07-22 2018-12-04 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for decoding and encoding an audio signal using adaptive spectral tile selection
US11922956B2 (en) 2013-07-22 2024-03-05 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for encoding or decoding an audio signal with intelligent gap filling in the spectral domain
US10311892B2 (en) 2013-07-22 2019-06-04 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for encoding or decoding audio signal with intelligent gap filling in the spectral domain
US10332531B2 (en) 2013-07-22 2019-06-25 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for decoding or encoding an audio signal using energy information values for a reconstruction band
US10134404B2 (en) 2013-07-22 2018-11-20 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder and related methods using two-channel processing within an intelligent gap filling framework
US10332539B2 (en) 2013-07-22 2019-06-25 Fraunhofer-Gesellscheaft zur Foerderung der angewanften Forschung e.V. Apparatus and method for encoding and decoding an encoded audio signal using temporal noise/patch shaping
US10347274B2 (en) 2013-07-22 2019-07-09 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for encoding and decoding an encoded audio signal using temporal noise/patch shaping
US11769512B2 (en) 2013-07-22 2023-09-26 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for decoding and encoding an audio signal using adaptive spectral tile selection
US11222643B2 (en) 2013-07-22 2022-01-11 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus for decoding an encoded audio signal with frequency tile adaption
US11735192B2 (en) 2013-07-22 2023-08-22 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder and related methods using two-channel processing within an intelligent gap filling framework
US10515652B2 (en) 2013-07-22 2019-12-24 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for decoding an encoded audio signal using a cross-over filter around a transition frequency
US10573334B2 (en) 2013-07-22 2020-02-25 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for encoding or decoding an audio signal with intelligent gap filling in the spectral domain
US10593345B2 (en) 2013-07-22 2020-03-17 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus for decoding an encoded audio signal with frequency tile adaption
US10847167B2 (en) 2013-07-22 2020-11-24 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder and related methods using two-channel processing within an intelligent gap filling framework
US11289104B2 (en) 2013-07-22 2022-03-29 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for encoding or decoding an audio signal with intelligent gap filling in the spectral domain
US10984805B2 (en) 2013-07-22 2021-04-20 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for decoding and encoding an audio signal using adaptive spectral tile selection
US10002621B2 (en) 2013-07-22 2018-06-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for decoding an encoded audio signal using a cross-over filter around a transition frequency
US11049506B2 (en) 2013-07-22 2021-06-29 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for encoding and decoding an encoded audio signal using temporal noise/patch shaping
CN104769671B (en) * 2013-07-22 2017-09-26 弗劳恩霍夫应用研究促进协会 For the device and method coded and decoded using noise in time domain/repairing shaping to coded audio signal
US11257505B2 (en) 2013-07-22 2022-02-22 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder and related methods using two-channel processing within an intelligent gap filling framework
CN103618555B (en) * 2013-11-27 2016-11-30 北京科技大学 A kind of compression method of lane-route Monitoring Data
CN103618555A (en) * 2013-11-27 2014-03-05 北京科技大学 Compression method for ocean lane monitoring data
CN109087653A (en) * 2014-03-24 2018-12-25 杜比国际公司 To the method and apparatus of high-order clear stereo signal application dynamic range compression
US11838738B2 (en) 2014-03-24 2023-12-05 Dolby Laboratories Licensing Corporation Method and device for applying Dynamic Range Compression to a Higher Order Ambisonics signal
CN109087653B (en) * 2014-03-24 2023-09-15 杜比国际公司 Method and apparatus for applying dynamic range compression to high order ambisonics signals
US10460741B2 (en) 2014-06-27 2019-10-29 Huawei Technologies Co., Ltd. Audio coding method and apparatus
CN106486129A (en) * 2014-06-27 2017-03-08 华为技术有限公司 A kind of audio coding method and device
US11133016B2 (en) 2014-06-27 2021-09-28 Huawei Technologies Co., Ltd. Audio coding method and apparatus
CN106486129B (en) * 2014-06-27 2019-10-25 华为技术有限公司 A kind of audio coding method and device
CN107077856A (en) * 2014-08-28 2017-08-18 诺基亚技术有限公司 Audio frequency parameter quantifies
CN106373583A (en) * 2016-09-28 2017-02-01 北京大学 Ideal ratio mask (IRM) multi-audio object coding and decoding method
CN106373583B (en) * 2016-09-28 2019-05-21 北京大学 Multi-audio-frequency object coding and decoding method based on ideal soft-threshold mask IRM
CN112119457A (en) * 2018-04-05 2020-12-22 瑞典爱立信有限公司 Truncatable predictive coding
CN110489606B (en) * 2019-07-31 2023-06-06 云南师范大学 Packet Hilbert coding and decoding method
CN110489606A (en) * 2019-07-31 2019-11-22 云南师范大学 A kind of grouping Hilbert coding and decoding methods
WO2022267754A1 (en) * 2021-06-22 2022-12-29 腾讯科技(深圳)有限公司 Speech coding method and apparatus, speech decoding method and apparatus, computer device, and storage medium
CN113593592A (en) * 2021-08-04 2021-11-02 深圳市瑞江科技有限公司 Audio enhancement method and device based on multi-domain expansion
CN113593592B (en) * 2021-08-04 2024-03-22 深圳市瑞江科技有限公司 Audio enhancement method and device based on multi-domain expansion

Similar Documents

Publication Publication Date Title
CN1677490A (en) Intensified audio-frequency coding-decoding device and method
CN1677493A (en) Intensified audio-frequency coding-decoding device and method
CN1677491A (en) Intensified audio-frequency coding-decoding device and method
CN1096148C (en) Signal encoding method and apparatus
KR101449434B1 (en) Method and apparatus for encoding/decoding multi-channel audio using plurality of variable length code tables
AU2012297805A1 (en) Encoding device and method, decoding device and method, and program
CN1781141A (en) Improved audio coding systems and methods using spectral component coupling and spectral component regeneration
CN1910655A (en) Apparatus and method for constructing a multi-channel output signal or for generating a downmix signal
CN1702974A (en) Method and apparatus for encoding/decoding a digital signal
CN1816847A (en) Fidelity-optimised variable frame length encoding
WO2012149843A1 (en) Method and device for coding/decoding audio signals
CN1240978A (en) Audio signal encoding device, decoding device and audio signal encoding-decoding device
CN1662958A (en) Audio coding system using spectral hole filling
CN1675683A (en) Device and method for scalable coding and device and method for scalable decoding
CN101253557A (en) Stereo encoding device, stereo decoding device, and stereo encoding method
CN1639770A (en) Reconstruction of the spectrum of an audiosignal with incomplete spectrum based on frequency translation
CN1237506C (en) Acoustic signal encoding method and encoding device, acoustic signal decoding method and decoding device, program and recording medium image display device
CN101055720A (en) Method and apparatus for encoding and decoding an audio signal
KR20090122142A (en) A method and apparatus for processing an audio signal
CN1922656A (en) Device and method for determining a quantiser step size
CN1787383A (en) Methods and apparatuses for transforming, adaptively encoding, inversely transforming and adaptively decoding an audio signal
JP4685165B2 (en) Interchannel level difference quantization and inverse quantization method based on virtual sound source position information
CN1161750C (en) Speech encoding and decoding method and apparatus, telphone set, tone changing method and medium
WO2007011157A1 (en) Virtual source location information based channel level difference quantization and dequantization method
CN101031961A (en) Processing of encoded signals

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20051005