CN1677492A - Enhanced audio encoding/decoding device and method - Google Patents


Info

Publication number
CN1677492A
CN1677492A
Authority
CN
China
Prior art keywords
frequency
module
signal
coefficient
spectrum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA200410046332XA
Other languages
Chinese (zh)
Inventor
潘兴德
安德斯·叶瑞特
朱晓明
麦可·舒克
任为民
王磊
豪格·何瑞施
邓昊
佛里德理克·海恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING FUGUO DIGITAL TECHN Co Ltd
Coding Technology Ltd
GONGYU DIGITAL TECHNOLOGY Co Ltd BEIJNG
Original Assignee
BEIJING FUGUO DIGITAL TECHN Co Ltd
Coding Technology Ltd
GONGYU DIGITAL TECHNOLOGY Co Ltd BEIJNG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING FUGUO DIGITAL TECHN Co Ltd, Coding Technology Ltd, GONGYU DIGITAL TECHNOLOGY Co Ltd BEIJNG filed Critical BEIJING FUGUO DIGITAL TECHN Co Ltd
Priority to CNA200410046332XA priority Critical patent/CN1677492A/en
Publication of CN1677492A publication Critical patent/CN1677492A/en
Pending legal-status Critical Current

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention discloses an enhanced audio encoding device consisting of a psychoacoustic analysis module, a time-frequency mapping module, a quantization and entropy coding module, a bitstream multiplexing module, a signal characteristic analysis module and a multi-resolution analysis module. The signal characteristic analysis module analyzes the signal type of the input audio signal; the psychoacoustic analysis module calculates a masking threshold and a signal-to-mask ratio of the audio signal and outputs them to the quantization and entropy coding module; the multi-resolution analysis module performs multi-resolution analysis according to the signal type; the quantization and entropy coding module quantizes and entropy-codes the frequency-domain coefficients under the control of the signal-to-mask ratio; and the bitstream multiplexing module forms the encoded audio bitstream. The invention is applicable to high-fidelity compression coding of audio signals with multiple sampling-rate and channel configurations: it supports sampling rates from 8 kHz to 192 kHz, all possible channel configurations, and a wide range of target bit rates.

Description

Enhanced audio encoding/decoding device and method
Technical field
The present invention relates to the field of audio encoding and decoding, and in particular to an enhanced audio encoding/decoding device and method based on a perceptual model.
Background art
To obtain high-fidelity digital audio, the digital audio signal must be encoded, i.e. compressed, for storage and transmission. The goal of audio coding is to achieve a transparent representation of the audio signal with as few bits as possible, so that there is almost no audible difference between the original input signal and the decoded output.
In the early 1980s the appearance of the Compact Disc demonstrated the advantages of representing audio digitally: high fidelity, wide dynamic range and strong robustness. These advantages, however, come at the cost of a very high data rate. A CD-quality stereo signal is sampled at 44.1 kHz with 16-bit uniform quantization per sample, so the uncompressed data rate reaches about 1.41 Mb/s. So high a data rate is very inconvenient for transmission and storage, particularly in multimedia and wireless applications constrained by bandwidth and cost. New network and wireless multimedia digital audio systems must therefore reduce the data rate without harming audio quality. To address this, many audio compression techniques have been proposed that achieve very high compression ratios while still producing high-fidelity audio, notably the MPEG-1/-2/-4 technologies of ISO/IEC, the AC-2/AC-3 technologies of Dolby, the ATRAC/MiniDisc/SDDS technologies of Sony, and the PAC/EPAC/MPAC technologies of Lucent Technologies. MPEG-2 AAC and Dolby AC-3 are described in detail below.
MPEG-1 and MPEG-2 BC are high-quality coding techniques aimed mainly at mono and stereo audio signals. With the growing demand for higher-quality multichannel audio coding at lower bit rates, MPEG-2 BC — which emphasizes backward compatibility with MPEG-1 — cannot achieve high-quality five-channel coding below 540 kbps. To remedy this deficiency, MPEG-2 AAC was proposed; it can encode a five-channel signal with good quality at 320 kbps.
Fig. 1 shows the block diagram of an MPEG-2 AAC encoder, comprising a gain controller 101, a filter bank 102, a temporal noise shaping (TNS) module 103, an intensity/coupling module 104, a psychoacoustic model, a second-order backward-adaptive predictor 105, a mid/side (M/S) stereo module 106, a bit allocation and quantization coding module 107 and a bitstream multiplexing module 108, in which the bit allocation and quantization coding module 107 further comprises a rate/distortion control, a scale factor module, a non-uniform quantizer and an entropy coding module.
The filter bank 102 uses the modified discrete cosine transform (MDCT), with signal-adaptive resolution: a 2048-point MDCT is applied to stationary signals and a 256-point MDCT to transient signals. For a signal sampled at 48 kHz this gives a maximum frequency resolution of 23 Hz and a maximum time resolution of 2.6 ms. The filter bank 102 can use both sine and Kaiser-Bessel windows: the sine window is used when the harmonic spacing of the input signal is below 140 Hz, and the Kaiser-Bessel window when strong components are spaced more than 220 Hz apart.
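To make the MDCT analysis/synthesis concrete, the sketch below is a minimal plain-Python illustration (function names are ours, and it uses a direct O(N²) evaluation rather than the fast FFT-based algorithm a real codec would use) of a sine-windowed MDCT/IMDCT pair whose 50%-overlapped blocks reconstruct the signal through time-domain alias cancellation:

```python
import math

def sine_window(N):
    # sine window of length 2N; satisfies the Princen-Bradley condition
    # w[n]^2 + w[n+N]^2 == 1, which is what makes alias cancellation work
    return [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]

def mdct(x, w):
    # 2N windowed time samples -> N frequency coefficients
    N = len(x) // 2
    xw = [x[n] * w[n] for n in range(2 * N)]
    return [sum(xw[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(X, w):
    # N coefficients -> 2N windowed time samples, to be overlap-added
    N = len(X)
    y = [(2.0 / N) * sum(X[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                         for k in range(N))
         for n in range(2 * N)]
    return [y[n] * w[n] for n in range(2 * N)]
```

In the AAC filter bank described above the same machinery runs with N = 1024 (2048-point window) for stationary blocks and N = 128 (256-point window) for transient blocks.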
After passing through the gain controller 101, the audio signal enters the filter bank 102 and is filtered according to the signal type; the spectral coefficients output by the filter bank 102 are then processed by the TNS module 103. Temporal noise shaping performs linear prediction on the spectral coefficients in the frequency domain and uses the result to control the temporal shape of the quantization noise, thereby controlling pre-echo.
The intensity/coupling module 104 performs intensity stereo coding. For high-frequency signals (above about 2 kHz) the perceived direction of hearing depends on the variation of the signal intensity (the signal envelope) rather than on the signal waveform — a constant-envelope signal does not affect the perceived direction. This property, together with the correlation between channels, allows several channels to be merged into a common channel for coding, which is the basis of the intensity/coupling technique.
The second-order backward-adaptive predictor 105 removes the redundancy of stationary signals and improves coding efficiency. The mid/side (M/S) stereo module 106 operates on channel pairs — the left/right channels or the left/right surround channels of a stereo or multichannel signal — exploiting the correlation within each pair to reduce the bit rate and improve coding efficiency. Bit allocation and quantization coding 107 is implemented as a nested loop, in which the non-uniform quantizer performs the lossy coding and the entropy coding module performs lossless coding, removing both irrelevancy and redundancy. The inner loop adjusts the step size of the non-uniform quantizer until the available bits are used up, while the outer loop evaluates the coding quality of the signal by comparing the quantization noise with the masking threshold. Finally the coded signal is formed into the output audio bitstream by the multiplexing module 108.
When sampling-rate scalability is required, the input signal is first split by a four-band polyphase quadrature filter bank (PQF) into four equal-width bands; each band is transformed by a 256-point MDCT, giving 1024 spectral coefficients in total. The gain controller 101 is applied in each band, and the decoder can ignore the high-frequency PQF bands to obtain a lower-sampling-rate signal.
Fig. 2 shows the corresponding MPEG-2 AAC decoder, comprising a bitstream demultiplexing module 201, a lossless decoding module 202, an inverse quantizer 203, a scale factor module 204, an M/S stereo module 205, a prediction module 206, an intensity/coupling module 207, a TNS module 208, a filter bank 209 and a gain control module 210. The coded audio bitstream is demultiplexed by module 201 into the corresponding data and control streams. After lossless decoding in module 202, the integer representation of the scale factors and the quantized spectral values are obtained. The inverse quantizer 203 is a bank of non-uniform quantizers implemented by a companding function, which converts the integer quantized values into the reconstructed spectrum. Since the scale factor module in the encoder differences each scale factor against the previous one and Huffman-codes the differences, the scale factor module 204 in the decoder Huffman-decodes the differences and restores the actual scale factors. The M/S module 205 converts the mid/side channels back to left/right channels under control of the side information. Because the encoder uses the second-order backward-adaptive predictor 105 to remove the redundancy of stationary signals, the decoder performs the corresponding prediction decoding in module 206. The intensity/coupling module 207 performs intensity/coupling decoding under control of the side information and outputs to the TNS module 208 for TNS decoding; finally the filter bank 209 performs the synthesis filtering using the inverse modified discrete cosine transform (IMDCT).
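The mid/side conversion performed by modules 106 and 205 is simple enough to sketch directly. The snippet below (illustrative helper names; it assumes the common convention in which mid is the average and side is the half-difference) shows that the decode step is the exact inverse of the encode step:

```python
def ms_encode(left, right):
    # mid = average of the pair, side = half-difference (assumed convention)
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    # exact inverse: L = M + S, R = M - S
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

The coding gain comes from the fact that for highly correlated channel pairs the side signal is close to zero and quantizes very cheaply.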
For sampling-rate scalability, the gain control module 210 can discard the high-frequency PQF bands to obtain a lower-sampling-rate signal.
MPEG-2 AAC is well suited to audio at medium and high bit rates, but its coding quality at low or very low bit rates is relatively poor. It also involves many coding/decoding modules and has high implementation complexity, which hinders real-time implementation.
Fig. 3 shows the structure of an encoder using Dolby AC-3 technology, comprising a transient detection module 301, an MDCT filter bank 302, a spectral envelope/exponent coding module 303, a mantissa coding module 304, a forward-backward adaptive perceptual model 305, a parametric bit allocation module 306 and a bitstream multiplexing module 307.
The transient detection module 301 classifies the audio signal as stationary or transient, while the signal-adaptive MDCT filter bank 302 maps the time-domain data to the frequency domain, applying a 512-point long window to stationary signals and a pair of short windows to transient signals.
The spectral envelope/exponent coding module 303 encodes the exponent part of the signal in one of three modes — D15, D25 and D45 — according to the bit-rate and frequency-resolution requirements. AC-3 codes the spectral envelope differentially across frequency: at most ±2 increments are needed, each increment representing a 6 dB level change; the first (DC) term is coded absolutely and the remaining exponents differentially. In D15 exponent coding each exponent needs about 2.33 bits, three differentials being packed into one 7-bit word; D15 provides fine frequency resolution at the cost of time resolution. Since only fairly stationary signals need fine frequency resolution, and their spectra stay relatively constant over many blocks, the D15 envelope is transmitted only occasionally — normally once per 6 audio blocks (one frame). When the spectrum is unstable, the spectral estimate must be updated frequently and is coded with less frequency resolution, usually in D25 or D45 mode. D25 offers moderate frequency and time resolution, coding a differential for every second frequency coefficient, so each exponent needs about 1.15 bits; it is suitable when the spectrum is stable over two or three blocks and then changes abruptly. D45 codes one differential per four frequency coefficients, about 0.58 bits per exponent, giving very high time resolution but low frequency resolution, and is therefore generally applied to transient signals.
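The D15 numbers above (±2 deltas, three deltas per 7-bit word, ≈2.33 bits per exponent) can be illustrated with a small round-trip sketch. This is a simplified illustration under our own assumptions — function names are ours, and real AC-3 exponent coding has additional constraints not modeled here:

```python
def encode_d15(exponents):
    # first exponent is sent absolutely; the rest as deltas clamped to [-2, 2]
    deltas = []
    prev = exponents[0]
    for e in exponents[1:]:
        d = max(-2, min(2, e - prev))
        deltas.append(d)
        prev += d                      # track the decoder's reconstruction
    while len(deltas) % 3:             # pad to a multiple of 3 with zero deltas
        deltas.append(0)
    # pack 3 biased deltas (0..4 each) into one 7-bit word, base 5: value 0..124
    words = [25 * (deltas[i] + 2) + 5 * (deltas[i + 1] + 2) + (deltas[i + 2] + 2)
             for i in range(0, len(deltas), 3)]
    return exponents[0], words

def decode_d15(first, words, n):
    exps = [first]
    for wd in words:
        for d in (wd // 25, (wd // 5) % 5, wd % 5):
            exps.append(exps[-1] + d - 2)
    return exps[:n]                    # drop any padding deltas
```

Packing three base-5 deltas into 7 bits is what yields the quoted 7/3 ≈ 2.33 bits per exponent.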
The forward-backward adaptive perceptual model 305 estimates the masking threshold of each frame. The forward-adaptive part runs only in the encoder: under the bit-rate constraint it estimates an optimal set of perceptual-model parameters through an iterative loop and passes them to the backward-adaptive part, which estimates the per-frame masking threshold. The backward-adaptive part runs in both the encoder and the decoder.
The parametric bit allocation module 306 analyzes the spectral envelope of the audio signal against the masking criterion to determine the number of bits allocated to each mantissa. It uses a common bit pool for global allocation across all channels: during coding in the mantissa coding module 304, bits are drawn from the pool in turn for all channels, and the mantissa quantization is adjusted to the number of bits available. To raise the compression ratio further, the AC-3 encoder also applies high-frequency coupling: the high-frequency part of the coupled signals is divided into 18 subbands according to the critical bandwidths of the human ear, and selected channels are coupled from a chosen subband upward. Finally the AC-3 audio bitstream is formed by the multiplexing module 307.
Fig. 4 shows the Dolby AC-3 decoding flow. The bitstream produced by the AC-3 encoder is first frame-synchronized and checked for errors; if a data error is detected, error concealment or muting is applied. The stream is then unpacked into main information and side information, and exponent decoding is performed. Exponent decoding requires two pieces of side information: the number of packed exponents, and the exponent strategy used (D15, D25 or D45). From the decoded exponents and the bit-allocation side information, bit allocation is recomputed, indicating the number of bits used by each packed mantissa and yielding a set of bit-allocation pointers, one per coded mantissa; each pointer specifies the quantizer for that mantissa and the number of bits it occupies in the stream. Each coded mantissa value is then dequantized; mantissas allocated zero bits are restored as zero or, under control of the dither flag, replaced by a random dither value. Decoupling is performed next: the high-frequency part of each coupled channel — exponents and mantissas — is recovered from the common coupling channel and the coupling coordinates. If a subband was matrixed under 2/0-mode coding at the encoder, the decoder must convert the sum/difference values of that subband back to left/right channel values. The bitstream also carries a dynamic range control value for each audio block, which is applied as dynamic range compression to change the amplitudes of the coefficients (exponents and mantissas). The frequency coefficients are then inverse-transformed into time-domain samples, which are windowed and overlap-added with adjacent blocks to reconstruct the PCM audio signal. If the number of decoded output channels is smaller than the number of channels in the coded bitstream, a downmix is applied before the final PCM output.
Dolby AC-3 coding is aimed primarily at high-bit-rate multichannel surround signals; when the bit rate of a 5.1-channel signal falls below 384 kbps its coding quality degrades, and its coding efficiency for mono and stereo signals is also low.
In summary, existing coding techniques cannot comprehensively cover the full range from very low and low bit rates to high bit rates, nor mono and stereo signals, and they are relatively complex to implement.
Summary of the invention
The technical problem addressed by the present invention is to provide an enhanced audio encoding/decoding device and method that overcome the low coding efficiency and poor quality of the prior art for low-bit-rate audio signals.
The enhanced audio encoding device of the present invention comprises a psychoacoustic analysis module, a time-frequency mapping module, a quantization and entropy coding module, a bitstream multiplexing module, a signal characteristic analysis module and a multi-resolution analysis module. The signal characteristic analysis module analyzes the type of the input audio signal, outputs the signal to the psychoacoustic analysis module and the time-frequency mapping module, and sends the signal-type analysis result to the bitstream multiplexing module. The psychoacoustic analysis module calculates the masking threshold and signal-to-mask ratio of the audio signal and outputs them to the quantization and entropy coding module. The time-frequency mapping module transforms the time-domain audio signal into frequency-domain coefficients and outputs them to the multi-resolution analysis module. The multi-resolution analysis module, according to the signal-type result from the signal characteristic analysis module, performs multi-resolution analysis on the frequency-domain coefficients of transient signals and outputs them to the quantization and entropy coding module. The quantization and entropy coding module, under the control of the signal-to-mask ratio from the psychoacoustic analysis module, quantizes and entropy-codes the frequency-domain coefficients and outputs them to the bitstream multiplexing module. The bitstream multiplexing module multiplexes the received data to form the encoded audio bitstream.
The enhanced audio decoding device of the present invention comprises a bitstream demultiplexing module, an entropy decoding module, an inverse quantizer bank, a multi-resolution synthesis module and a frequency-time mapping module. The bitstream demultiplexing module demultiplexes the compressed audio stream and outputs the corresponding data and control signals to the entropy decoding module and the multi-resolution synthesis module. The entropy decoding module decodes these signals, recovers the quantized spectral values and outputs them to the inverse quantizer bank. The inverse quantizer bank reconstructs the dequantized spectrum and outputs it to the multi-resolution synthesis module, which performs multi-resolution synthesis on the dequantized spectrum and outputs the result to the frequency-time mapping module. The frequency-time mapping module maps the spectral coefficients back to the time domain and outputs the time-domain audio signal.
The present invention is applicable to high-fidelity compression coding of audio signals with a wide variety of sampling rates and channel configurations: it supports sampling rates from 8 kHz to 192 kHz, all possible channel configurations, and a very wide range of target bit rates.
Description of drawings
Fig. 1 is the block diagram of an MPEG-2 AAC encoder;
Fig. 2 is the block diagram of an MPEG-2 AAC decoder;
Fig. 3 is a structural diagram of an encoder using Dolby AC-3 technology;
Fig. 4 is a flow diagram of Dolby AC-3 decoding;
Fig. 5 is a structural diagram of the encoding device of the present invention;
Fig. 6 is a diagram of the filter structure of the wavelet transform with the Haar wavelet basis;
Fig. 7 is a diagram of the time-frequency tiling obtained with the Haar wavelet transform;
Fig. 8 is a structural diagram of the decoding device of the present invention;
Fig. 9 is a structural diagram of embodiment one of the encoding device of the present invention;
Fig. 10 is a structural diagram of embodiment one of the decoding device of the present invention;
Fig. 11 is a structural diagram of embodiment two of the encoding device of the present invention;
Fig. 12 is a structural diagram of embodiment two of the decoding device of the present invention;
Fig. 13 is a structural diagram of embodiment three of the encoding device of the present invention;
Fig. 14 is a structural diagram of embodiment three of the decoding device of the present invention;
Fig. 15 is a structural diagram of embodiment four of the encoding device of the present invention;
Fig. 16 is a structural diagram of embodiment four of the decoding device of the present invention;
Fig. 17 is a structural diagram of embodiment five of the encoding device of the present invention;
Fig. 18 is a structural diagram of embodiment five of the decoding device of the present invention;
Fig. 19 is a structural diagram of embodiment six of the encoding device of the present invention;
Fig. 20 is a structural diagram of embodiment six of the decoding device of the present invention;
Fig. 21 is a structural diagram of embodiment seven of the encoding device of the present invention;
Fig. 22 is a structural diagram of embodiment seven of the decoding device of the present invention.
Embodiment
Fig. 1 to Fig. 4 show the structures of several prior-art encoders; they were introduced in the background section and are not repeated here.
Note that for convenience and clarity the following embodiments describe the encoding and decoding devices in corresponding pairs; this is not a restriction, and the encoding device and decoding device need not correspond one to one.
As shown in Fig. 5, the audio encoding device provided by the invention comprises a signal characteristic analysis module 50, a psychoacoustic analysis module 51, a time-frequency mapping module 52, a multi-resolution analysis module 53, a quantization and entropy coding module 54 and a bitstream multiplexing module 55. The signal characteristic analysis module 50 analyzes the type of the input audio signal, outputs the audio signal to the psychoacoustic analysis module 51 and the time-frequency mapping module 52, and sends the signal-type analysis result to the bitstream multiplexing module 55. The psychoacoustic analysis module 51 calculates the masking threshold and signal-to-mask ratio of the input audio signal and outputs them to the quantization and entropy coding module 54. The time-frequency mapping module 52 transforms the time-domain audio signal into frequency-domain coefficients and outputs them to the multi-resolution analysis module 53, which, according to the signal-type analysis result, performs multi-resolution analysis on the frequency-domain coefficients of transient signals and outputs them to the quantization and entropy coding module 54. Under the control of the signal-to-mask ratio output by module 51, the quantization and entropy coding module 54 quantizes and entropy-codes the frequency-domain coefficients and outputs them to the bitstream multiplexing module 55, which multiplexes the received data to form the encoded audio bitstream.
The digital audio signal first undergoes signal-type analysis in the signal characteristic analysis module 50, which sends the type information to the bitstream multiplexing module 55 and the signal itself to the psychoacoustic analysis module 51 and the time-frequency mapping module 52. On the one hand, module 51 calculates the masking threshold and signal-to-mask ratio of the current frame and sends the signal-to-mask ratio as a control signal to the quantization and entropy coding module 54; on the other hand, module 52 transforms the time-domain audio signal into frequency-domain coefficients. For transient signals these coefficients are given a multi-resolution analysis in module 53 to raise the time resolution, and the result is passed to module 54. Under the control of the signal-to-mask ratio from module 51, the coefficients are quantized and entropy-coded in module 54; the coded data and control signals are finally multiplexed in module 55 to form the enhanced audio bitstream.
Each module of the above audio encoding device is explained in detail below.
The signal characteristic analysis module 50 analyzes the type of the input audio signal, outputs the type information to the bitstream multiplexing module 55, and outputs the audio signal to the psychoacoustic analysis module 51 and the time-frequency mapping module 52.
Module 50 uses adaptive thresholding and waveform prediction to analyze pre- and post-masking effects and decide whether the signal is slowly varying (stationary) or fast varying (transient). If the signal is transient, it further computes parameter information about the transient component, such as the position at which the transient occurs and its intensity.
The psychoacoustic analysis module 51 mainly computes the masking threshold, signal-to-mask ratio and perceptual entropy of the input audio signal. The perceptual entropy computed by module 51 is used to estimate dynamically the number of bits required to code the current frame transparently, and thus to adjust the bit allocation between frames. The module outputs the signal-to-mask ratio of each subband to the quantization and entropy coding module 54, which it controls.
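The two quantities module 51 produces can be sketched with simplified formulas. This is a toy illustration under our own assumptions — the names are ours, and the perceptual-entropy expression below is a simplified stand-in for the more elaborate formulas used in real psychoacoustic models:

```python
import math

def smr_db(band_energy, mask_threshold):
    # per-band signal-to-mask ratio in dB: how far the signal sits above
    # its masking threshold (negative -> the band is inaudible)
    return [10.0 * math.log10(e / t) for e, t in zip(band_energy, mask_threshold)]

def perceptual_entropy(band_energy, mask_threshold, lines_per_band):
    # simplified perceptual entropy: bits needed so that quantization noise
    # stays below the masking threshold, counting only audible bands;
    # ~log2(sqrt(energy/threshold)) bits per spectral line (assumed model)
    pe = 0.0
    for e, t, n in zip(band_energy, mask_threshold, lines_per_band):
        if e > t:
            pe += n * 0.5 * math.log2(e / t)
    return pe
```

A frame with a high perceptual entropy would then be granted more bits by the inter-frame allocation than a frame whose spectrum sits mostly below threshold.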
The time-frequency mapping module 52 converts the audio signal from a time-domain signal into frequency-domain coefficients. It consists of a filter bank, which may specifically be a discrete Fourier transform (DFT) filter bank, a discrete cosine transform (DCT) filter bank, a modified discrete cosine transform (MDCT) filter bank, a cosine-modulated filter bank, a wavelet transform filter bank, etc. The frequency-domain coefficients obtained by the time-frequency mapping are output to the quantization and entropy coding module 54 for quantization and coding.
For transient signals, the encoding device of the present invention improves the time resolution of the coding through the multi-resolution analysis module 53 in order to effectively suppress the pre-echo generated during coding and improve coding quality. The frequency-domain coefficients output by the time-frequency mapping module 52 enter module 53; if the signal is transient, a frequency-domain wavelet transform or frequency-domain modified discrete cosine transform (MDCT) is applied to obtain a multi-resolution representation of the coefficients, which is output to the quantization and entropy coding module 54. If the signal is slowly varying, the coefficients are passed to module 54 unchanged.
The multi-resolution analysis module 53 comprises a coefficient transform module and a regrouping module: the coefficient transform module maps the frequency-domain coefficients onto the time-frequency plane, and the regrouping module reorders the time-frequency coefficients according to a given rule. The coefficient transform module may use a frequency-domain wavelet transform filter bank, a frequency-domain MDCT filter bank, etc.
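Since Fig. 6 and Fig. 7 describe the wavelet option with the Haar basis, the sketch below shows a minimal multi-level Haar analysis/synthesis pair of the kind the coefficient transform module could apply to a block of frequency-domain coefficients. It is an illustration under our own assumptions (function names and band ordering are ours), using the orthonormal Haar step so that energy is preserved and reconstruction is exact:

```python
import math

def haar_analysis(c, levels):
    # repeatedly split into a low band (scaled sums) and a high band
    # (scaled differences); each split halves the frequency resolution
    # of the low band while refining the transient localization
    s = 1.0 / math.sqrt(2.0)
    out = []
    low = list(c)
    for _ in range(levels):
        high = [s * (low[2 * i] - low[2 * i + 1]) for i in range(len(low) // 2)]
        low = [s * (low[2 * i] + low[2 * i + 1]) for i in range(len(low) // 2)]
        out.insert(0, high)
    out.insert(0, low)
    return out   # [coarsest low band, high bands from coarse to fine]

def haar_synthesis(bands):
    s = 1.0 / math.sqrt(2.0)
    low = list(bands[0])
    for high in bands[1:]:
        merged = []
        for a, d in zip(low, high):
            merged.append(s * (a + d))   # inverse of the orthonormal step
            merged.append(s * (a - d))
        low = merged
    return low
```

The regrouping module would then reorder these bands for quantization; that reordering rule is codec-specific and not modeled here.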
Quantization and entropy coding module 54 further comprises a nonlinear quantizer bank and an encoder, where the quantizers may be scalar quantizers or vector quantizers. Vector quantizers are further divided into two broad classes: memoryless vector quantizers and vector quantizers with memory. In a memoryless vector quantizer each input vector is quantized independently, regardless of previous vectors; a vector quantizer with memory takes the previous vectors into account when quantizing a vector, i.e. it exploits the correlation between vectors. The main memoryless vector quantizers include the full-search vector quantizer, the tree-search vector quantizer, the multistage vector quantizer, the gain/shape vector quantizer and the mean-removed vector quantizer; the main vector quantizers with memory include the predictive vector quantizer and the finite-state vector quantizer.
If scalar quantizers are adopted, the nonlinear quantizer bank further comprises M subband quantizers. Each subband quantizer mainly uses a scale factor for quantization. Specifically, all frequency-domain coefficients in the M scale factor bands are first compressed nonlinearly; the coefficients of each subband are then quantized using the scale factor of that subband, and the resulting integer-valued quantized spectrum is output to the encoder. The first scale factor of each signal frame is output to bit stream multiplexing module 55 as the common scale factor; each of the other scale factors is differenced with its preceding scale factor and then output to the encoder.
The scale factors in the above steps are continually changing values, adjusted according to the bit allocation strategy. The present invention provides a bit allocation strategy that minimizes the overall perceptual distortion, as follows:
First, each subband quantizer is initialized by selecting a scale factor large enough that the quantized values of the spectral coefficients in all subbands are 0. At this point the quantization noise of each subband equals the energy of that subband, the noise-to-mask ratio NMR of each subband equals its signal-to-mask ratio SMR, the number of bits consumed by quantization is 0, and the number of remaining bits B_l equals the target number of bits B.
Second, the subband with the largest noise-to-mask ratio NMR is found. If the largest NMR is less than or equal to 1, the scale factors remain unchanged, the allocation result is output, and the bit allocation process ends. Otherwise, the scale factor of the corresponding subband quantizer is reduced by one unit, and the number of additional bits ΔB_i(Q_i) required by this subband is calculated. If the number of remaining bits B_l ≥ ΔB_i(Q_i), the modification of this scale factor is confirmed, ΔB_i(Q_i) is subtracted from the remaining bit count B_l, the NMR of this subband is recomputed, and the search for the subband with the largest NMR is repeated, followed by the subsequent steps. If B_l < ΔB_i(Q_i), the modification is cancelled, the previous scale factor and remaining bit count are kept, the allocation result is output, and the bit allocation process ends.
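The greedy loop described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the per-step bit cost and the assumption that one scale-factor step cuts quantization noise by about 6 dB (a factor of 4) are illustrative choices.

```python
# Hedged sketch of the NMR-driven bit allocation loop described above.
def allocate_bits(energy, mask, target_bits, step_cost=8):
    """Reduce the scale factor of the subband with the worst
    noise-to-mask ratio (NMR) until all noise is masked or the
    bit budget runs out. Returns (steps per subband, bits left)."""
    n_sub = len(energy)
    noise = list(energy)          # all-zero quantization: noise == energy
    steps = [0] * n_sub           # scale-factor reductions per subband
    remaining = target_bits
    while True:
        nmr = [noise[i] / mask[i] for i in range(n_sub)]
        worst = max(range(n_sub), key=lambda i: nmr[i])
        if nmr[worst] <= 1.0:     # all quantization noise is masked
            break
        if remaining < step_cost: # cannot afford another refinement
            break
        steps[worst] += 1
        noise[worst] /= 4.0       # assume each step lowers noise ~6 dB
        remaining -= step_cost
    return steps, remaining
```

With two subbands where only the first is audible above its mask, the loop refines that subband twice and stops once its NMR reaches 1.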
If vector quantizers are adopted, the frequency-domain coefficients are grouped into a number of M-dimensional vectors and input into the nonlinear quantizer bank. Each M-dimensional vector is spectrally flattened according to a flattening factor, i.e. the dynamic range of the spectrum is reduced; the vector quantizer then finds, according to a subjective perceptual distance measure, the codeword in the codebook with the smallest distance to the vector to be quantized, and passes the corresponding codeword index to the encoder. The flattening factor is adjusted according to the bit allocation strategy of the vector quantization, and the bit allocation of the vector quantization is controlled according to the perceptual importance of the different subbands.
After the above quantization, entropy coding is used to further remove the statistical redundancy of the quantized coefficients and the side information. Entropy coding is a source coding technique whose basic idea is to give shorter codewords to symbols with larger probability of occurrence and longer codewords to symbols with smaller probability, so that the average codeword length is the shortest. According to Shannon's noiseless coding theorem, if the N source symbols of a transmitted message are independent, then with a suitable variable-length code the average codeword length n̄ satisfies H(x)/log₂(D) ≤ n̄ < H(x)/log₂(D) + 1/N, where H(x) denotes the entropy of the source and x denotes the symbol variable. Since the entropy H(x) is the lower limit of the average code length, the above formula shows that the average codeword length comes very close to its lower bound H(x); this variable-length coding technique is therefore also called "entropy coding". The main entropy coding methods are Huffman coding, arithmetic coding and run-length coding; the entropy coding in the present invention may adopt any of these methods.
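The bound quoted above can be checked numerically for a binary code (D = 2) and blocks of N = 1 symbol: a Huffman code then satisfies H(x) ≤ n̄ < H(x) + 1. The symbol probabilities below are an illustrative assumption (dyadic, so the Huffman code is exactly optimal).

```python
import heapq, math, itertools

def huffman_lengths(probs):
    """Codeword length per symbol for a binary Huffman code."""
    # heap items: (probability, unique tiebreak, symbol indices in subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = itertools.count(len(probs))
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)   # merge the two least-probable
        p2, _, s2 = heapq.heappop(heap)   # subtrees; every contained
        for s in s1 + s2:                 # symbol gains one bit of depth
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next(counter), s1 + s2))
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]         # assumed dyadic source
lengths = huffman_lengths(probs)
entropy = -sum(p * math.log2(p) for p in probs)
avg_len = sum(p * l for p, l in zip(probs, lengths))
```

For this dyadic source the average length equals the entropy (1.75 bits), the tightest case of the bound.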
In the encoder, the quantized spectrum output by the scalar quantizers and the differenced scale factors are entropy coded, yielding the codebook numbers, the scale factor code values and the losslessly coded quantized spectrum; the codebook numbers are then themselves entropy coded, yielding the codebook number code values. The scale factor code values, codebook number code values and losslessly coded quantized spectrum are output to bit stream multiplexing module 55.
The codeword indices obtained after vector quantization are entropy coded in the encoder with one- or multi-dimensional entropy coding, yielding the codeword index code values, which are then output to bit stream multiplexing module 55.
The coding method based on the above encoder specifically comprises: performing signal type analysis on the input audio signal; calculating the signal-to-mask ratio of the audio signal; performing time-frequency mapping on the audio signal to obtain its frequency-domain coefficients; performing multiresolution analysis, quantization and entropy coding on the frequency-domain coefficients; and multiplexing the signal type analysis result with the coded audio stream to obtain the compressed audio code stream.
The signal type is determined by forward and backward masking analysis based on adaptive thresholds and waveform prediction. The concrete steps are: decompose the input audio data into frames; decompose each input frame into a number of subframes, and find the local maximum of the absolute PCM value in each subframe; select the peak of each subframe from these local maxima; for a given subframe peak, use several (typically 3) preceding subframe peaks to predict the typical sample values of several (typically 4) subframes ahead of this subframe; calculate the difference and the ratio between this subframe peak and the predicted value. If both the prediction difference and the ratio exceed preset thresholds, it is judged that the subframe contains a sudden transition, and this subframe is confirmed to possess a local maximum peak capable of backward-masking the pre-echo; if, in addition, there exists a subframe with a sufficiently small peak within 2.5 ms ahead of this subframe and before the masking peak, the frame is judged to be a fast-changing (transient) signal. If the prediction difference and ratio do not exceed the preset thresholds, the above steps are repeated until either the frame is judged transient or the last subframe is reached; if the last subframe is reached without the frame being judged transient, the frame is a slowly-changing signal.
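The peak-prediction test above can be sketched as follows. This is a simplified illustration: the subframe count, the mean-of-three predictor and both thresholds are assumed values, and the 2.5 ms quiet-subframe check before the masking peak is omitted for brevity.

```python
# Hedged sketch of the transient detector: per-subframe peaks, prediction
# from 3 earlier peaks, and a threshold test on difference and ratio.
def is_transient(frame, n_sub=8, diff_thr=4.0, ratio_thr=2.0):
    sub_len = len(frame) // n_sub
    peaks = [max(abs(s) for s in frame[i * sub_len:(i + 1) * sub_len])
             for i in range(n_sub)]
    for i in range(3, n_sub):
        pred = sum(peaks[i - 3:i]) / 3.0   # predict from 3 earlier peaks
        diff = peaks[i] - pred
        ratio = peaks[i] / (pred + 1e-12)  # guard against silent frames
        if diff > diff_thr and ratio > ratio_thr:
            return True                    # sudden attack found
    return False
```

A frame with a sharp attack in its later subframes triggers the test, while a uniformly quiet frame does not.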
There are many methods of applying a time-frequency transform to a time-domain audio signal, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the modified discrete cosine transform (MDCT), cosine-modulated filter banks, the wavelet transform, and so on. The time-frequency mapping process is explained below taking the modified discrete cosine transform MDCT and cosine-modulated filtering as examples.
In the case where the modified discrete cosine transform MDCT is used for the time-frequency transform, the M samples of the previous frame and the M samples of the current frame are first taken, a window is applied to these 2M time-domain samples, and the windowed signal is then MDCT-transformed, yielding M frequency-domain coefficients.
The impulse response of the MDCT analysis filter is:

h_k(n) = w(n) · sqrt(2/M) · cos[ (2n + M + 1)(2k + 1)π / (4M) ],

and the MDCT transform is X(k) = Σ_{n=0}^{2M−1} x(n) h_k(n), 0 ≤ k ≤ M−1, where w(n) is the window function, x(n) is the input time-domain signal of the MDCT transform, and X(k) is the output frequency-domain signal of the MDCT transform.
To satisfy the condition of perfect signal reconstruction, the window function w(n) of the MDCT transform must satisfy the following two conditions:

w(2M−1−n) = w(n) and w²(n) + w²(n+M) = 1.
In practice, the sine window may be selected as the window function. The above restrictions on the window function may of course also be relaxed by using a biorthogonal transform with specific analysis and synthesis filters.
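The analysis formula above, with the sine window (which satisfies both window conditions), can be sketched directly. This is a naive O(M²) evaluation for illustration only; a practical implementation would use an FFT-based fast algorithm.

```python
import math

# Minimal MDCT following the analysis filter h_k(n) given above, with a
# sine window satisfying w(2M-1-n) = w(n) and w^2(n) + w^2(n+M) = 1.
def mdct(x):
    """x: 2M windowless time samples -> M frequency coefficients."""
    M = len(x) // 2
    w = [math.sin(math.pi / (2 * M) * (n + 0.5)) for n in range(2 * M)]
    return [sum(w[n] * math.sqrt(2.0 / M)
                * math.cos((2 * n + M + 1) * (2 * k + 1)
                           * math.pi / (4 * M))
                * x[n] for n in range(2 * M))
            for k in range(M)]
```

The transform is linear, so an all-zero input frame yields M all-zero coefficients.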
In the case where cosine-modulated filtering is used for the time-frequency transform, the M samples of the previous frame and the M samples of the current frame are likewise taken first, a window is applied to these 2M time-domain samples, and the windowed signal is then cosine-modulation transformed, yielding M frequency-domain coefficients.
The impulse responses of the traditional cosine-modulated filtering technique are

h_k(n) = 2 p_a(n) cos( (π/M)(k + 0.5)(n − D/2) + θ_k ),  n = 0, 1, ..., N_h − 1

f_k(n) = 2 p_s(n) cos( (π/M)(k + 0.5)(n − D/2) − θ_k ),  n = 0, 1, ..., N_f − 1

where 0 ≤ k ≤ M−1, 0 ≤ n ≤ 2KM−1, K is an integer greater than zero, and θ_k = (−1)^k π/4.
Suppose the impulse response length of the analysis window (analysis prototype filter) p_a(n) of the M-subband cosine-modulated filter bank is N_a, and that of the synthesis window (synthesis prototype filter) p_s(n) is N_s. When the analysis window and the synthesis window are equal, i.e. p_a(n) = p_s(n) and N_a = N_s, the cosine-modulated filter bank given by the two formulas above is an orthogonal filter bank, and the matrices H and F ([H]_{n,k} = h_k(n), [F]_{n,k} = f_k(n)) are orthogonal transform matrices. To obtain a linear-phase filter bank, the window is further required to be symmetric: p_a(2KM−1−n) = p_a(n). To guarantee perfect reconstruction in both the orthogonal and the biorthogonal case, the window function must meet further requirements; see "Multirate Systems and Filter Banks", P. P. Vaidyanathan, Prentice Hall, Englewood Cliffs, NJ, 1993 for details.
Calculating the masking threshold and the signal-to-mask ratio of the resampled signal comprises the following steps:
In the first step, the signal is mapped from the time domain to the frequency domain. The fast Fourier transform with a Hanning window may be used to convert the time-domain data into frequency-domain coefficients X[k]; X[k] is expressed by amplitude r[k] and phase φ[k] as X[k] = r[k]e^{jφ[k]}. The energy e[b] of each subband is then the sum of the energies of all spectral lines in that subband, i.e. e[b] = Σ_{k=k_l}^{k_h} r²[k], where k_l and k_h denote the lower and upper boundaries of subband b respectively.
In the second step, the tonal and non-tonal components of the signal are determined. The tonality of the signal is estimated by inter-frame prediction of each spectral line: the Euclidean distance between the predicted value and the actual value of each spectral line is mapped to an unpredictability measure. Spectral components of high predictability are considered strongly tonal, while spectral components of low predictability are considered noise-like.
The amplitude r_pred and phase φ_pred of the predicted value can be expressed as:

r_pred[k] = r_{t−1}[k] + (r_{t−1}[k] − r_{t−2}[k])

φ_pred[k] = φ_{t−1}[k] + (φ_{t−1}[k] − φ_{t−2}[k]),

where t denotes the coefficients of the current frame, t−1 those of the previous frame, and t−2 those of the frame before that.
The unpredictability measure c[k] is then computed as:

c[k] = dist(X[k], X_pred[k]) / ( r[k] + |r_pred[k]| ),

where the Euclidean distance dist(X[k], X_pred[k]) is calculated as:

dist(X[k], X_pred[k]) = |X[k] − X_pred[k]| = ( (r[k]cos(φ[k]) − r_pred[k]cos(φ_pred[k]))² + (r[k]sin(φ[k]) − r_pred[k]sin(φ_pred[k]))² )^{1/2}
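The prediction and normalization above can be sketched for a single spectral line. This is a direct transcription of the formulas, with the frame coefficients passed in explicitly for clarity.

```python
import math

# Sketch of the unpredictability measure c[k]: predict the current line
# from the two previous frames, then normalize the Euclidean error.
def unpredictability(r2, phi2, r1, phi1, r, phi):
    """r2/phi2: frame t-2, r1/phi1: frame t-1, r/phi: current frame t."""
    r_pred = r1 + (r1 - r2)              # linear amplitude extrapolation
    phi_pred = phi1 + (phi1 - phi2)      # linear phase extrapolation
    dist = math.hypot(
        r * math.cos(phi) - r_pred * math.cos(phi_pred),
        r * math.sin(phi) - r_pred * math.sin(phi_pred))
    return dist / (r + abs(r_pred))
```

A perfectly predictable line (amplitude growing linearly, constant phase) gives c = 0, while a phase-reversed line of constant amplitude gives c = 1, matching the tonal/noise-like extremes described above.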
The unpredictability c[b] of each subband is then the sum of the unpredictabilities of all spectral lines in that subband, weighted by their energies, i.e. c[b] = Σ_{k=k_l}^{k_h} c[k] r²[k]. The subband energy e[b] and the unpredictability c[b] are each convolved with the spreading function, yielding the spread subband energy e_s[b] and the spread subband unpredictability c_s[b]; the spreading function of masker i on subband b is written s[i,b]. To eliminate the influence of the spreading function on the energy scale, the spread unpredictability c_s[b] is normalized as c̃_s[b] = c_s[b]/e_s[b]. Likewise, to eliminate the influence of the spreading function on the subband energy, the normalized spread energy is defined as ẽ_s[b] = e_s[b]/n[b], where the normalization factor n[b] is n[b] = Σ_{i=1}^{b_max} s[i,b], and b_max is the number of subbands into which this frame of the signal is divided.
From the normalized spread unpredictability c̃_s[b], the tonality t[b] of the subband can be calculated: t[b] = −0.299 − 0.43 log_e(c̃_s[b]), with 0 ≤ t[b] ≤ 1. When t[b] = 1, the subband signal is a pure tone; when t[b] = 0, the subband signal is white noise.
In the third step, the signal-to-noise ratio (SNR) required by each subband is calculated. The noise-masking-tone (NMT) value of all subbands is set to 5 dB and the tone-masking-noise (TMN) value to 18 dB; for the noise to remain imperceptible, the required signal-to-noise ratio of each subband is then SNR[b] = 18 t[b] + 6(1 − t[b]).
In the fourth step, the masking threshold of each subband and the perceptual entropy of the signal are calculated. From the normalized signal energy of each subband obtained in the preceding steps and the required signal-to-noise ratio SNR, the noise energy threshold n[b] of each subband is calculated as

n[b] = ẽ_s[b] · 10^{−SNR[b]/10}.
To avoid pre-echo effects, the noise energy threshold n[b] of the current frame is compared with the noise energy threshold n_prev[b] of the previous frame, and the masking threshold of the signal is taken as n[b] = min(n[b], 2 n_prev[b]). This guarantees that the masking threshold is not biased by a high-energy attack occurring near the end of the analysis window.
Further, taking the static masking threshold qsthr[b] into account, the final masking threshold of the signal is chosen as the larger of the static masking threshold and the masking threshold calculated above, i.e. n[b] = max(n[b], qsthr[b]). The perceptual entropy is then calculated as pe = −Σ_{b=0}^{b_max} ( cbwidth_b × log₁₀( n[b] / (e[b] + 1) ) ), where cbwidth_b denotes the number of spectral lines contained in each subband.
In the fifth step, the signal-to-mask ratio (SMR) of each subband signal is calculated. The signal-to-mask ratio of each subband is SMR[b] = 10 log₁₀( e[b] / n[b] ).
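Steps 3 to 5 can be chained for a single subband as follows. This sketch assumes the spreading and normalization of the earlier steps have already been applied to the inputs, and it uses the 6 dB noise-masking-tone coefficient that appears in the SNR formula above.

```python
import math

# Sketch of tonality -> required SNR -> noise threshold -> SMR for one
# subband. e: subband energy, c_tilde: normalized spread unpredictability,
# n_prev: previous frame's noise threshold, qsthr: static threshold.
def subband_smr(e, c_tilde, n_prev, qsthr):
    t = min(1.0, max(0.0, -0.299 - 0.43 * math.log(c_tilde)))
    snr = 18.0 * t + 6.0 * (1.0 - t)      # TMN = 18 dB weighting
    n = e * 10.0 ** (-snr / 10.0)         # noise energy threshold
    n = min(n, 2.0 * n_prev)              # pre-echo control
    n = max(n, qsthr)                     # static threshold floor
    return 10.0 * math.log10(e / n)
```

For a noise-like subband (c̃_s = 1, so t = 0) the SMR comes out as exactly the 6 dB noise-masking requirement, unless the static threshold dominates.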
The frequency-domain coefficients are then subjected to multiresolution analysis. Multiresolution analysis module 53 regroups the input frequency-domain data in the time-frequency domain, improving the temporal resolution of the frequency-domain data at the cost of reduced frequency accuracy. It thereby adapts automatically to the time-frequency characteristics of transient signals and suppresses pre-echo, without any need to adjust the form of the filter bank in time-frequency mapping module 52.
Multiresolution analysis comprises the two steps of frequency-domain coefficient transformation and regrouping: the frequency-domain coefficient transformation converts the frequency-domain coefficients into time-frequency plane coefficients, and the regrouping groups the time-frequency plane coefficients according to a certain rule.
The process of multiresolution analysis is explained below taking the frequency-domain wavelet transform and the frequency-domain MDCT transform as examples.
1) frequency-domain small wave conversion
Suppose the time series x(i), i = 0, 1, ..., 2M−1 yields, after time-frequency mapping, the frequency-domain coefficients X(k), k = 0, 1, ..., M−1. The wavelet basis of the frequency-domain wavelet or wavelet packet transform may be fixed or adaptive.
The process of performing multiresolution analysis on the frequency-domain coefficients is explained below using the simplest wavelet transform, based on the Haar wavelet basis, as an example.
The scale coefficients of the Haar wavelet basis are (1/√2, 1/√2) and the wavelet coefficients are (1/√2, −1/√2). Fig. 6 shows the schematic filter structure of the wavelet transform using the Haar wavelet basis, where H₀ denotes low-pass filtering (filter coefficients (1/√2, 1/√2)), H₁ denotes high-pass filtering (filter coefficients (1/√2, −1/√2)), and "↓2" denotes 2-fold downsampling. The low- and mid-frequency part X₁(k), k = 0, ..., k₁ of the frequency-domain coefficients is not wavelet transformed; the high-frequency part is Haar wavelet transformed, yielding the coefficients X₂(k), X₃(k), X₄(k), X₅(k), X₆(k) and X₇(k) of different time-frequency intervals, with the corresponding time-frequency plane division shown in Fig. 7. By selecting different wavelet bases, different wavelet transform structures may be used, yielding other similar time-frequency plane divisions. The time-frequency plane used for signal analysis can therefore be divided arbitrarily as required, meeting the analysis requirements of different time and frequency resolutions.
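A single Haar analysis stage of the kind used in this structure can be sketched as follows. The split point (half the coefficients) and the single-stage depth are assumptions for the sketch; the structure of Fig. 6 cascades further stages over the high band.

```python
import math

# One Haar stage: low-pass (1/sqrt2, 1/sqrt2) and high-pass
# (1/sqrt2, -1/sqrt2) filtering, each followed by 2-fold downsampling.
def haar_stage(c):
    s = 1.0 / math.sqrt(2.0)
    low = [s * (c[2 * i] + c[2 * i + 1]) for i in range(len(c) // 2)]
    high = [s * (c[2 * i] - c[2 * i + 1]) for i in range(len(c) // 2)]
    return low, high

def multires_split(coeffs):
    """Keep the low-frequency half untransformed (full frequency
    resolution), split the high-frequency half with one Haar stage."""
    half = len(coeffs) // 2
    x1 = coeffs[:half]                    # low band: untouched
    low, high = haar_stage(coeffs[half:])
    return x1, low, high
```

Applied to eight coefficients, the first four pass through unchanged and the upper four are split into two low-band and two high-band values.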
The above time-frequency plane coefficients are regrouped in the regrouping module according to a certain rule. For example, the time-frequency plane coefficients may first be organized in the frequency direction, the coefficients within each frequency band organized in the time direction, and the organized coefficients then arranged in order by sub-window and scale factor band.
2) frequency domain MDCT conversion
Let the frequency-domain data input to the frequency-domain MDCT transform filter bank be X(k), k = 0, 1, ..., N−1. An M-point MDCT transform is applied successively to these N points of frequency-domain data, so that the frequency accuracy of the time-frequency data decreases while the time accuracy correspondingly improves. By using frequency-domain MDCT transforms of different lengths over different frequency ranges, different divisions of the time-frequency plane with different time and frequency accuracies can be obtained. The regrouping module regroups the time-frequency data output by the frequency-domain MDCT transform filter bank; one regrouping method is first to organize the time-frequency plane coefficients in the frequency direction, organize the coefficients within each frequency band in the time direction, and then arrange the organized coefficients in order by sub-window and scale factor band.
Quantization and entropy coding further comprise the two steps of nonlinear quantization and entropy coding, where the quantization may be scalar quantization or vector quantization.
Scalar quantization comprises the following steps: nonlinearly compress the frequency-domain coefficients in all scale factor bands; quantize the coefficients of each subband using the scale factor of that subband, obtaining the integer-valued quantized spectrum; select the first scale factor of each signal frame as the common scale factor; and difference each of the other scale factors with its preceding scale factor.
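These steps can be sketched as follows. The patent fixes neither the companding exponent nor the scale-factor step size; the 3/4-power law and the 2^(sf/4) step used here are AAC-style assumptions, chosen only to make the sketch concrete.

```python
# Hedged sketch of scalar quantization with an assumed 3/4-power
# companding law and an assumed 1.5 dB-per-unit scale-factor step.
def quantize_band(coeffs, scale_factor):
    gain = 2.0 ** (-scale_factor / 4.0)
    return [int(round((abs(x) * gain) ** 0.75) * (1 if x >= 0 else -1))
            for x in coeffs]

def scalefactor_side_info(scale_factors):
    """First scale factor is sent as the common scale factor; the rest
    are differenced with their predecessors before entropy coding."""
    common = scale_factors[0]
    diffs = [scale_factors[i] - scale_factors[i - 1]
             for i in range(1, len(scale_factors))]
    return common, diffs
```

Raising the scale factor shrinks the compressed values before rounding, trading precision for bits, which is exactly the knob the bit allocation loop turns.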
Vector quantization comprises the following steps: group the frequency-domain coefficients into a number of multidimensional vector signals; spectrally flatten each M-dimensional vector according to the flattening factor; and search the codebook, according to a subjective perceptual distance measure, for the codeword with the smallest distance to the vector to be quantized, obtaining its codeword index.
The entropy coding step comprises: entropy code the quantized spectrum and the differenced scale factors, obtaining the codebook numbers, the scale factor code values and the losslessly coded quantized spectrum; entropy code the codebook numbers, obtaining the codebook number code values.
Or: perform one- or multi-dimensional entropy coding on the codeword indices, obtaining the codeword index code values.
The above entropy coding may adopt any of the existing methods such as Huffman coding, arithmetic coding or run-length coding.
After the quantization and entropy coding processing, the coded audio stream is obtained; this stream is multiplexed with the common scale factor and the signal type analysis result to obtain the compressed audio code stream.
Fig. 8 is a structural diagram of the audio decoding device of the present invention. The audio decoding device comprises bit stream demultiplexing module 60, entropy decoding module 61, inverse quantizer bank 62, multiresolution synthesis module 63 and frequency-time mapping module 64. After the compressed audio code stream is demultiplexed by bit stream demultiplexing module 60, the corresponding data signals and control signals are obtained and output to entropy decoding module 61 and multiresolution synthesis module 63. The data signals and control signals are decoded in entropy decoding module 61, recovering the quantized values of the spectrum. These quantized values are reconstructed in inverse quantizer bank 62, yielding the inversely quantized spectrum, which is output to multiresolution synthesis module 63; after multiresolution synthesis it is output to frequency-time mapping module 64, and the time-domain audio signal is obtained through the frequency-time mapping.
Bit stream demultiplexing module 60 decomposes the compressed audio code stream, obtaining the corresponding data signals and control signals and providing the corresponding decoding information to the other modules. After demultiplexing of the compressed audio data stream, the signals output to entropy decoding module 61 comprise the common scale factor, the scale factor code values, the codebook number code values and the losslessly coded quantized spectrum, or else the codeword index code values; the signal type information is output to multiresolution synthesis module 63.
If scalar quantizers are adopted in quantization and entropy coding module 54 of the encoding device, then in the decoding device entropy decoding module 61 receives the common scale factor, scale factor code values, codebook number code values and losslessly coded quantized spectrum output by bit stream demultiplexing module 60; it performs codebook number decoding, spectral coefficient decoding and scale factor decoding, reconstructs the quantized spectrum, and outputs the integer representation of the scale factors and the quantized values of the spectrum to inverse quantizer bank 62. The decoding methods adopted by entropy decoding module 61 correspond to the entropy coding methods of the encoding device, such as Huffman decoding, arithmetic decoding or run-length decoding.
After inverse quantizer bank 62 receives the quantized values of the spectrum and the integer representation of the scale factors, it inversely quantizes the quantized values of the spectrum into an unscaled reconstructed spectrum (the inversely quantized spectrum) and outputs this inversely quantized spectrum to multiresolution synthesis module 63. Inverse quantizer bank 62 may be a uniform quantizer bank, or a non-uniform quantizer bank realized by a companding function. If the quantizer bank of the encoding device uses scalar quantizers, then inverse quantizer bank 62 of the decoding device likewise uses scalar inverse quantizers. In a scalar inverse quantizer, the quantized values of the spectrum are first expanded nonlinearly, and all spectral coefficients in each scale factor band (the inversely quantized spectrum) are then obtained using the corresponding scale factor.
If vector quantizers are adopted in quantization and entropy coding module 54, then in the decoding device entropy decoding module 61 receives the codeword index code values output by bit stream demultiplexing module 60 and decodes them with the entropy decoding method corresponding to the entropy coding method used at encoding, obtaining the corresponding codeword indices.
The codeword indices are output to inverse quantizer bank 62, which obtains the quantized values (the inversely quantized spectrum) by codebook lookup and outputs them to multiresolution synthesis module 63; in this case inverse quantizer bank 62 uses inverse vector quantizers. After multiresolution synthesis, the inversely quantized spectrum is mapped by frequency-time mapping module 64 to obtain the time-domain audio signal. Frequency-time mapping module 64 may be an inverse discrete cosine transform (IDCT) filter bank, an inverse discrete Fourier transform (IDFT) filter bank, an inverse modified discrete cosine transform (IMDCT) filter bank, an inverse wavelet transform filter bank, a cosine-modulated filter bank, and so on.
The decoding method based on the above decoder comprises: demultiplex the compressed audio code stream, obtaining the data information and control information; entropy decode this information, obtaining the quantized values of the spectrum; inversely quantize the quantized values of the spectrum, obtaining the inversely quantized spectrum; perform multiresolution synthesis on the inversely quantized spectrum, and then perform frequency-time mapping to obtain the time-domain audio signal.
If the demultiplexed information contains codebook number code values, the common scale factor, scale factor code values and the losslessly coded quantized spectrum, this shows that the spectral coefficients were quantized in the encoding device with the scalar quantization technique. The entropy decoding step then comprises: decode the codebook number code values, obtaining the codebook numbers of all scale factor bands; decode the quantized coefficients of all scale factor bands according to the codebooks corresponding to the codebook numbers; decode the scale factors of all scale factor bands and reconstruct the quantized spectrum. The entropy decoding methods used in this process correspond to the entropy coding methods of the coding method, such as run-length decoding, Huffman decoding or arithmetic decoding.
The entropy decoding process is explained below taking as an example run-length decoding of the codebook numbers, Huffman decoding of the quantized coefficients and Huffman decoding of the scale factors.
First the codebook numbers of all scale factor bands are obtained by run-length decoding. A decoded codebook number is an integer in a certain interval, say [0, 11]; only codebook numbers within this valid range, i.e. between 0 and 11, correspond to spectral coefficient Huffman codebooks. For all-zero subbands a particular codebook number may be reserved, typically number 0.
After the codebook number of each scale factor band has been decoded, the quantized coefficients of all scale factor bands are decoded using the spectral coefficient Huffman codebooks corresponding to these codebook numbers. If the codebook number of a scale factor band is within the valid range, in this embodiment between 1 and 11, it corresponds to a spectral coefficient codebook; this codebook is used to decode the codeword indices of the quantized coefficients of the scale factor band from the quantized spectrum, and the quantized coefficients are then unpacked from the codeword indices. If the codebook number of a scale factor band is not between 1 and 11, it corresponds to no spectral coefficient codebook; the quantized coefficients of this scale factor band then need not be decoded, and are all simply set to zero.
The scale factors are used to reconstruct the spectrum values from the inversely quantized spectral coefficients; if the codebook number of a scale factor band is within the valid range, each codebook number corresponds to a scale factor. When decoding these scale factors, the code stream occupied by the first scale factor is read first; the other scale factors are then Huffman decoded, successively obtaining the difference between each scale factor and the previous one, and each difference is added to the previous scale factor value to obtain the scale factor. If the quantized coefficients of the current subband are all zero, the scale factor of this subband need not be decoded.
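The differential reconstruction described above amounts to a running sum. This sketch assumes the Huffman stage has already turned the code stream into a list of integer differences, and omits the skipping of all-zero subbands for brevity.

```python
# Sketch of decoder-side scale-factor reconstruction: the common scale
# factor is read directly, each later one is the previous value plus a
# (Huffman-decoded) difference.
def decode_scalefactors(common_sf, diffs):
    sfs = [common_sf]
    for d in diffs:
        sfs.append(sfs[-1] + d)   # running sum over the differences
    return sfs
```

Because each value depends on the previous one, a single corrupted difference shifts every later scale factor, which is why the common scale factor is carried separately in the bit stream.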
After the above entropy decoding process, the quantized values of the spectrum and the integer representation of the scale factors are obtained; the quantized values of the spectrum are then inversely quantized, obtaining the inversely quantized spectrum. The inverse quantization processing comprises: nonlinearly expand the quantized values of the spectrum; obtain all spectral coefficients in each scale factor band (the inversely quantized spectrum) according to the corresponding scale factor.
If the demultiplexed information contains codeword index code values, this shows that the spectral coefficients were quantized in the encoding device with the vector quantization technique. The entropy decoding step then comprises: decode the codeword index code values with the entropy decoding method corresponding to the entropy coding method of the encoding device, obtaining the codeword indices. The codeword indices are then inversely quantized, obtaining the inversely quantized spectrum.
For the inverse-quantized spectrum: if the signal is of the fast-varying type, multiresolution processing is applied to the frequency-domain coefficients before subsequent processing; if it is not of the fast-varying type, the frequency-domain coefficients are processed directly.
Multiresolution synthesis can adopt a frequency-domain wavelet transform technique or a frequency-domain MDCT technique. The frequency-domain wavelet synthesis method comprises: first reorganizing the above time-frequency plane coefficients according to a given rule, then applying a wavelet transform to the frequency-domain coefficients to obtain the time-frequency plane coefficients. The MDCT method comprises: first reorganizing the above time-frequency plane coefficients according to a given rule, then applying several MDCT transforms to the frequency-domain coefficients to obtain the time-frequency plane coefficients. The reorganization method may comprise: organizing the time-frequency plane coefficients in the frequency direction, organizing the coefficients within each frequency band in the time direction, and then arranging the organized coefficients in order of sub-window and scale factor band.
The frequency-time mapping applied to the frequency-domain coefficients corresponds to the time-frequency mapping used in the encoding method, and can be accomplished with inverse methods such as the inverse discrete cosine transform (IDCT), the inverse discrete Fourier transform (IDFT), the inverse modified discrete cosine transform (IMDCT), or the inverse wavelet transform.
The frequency-time mapping process is illustrated below using the inverse modified discrete cosine transform (IMDCT) as an example. It comprises three steps: the IMDCT transform, time-domain windowing, and time-domain overlap-add.
First, the pre-prediction spectrum or the inverse-quantized spectrum is IMDCT-transformed to obtain the time-domain signal $x_{i,n}$. The IMDCT is given by

$$x_{i,n} = \frac{2}{N}\sum_{k=0}^{N/2-1} spec[i][k]\cos\left(\frac{2\pi}{N}\left(n+n_0\right)\left(k+\frac{1}{2}\right)\right),$$

where $n$ is the sample index with $0 \le n < N$; $N$ is the number of time-domain samples, with value 2048; $n_0 = (N/2+1)/2$; $i$ is the frame number; and $k$ is the spectral index.
Next, the time-domain signal obtained from the IMDCT is windowed in the time domain. To satisfy the perfect-reconstruction condition, the window function $w(n)$ must satisfy the two conditions $w(2M-1-n) = w(n)$ and $w^2(n) + w^2(n+M) = 1$.
Typical window functions include the sine window and the Kaiser-Bessel window. The present invention adopts a fixed window function: $w(N+k) = \cos\left(\frac{\pi}{2}\left(\frac{k+0.5}{N} - 0.94\,\frac{\sin(2\pi(k+0.5)/N)}{2\pi}\right)\right)$ for $k = 0 \ldots N-1$, where $w(k)$ denotes the $k$-th window coefficient, with $w(k) = w(2N-1-k)$; $N$ is the number of samples per coding frame, $N = 1024$. In addition, a biorthogonal transform with specific analysis and synthesis filters can be used to relax the above constraints on the window function.
Finally, overlap-add is applied to the windowed time-domain signal to obtain the time-domain audio signal. Specifically, the first $N/2$ samples of the signal produced by the windowing operation are overlap-added with the last $N/2$ samples of the previous frame, yielding the $N/2$ output time-domain audio samples: $timeSam_{i,n} = preSam_{i,n} + preSam_{i-1,n+N/2}$, where $i$ is the frame number, $n$ is the sample index with $0 \le n < N/2$, and $N = 2048$.
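The three steps can be sketched directly from the formulas above. This is a minimal illustration, not the patent's implementation: the sine window stands in for the patent's fixed window, since it also satisfies the two stated perfect-reconstruction conditions.

```python
import numpy as np

def imdct(spec, N=2048):
    # x[i][n] = (2/N) * sum_{k=0}^{N/2-1} spec[k] * cos((2*pi/N)*(n + n0)*(k + 1/2))
    n0 = (N / 2 + 1) / 2
    n = np.arange(N)
    k = np.arange(N // 2)
    return (2.0 / N) * np.cos(2 * np.pi / N * np.outer(n + n0, k + 0.5)) @ np.asarray(spec)

def sine_window(N=2048):
    # satisfies w(2M-1-n) = w(n) and w(n)^2 + w(n+M)^2 = 1, with M = N/2
    return np.sin(np.pi / N * (np.arange(N) + 0.5))

def overlap_add(prev_windowed, cur_windowed, N=2048):
    # first N/2 samples of the current windowed frame + last N/2 of the previous one
    return cur_windowed[:N // 2] + prev_windowed[N // 2:]
```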
Fig. 9 is a schematic diagram of the first embodiment of the encoding device of the present invention. On the basis of Fig. 5, this embodiment adds a frequency-domain linear prediction and vector quantization module 56 between the output of the multiresolution analysis module 53 and the input of the quantization and entropy-coding module 54; it outputs the residual sequence to the quantization and entropy-coding module 54, while the codeword indices obtained by quantization are output as side information to the bit-stream multiplexing module 55.
Because the frequency-domain coefficients obtained after multiresolution analysis are time-frequency coefficients with a specific time-frequency plane partition, the frequency-domain linear prediction and vector quantization module 56 must perform linear prediction and multistage vector quantization on the frequency-domain coefficients of each time segment.
The frequency-domain coefficients output by the multiresolution analysis module 53 are sent to the frequency-domain linear prediction and vector quantization module 56; after the multiresolution analysis, a standard linear prediction analysis is performed on the frequency-domain coefficients of each time segment. If the prediction gain satisfies a given condition, linear prediction error filtering is applied to the frequency-domain coefficients; the resulting prediction coefficients are converted into line spectral frequency (LSF) coefficients, the codeword indices of the codebooks at each stage are found by searching under an optimal distortion-measure criterion, and the codeword indices are sent to the bit-stream multiplexing module 55 as side information, while the residual sequence obtained from the prediction analysis is output to the quantization and entropy-coding module 54.
The frequency-domain linear prediction and vector quantization module 56 consists of a linear prediction analyzer, a linear prediction filter, a converter, and a vector quantizer. The frequency-domain coefficients are input to the linear prediction analyzer for prediction analysis, yielding the prediction gain and prediction coefficients; coefficients satisfying the given condition are output to the linear prediction filter for filtering, yielding the residual sequence. The residual sequence is output directly to the quantization and entropy-coding module 54, while the prediction coefficients are converted into LSF coefficients by the converter; the LSF parameters are then fed into the vector quantizer for multistage vector quantization, and the quantized signal is sent to the bit-stream multiplexing module 55.
Applying frequency-domain linear prediction to the audio signal can effectively suppress pre-echo and yield a larger coding gain. For a real signal $x(t)$, its squared Hilbert envelope $e(t)$ can be expressed as $e(t) = F^{-1}\{\int C(\xi)\,C^*(\xi - f)\,d\xi\}$, where $C(f)$ is the single-sided spectrum corresponding to the positive-frequency components of $x(t)$; that is, the Hilbert envelope of the signal is related to the autocorrelation of its spectrum. Meanwhile, the power spectral density of a signal relates to the autocorrelation of its time-domain waveform by $PSD(f) = F\{\int x(\tau)\,x^*(\tau - t)\,d\tau\}$, so the squared Hilbert envelope in the time domain and the power spectral density in the frequency domain are duals of each other. It follows that, for a partial bandpass signal within a given frequency range, if its Hilbert envelope remains constant, the autocorrelation of adjacent spectral values also remains constant; this means the sequence of spectral coefficients is stationary with respect to frequency, so the spectral values can be handled with predictive coding and represented efficiently by one set of common prediction coefficients.
The coding method based on the encoding device of Fig. 9 is essentially the same as that based on the device of Fig. 5; the difference is the addition of the following steps. After the frequency-domain coefficients have undergone multiresolution analysis, a standard linear prediction analysis is performed on the coefficients of each time segment, yielding the prediction gain and prediction coefficients. If the prediction gain exceeds a preset threshold, frequency-domain linear prediction error filtering is applied to the frequency-domain coefficients according to the prediction coefficients, yielding the residual sequence; the prediction coefficients are converted into line spectral frequency coefficients, which are multistage vector quantized to obtain the side information; and the residual sequence is quantized and entropy coded. If the prediction gain does not exceed the preset threshold, the frequency-domain coefficients themselves are quantized and entropy coded.
After the multiresolution analysis of the frequency-domain coefficients, a standard linear prediction analysis is first performed on the coefficients of each time segment, comprising computation of the autocorrelation matrix and recursive execution of the Levinson-Durbin algorithm to obtain the prediction gain and prediction coefficients. It is then judged whether the computed prediction gain exceeds the preset threshold: if so, linear prediction error filtering is applied to the frequency-domain coefficients according to the prediction coefficients; otherwise the coefficients are left untouched, and the next step, quantization and entropy coding, is carried out directly.
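The Levinson-Durbin recursion itself is standard; the following generic sketch (not the patent's own code) returns the predictor coefficients and the final error energy, from which the prediction gain is the ratio r[0]/error:

```python
def levinson_durbin(r, order):
    """Solve the normal equations from autocorrelation values r[0..order]."""
    a = [0.0] * order          # predictor coefficients a_1..a_p
    err = r[0]                 # prediction-error energy
    for i in range(order):
        acc = r[i + 1]
        for j in range(i):
            acc -= a[j] * r[i - j]
        k = acc / err          # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err
```

For the autocorrelation of a first-order process, r = [1, 0.5, 0.25], the recursion yields a = [0.5, 0] and error 0.75, i.e. a prediction gain of 4/3.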
Linear prediction divides into forward prediction and backward prediction: forward prediction predicts the current value from values before a given instant, while backward prediction predicts it from values after that instant. Taking forward prediction as the example, the transfer function of the linear prediction error filter is

$$A(z) = 1 - \sum_{i=1}^{p} a_i z^{-i},$$

where $a_i$ are the prediction coefficients and $p$ is the prediction order. The frequency-domain coefficients $X(k)$ obtained from the time-frequency transform are filtered into the prediction error $E(k)$, also called the residual sequence, the two satisfying

$$E(k) = X(k)\cdot A(z) = X(k) - \sum_{i=1}^{p} a_i X(k-i).$$
Thus, after linear prediction error filtering, the frequency-domain coefficients $X(k)$ output by the time-frequency transform can be represented by the residual sequence $E(k)$ and one set of prediction coefficients $a_i$. This set of coefficients $a_i$ is then converted into LSF coefficients and multistage vector quantized: the vector quantization selects an optimal distortion-measure criterion (such as the nearest-neighbor criterion) and searches out the codeword indices of the codebooks at each stage, which determine the codewords corresponding to the prediction coefficients; the codeword indices are output as side information. At the same time, the residual sequence $E(k)$ is quantized and entropy coded. From the principles of linear predictive coding, the dynamic range of the residual sequence is smaller than that of the original spectral coefficients, so fewer bits can be allocated in quantization, or equivalently, a higher coding gain can be obtained for the same number of bits.
Figure 10 is a schematic diagram of embodiment one of the decoding device. On the basis of the decoding device of Fig. 8, an inverse frequency-domain linear prediction and vector quantization module 65 is added between the output of the inverse quantizer bank 62 and the input of the multiresolution synthesis module 63; the bit-stream demultiplexing module 60 outputs inverse frequency-domain linear prediction and vector quantization control information to it. The module performs inverse quantization and inverse linear prediction filtering on the inverse-quantized (residual) spectrum, obtains the pre-prediction spectrum, and outputs it to the multiresolution synthesis module 63.
In the encoder, frequency-domain linear prediction and vector quantization are adopted to suppress pre-echo and obtain a larger coding gain. Accordingly, in the decoder, the inverse-quantized spectrum and the inverse frequency-domain linear prediction and vector quantization control information output by the bit-stream demultiplexing module 60 are input to the inverse frequency-domain linear prediction and vector quantization module 65 to recover the pre-prediction spectrum.
The inverse frequency-domain linear prediction and vector quantization module 65 comprises an inverse vector quantizer, an inverse converter, and an inverse linear prediction filter. The inverse vector quantizer inverse-quantizes the codeword indices to obtain the line spectral frequency (LSF) coefficients; the inverse converter converts the LSF coefficients back into prediction coefficients; and the inverse linear prediction filter applies inverse filtering to the inverse-quantized spectrum according to the prediction coefficients, obtains the pre-prediction spectrum, and outputs it to the multiresolution synthesis module 63.
The decoding method based on the decoding device of Fig. 10 is essentially the same as that based on the device of Fig. 8; the difference is the addition of the following steps. After the inverse-quantized spectrum is obtained, it is judged from the control information whether the spectrum needs inverse frequency-domain linear prediction and vector quantization; if so, inverse vector quantization is performed to obtain the prediction coefficients, linear prediction synthesis is applied to the inverse-quantized spectrum according to those coefficients to obtain the pre-prediction spectrum, and multiresolution synthesis is then applied to the pre-prediction spectrum.
After the inverse-quantized spectrum is obtained, it is judged from the control information whether this signal frame underwent frequency-domain linear prediction and vector quantization. If so, the codeword indices of the vector-quantized prediction coefficients are obtained from the control information; the quantized line spectral frequency (LSF) coefficients are recovered from the codeword indices and used to compute the prediction coefficients; linear prediction synthesis is then applied to the inverse-quantized spectrum to obtain the pre-prediction spectrum.
The transfer function used by the linear prediction error filtering is $A(z) = 1 - \sum_{i=1}^{p} a_i z^{-i}$, where $a_i$ are the prediction coefficients and $p$ is the prediction order. The residual sequence $E(k)$ and the pre-prediction spectrum $X(k)$ therefore satisfy

$$X(k) = E(k)\cdot \frac{1}{A(z)} = E(k) + \sum_{i=1}^{p} a_i X(k-i).$$
In this way, from the residual sequence $E(k)$ and the computed prediction coefficients $a_i$, frequency-domain linear prediction synthesis yields the pre-prediction spectrum $X(k)$, which is then passed to the frequency-time mapping.
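The analysis relation E(k) = X(k) − Σ a_i X(k−i) and the synthesis relation X(k) = E(k) + Σ a_i X(k−i) are exact inverses of each other; the sketch below (with illustrative coefficient values, not values from the patent) verifies the round trip:

```python
import numpy as np

def lp_error_filter(X, a):
    """Encoder side: prediction-error (analysis) filtering of a spectrum X."""
    p = len(a)
    E = np.array(X, dtype=float)
    for k in range(len(X)):
        for i in range(1, min(p, k) + 1):
            E[k] -= a[i - 1] * X[k - i]   # subtract prediction from past inputs
    return E

def lp_synthesis(E, a):
    """Decoder side: rebuild X recursively from the residual."""
    p = len(a)
    X = np.array(E, dtype=float)
    for k in range(len(E)):
        for i in range(1, min(p, k) + 1):
            X[k] += a[i - 1] * X[k - i]   # add prediction from reconstructed values
    return X
```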
If the control information shows that this signal frame did not undergo frequency-domain linear prediction and vector quantization, no inverse processing is performed, and the inverse-quantized spectrum goes directly to the frequency-time mapping.
Figure 11 shows the structure of the second embodiment of the encoding device of the present invention. On the basis of Fig. 5, this embodiment adds a sum/difference (M/S) stereo coding module 57 between the output of the multiresolution analysis module 53 and the input of the quantization and entropy-coding module 54. For multichannel signals, besides the monaural masking threshold of the audio signal, the psychoacoustic analysis module 51 also computes the masking thresholds of the sum and difference channels and outputs them to the quantization and entropy-coding module 54. The sum/difference stereo coding module 57 may also be placed between the quantizer bank and the coder inside the quantization and entropy-coding module 54.
The sum/difference stereo coding module 57 exploits the correlation between the two channels of a channel pair, equivalently replacing the frequency-domain coefficients/residual sequences of the left and right channels with those of the sum and difference channels, thereby reducing the bit rate and improving coding efficiency; it is therefore applicable only to multichannel signals whose channels have consistent signal types. For monophonic signals, or multichannel signals with inconsistent signal types, no sum/difference stereo coding is performed.
The coding method based on the encoding device of Fig. 11 is essentially the same as that based on the device of Fig. 5; the difference is the addition of the following steps. Before the frequency-domain coefficients are quantized and entropy coded, it is judged whether the audio signal is a multichannel signal. If it is, it is judged whether the signal types of the left- and right-channel signals are consistent; if they are, it is judged whether the scale factor bands corresponding to the two channels satisfy the sum/difference stereo coding condition. If the condition is satisfied, sum/difference stereo coding is applied, yielding the frequency-domain coefficients of the sum and difference channels; if not, no sum/difference stereo coding is performed. For monophonic signals, or multichannel signals with inconsistent signal types, the frequency-domain coefficients are left unprocessed.
Besides being applied before quantization, sum/difference stereo coding can also be applied after quantization and before entropy coding. That is, after the frequency-domain coefficients are quantized, it is judged whether the audio signal is a multichannel signal; if so, whether the signal types of the left and right channels are consistent; if so, whether the scale factor bands corresponding to the two channels satisfy the sum/difference stereo coding condition. If the condition is satisfied, sum/difference stereo coding is applied; if not, it is omitted. For monophonic signals, or multichannel signals with inconsistent signal types, no sum/difference stereo coding is applied to the frequency-domain coefficients.
There are many methods to judge whether a scale factor band can be sum/difference stereo coded; the method adopted by the present invention is based on the Karhunen-Loeve transform. The concrete decision process is as follows.
Let $l(k)$ be the spectral coefficients of a left-channel scale factor band and $r(k)$ those of the corresponding right-channel band. The correlation matrix is

$$C = \begin{pmatrix} C_{ll} & C_{lr} \\ C_{lr} & C_{rr} \end{pmatrix},$$

where $C_{ll} = \frac{1}{N}\sum_{k=0}^{N-1} l(k)\,l(k)$, $C_{lr} = \frac{1}{N}\sum_{k=0}^{N-1} l(k)\,r(k)$, $C_{rr} = \frac{1}{N}\sum_{k=0}^{N-1} r(k)\,r(k)$, and $N$ is the number of spectral lines in the scale factor band. Applying the Karhunen-Loeve transform to $C$ gives

$$R\,C\,R^T = \Lambda = \begin{pmatrix} \lambda_{ii} & 0 \\ 0 & \lambda_{ee} \end{pmatrix}, \qquad R = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}, \quad \alpha \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right].$$
The rotation angle $\alpha$ satisfies $\tan(2\alpha) = \frac{2C_{lr}}{C_{ll} - C_{rr}}$; when $\alpha = \pm\pi/4$ the transform is exactly the sum/difference stereo coding mode. Therefore, when the absolute value of $\alpha$ deviates little from $\pi/4$, e.g. $3\pi/16 < |\alpha| < 5\pi/16$, the corresponding scale factor band can be sum/difference stereo coded.
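The decision rule can be sketched as below; the use of `arctan2` to resolve the quadrant of 2α, which the bare tangent formula leaves ambiguous, is an implementation choice not specified in the text:

```python
import numpy as np

def ms_band_decision(l, r, lo=3 * np.pi / 16, hi=5 * np.pi / 16):
    """KLT-based test: return (use_ms, alpha) for one scale factor band."""
    N = len(l)
    C_ll = np.dot(l, l) / N
    C_lr = np.dot(l, r) / N
    C_rr = np.dot(r, r) / N
    alpha = 0.5 * np.arctan2(2.0 * C_lr, C_ll - C_rr)  # tan(2a) = 2C_lr/(C_ll - C_rr)
    return lo < abs(alpha) < hi, alpha
```

Identical channels give α = π/4 and anti-phase channels α = −π/4, both of which fall inside the M/S window, while uncorrelated channels of equal power give α near 0.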
If sum/difference stereo coding is applied before quantization, the frequency-domain coefficients of the left and right channels in the scale factor band are replaced by those of the sum and difference channels through the linear transform

$$\begin{pmatrix} M \\ S \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} L \\ R \end{pmatrix},$$

where $M$ denotes the sum-channel frequency-domain coefficient, $S$ the difference-channel coefficient, $L$ the left-channel coefficient, and $R$ the right-channel coefficient.
If sum/difference stereo coding is applied after quantization, the quantized frequency-domain coefficients of the left and right channels in the scale factor band are replaced by those of the sum and difference channels through

$$\begin{pmatrix} \hat{M} \\ \hat{S} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} \hat{L} \\ \hat{R} \end{pmatrix},$$

where $\hat{M}$ denotes the quantized sum-channel frequency-domain coefficient, $\hat{S}$ the quantized difference-channel coefficient, $\hat{L}$ the quantized left-channel coefficient, and $\hat{R}$ the quantized right-channel coefficient.
Placing sum/difference stereo coding after quantization not only effectively removes the correlation between the left and right channels but, because it operates on already-quantized values, can also achieve lossless coding.
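Both placements can be sketched as below. The after-quantization mapping M̂ = L̂, Ŝ = L̂ − R̂ is its own inverse over the integers, which is why that placement can be lossless (a minimal illustration, not the patent's code):

```python
import numpy as np

def ms_before_quant(L, R):
    # M = (L + R)/2, S = (L - R)/2, inverted by l = m + s, r = m - s
    return (L + R) / 2.0, (L - R) / 2.0

def lr_from_ms(m, s):
    return m + s, m - s

def ms_after_quant(Lq, Rq):
    # integer mapping: M = L, S = L - R (exactly invertible)
    return Lq, Lq - Rq

def lr_from_ms_quant(mq, sq):
    # the same matrix is its own inverse: L = M, R = M - S
    return mq, mq - sq
```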
Figure 12 is a schematic diagram of embodiment two of the decoding device. On the basis of the decoding device of Fig. 8, a sum/difference stereo decoding module 66 is added between the output of the inverse quantizer bank 62 and the input of the multiresolution synthesis module 63. It receives the signal-type analysis result and the sum/difference stereo control signal output by the bit-stream demultiplexing module 60, and according to this control information converts the inverse-quantized spectra of the sum and difference channels into the inverse-quantized spectra of the left and right channels.
In the sum/difference stereo control signal, one flag bit indicates whether the current channel pair needs sum/difference stereo decoding; if it does, each scale factor band additionally carries a flag bit indicating whether that band needs sum/difference stereo decoding. The sum/difference stereo decoding module 66 determines from the per-band flag bits whether to perform sum/difference stereo decoding on the inverse-quantized spectrum of each scale factor band. If sum/difference stereo coding was performed in the encoding device, sum/difference stereo decoding must be performed on the inverse-quantized spectrum in the decoding device.
The sum/difference stereo decoding module 66 may also be placed between the output of the entropy decoding module 61 and the input of the inverse quantizer bank 62, receiving the sum/difference stereo control signal and the signal-type analysis result output by the bit-stream demultiplexing module 60.
The decoding method based on the decoding device of Fig. 12 is essentially the same as that based on the device of Fig. 8; the difference is the addition of the following steps. After the inverse-quantized spectrum is obtained, if the signal-type analysis result shows consistent signal types, the sum/difference stereo control signal is used to judge whether sum/difference stereo decoding of the inverse-quantized spectrum is needed. If it is, the flag bit on each scale factor band determines whether that band needs sum/difference stereo decoding; where needed, the inverse-quantized spectra of the sum and difference channels in the band are converted into the inverse-quantized spectra of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum/difference stereo decoding is needed, the inverse-quantized spectrum is left unprocessed and passes directly to subsequent processing.
Sum/difference stereo decoding can also be performed after entropy decoding and before inverse quantization. That is, after the quantized spectral values are obtained, if the signal-type analysis result shows consistent signal types, the sum/difference stereo control signal is used to judge whether sum/difference stereo decoding of the quantized spectral values is needed. If it is, the flag bit on each scale factor band determines whether that band needs sum/difference stereo decoding; where needed, the quantized spectral values of the sum and difference channels in the band are converted into those of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum/difference stereo decoding is needed, the quantized spectral values are left unprocessed and pass directly to subsequent processing.
If sum/difference stereo decoding occurs after entropy decoding and before inverse quantization, the quantized frequency-domain coefficients of the left and right channels in the scale factor band are obtained from those of the sum and difference channels by

$$\begin{pmatrix} \hat{l} \\ \hat{r} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} \hat{m} \\ \hat{s} \end{pmatrix},$$

where $\hat{m}$ denotes the quantized sum-channel frequency-domain coefficient, $\hat{s}$ the quantized difference-channel coefficient, $\hat{l}$ the quantized left-channel coefficient, and $\hat{r}$ the quantized right-channel coefficient.
If sum/difference stereo decoding occurs after inverse quantization, the inverse-quantized frequency-domain coefficients of the left and right channels in the subband are obtained from those of the sum and difference channels by the matrix operation

$$\begin{pmatrix} l \\ r \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} m \\ s \end{pmatrix},$$

where $m$ denotes the sum-channel frequency-domain coefficient, $s$ the difference-channel coefficient, $l$ the left-channel coefficient, and $r$ the right-channel coefficient.
Figure 13 shows the structure of the third embodiment of the encoding device of the present invention. On the basis of Fig. 9, this embodiment adds a sum/difference stereo coding module 57 between the output of the frequency-domain linear prediction and vector quantization module 56 and the input of the quantization and entropy-coding module 54; the psychoacoustic analysis module 51 outputs the masking thresholds of the sum and difference channels to the quantization and entropy-coding module 54.
The sum/difference stereo coding module 57 may also be placed between the quantizer bank and the coder inside the quantization and entropy-coding module 54, receiving the signal-type analysis result output by the psychoacoustic analysis module 51.
In this embodiment, the function and working principle of the sum/difference stereo coding module 57 are identical to those in Fig. 11 and are not repeated here.
The coding method based on the encoding device of Fig. 13 is essentially the same as that based on the device of Fig. 9; the difference is the addition of the following steps. Before the frequency-domain coefficients are quantized and entropy coded, it is judged whether the audio signal is a multichannel signal. If it is, it is judged whether the signal types of the left- and right-channel signals are consistent; if they are, it is judged whether the scale factor band satisfies the coding condition. If it does, sum/difference stereo coding is applied to that scale factor band; if not, no sum/difference stereo coding is performed. For monophonic signals, or multichannel signals with inconsistent signal types, no sum/difference stereo coding is performed.
Besides being applied before quantization, sum/difference stereo coding can also be applied after quantization and before entropy coding. That is, after the frequency-domain coefficients are quantized, it is judged whether the audio signal is a multichannel signal; if so, whether the signal types of the left and right channels are consistent; if so, whether the scale factor band satisfies the coding condition. If it does, sum/difference stereo coding is applied to that scale factor band; if not, it is omitted. For monophonic signals, or multichannel signals with inconsistent signal types, no sum/difference stereo coding is performed.
Figure 14 is the structural diagram of embodiment three of the decoding device. On the basis of the decoding device of Fig. 10, a sum/difference stereo decoding module 66 is added between the output of the inverse quantizer bank 62 and the input of the inverse frequency-domain linear prediction and vector quantization module 65; the bit-stream demultiplexing module 60 outputs the sum/difference stereo control signal to it.
The sum/difference stereo decoding module 66 may also be placed between the output of the entropy decoding module 61 and the input of the inverse quantizer bank 62, receiving the sum/difference stereo control signal output by the bit-stream demultiplexing module 60.
In this embodiment, the function and working principle of the sum/difference stereo decoding module 66 are identical to those in Fig. 12 and are not repeated here.
The decoding method based on the decoding device of Fig. 14 is essentially the same as that based on the device of Fig. 10; the difference is the addition of the following steps. After the inverse-quantized spectrum is obtained, if the signal-type analysis result shows consistent signal types, the sum/difference stereo control signal is used to judge whether sum/difference stereo decoding of the inverse-quantized spectrum is needed. If it is, the flag bit on each scale factor band determines whether that band needs sum/difference stereo decoding; where needed, the inverse-quantized spectra of the sum and difference channels in the band are converted into the inverse-quantized spectra of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum/difference stereo decoding is needed, the inverse-quantized spectrum is left unprocessed and passes directly to subsequent processing.
Sum/difference stereo decoding can also be performed before inverse quantization. That is, after the quantized spectral values are obtained, if the signal-type analysis result shows consistent signal types, the sum/difference stereo control signal is used to judge whether sum/difference stereo decoding of the quantized spectral values is needed. If it is, the flag bit on each scale factor band determines whether that band needs sum/difference stereo decoding; where needed, the quantized spectral values of the sum and difference channels in the band are converted into those of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum/difference stereo decoding is needed, the quantized spectral values are left unprocessed and pass directly to subsequent processing.
Figure 15 shows a schematic diagram of the fourth embodiment of the encoding device of the present invention. On the basis of the encoding device of Fig. 5, this embodiment adds a resampling module 590 and a band extension module 591. The resampling module 590 resamples the input audio signal, changing its sampling rate, and outputs the rate-changed audio signal to the signal characteristics analysis module 50. The band extension module 591 analyzes the input audio signal over the whole frequency band, extracts the spectral envelope of the high-frequency part and the characteristics of its relation to the low-frequency part, and outputs them to the bit-stream multiplexing module 55.
The resampling module 590 resamples the input audio signal; resampling includes both up-sampling and down-sampling, and down-sampling is taken as the example below. In this embodiment, the resampling module 590 comprises a low-pass filter and a down-sampler, where the low-pass filter limits the bandwidth of the audio signal to eliminate the aliasing that down-sampling may cause; the input audio signal is low-pass filtered and then down-sampled. Let the input audio signal be s(n) and let v(n) be the output of the low-pass filter with impulse response h(n); then v(n) = Σ_{k=−∞}^{∞} h(k)s(n−k). Down-sampling v(n) by a factor of M yields the sequence x(m) = v(Mm) = Σ_{k=−∞}^{∞} h(k)s(Mm−k). The sampling rate of the resampled audio signal x(n) is thus M times lower than that of the original input audio signal s(n).
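A minimal sketch of the filter-then-decimate step above, assuming NumPy; the moving-average low-pass filter and the function name are illustrative stand-ins for the (unspecified) impulse response h(n) of the embodiment.

```python
import numpy as np

def downsample(s, h, M):
    # v(n) = sum_k h(k) s(n-k): low-pass filter to avoid aliasing,
    # then x(m) = v(M*m): keep every M-th sample.
    v = np.convolve(s, h, mode="same")
    return v[::M]

# Halving the sampling rate of a 440 Hz tone sampled at 8 kHz.
fs = 8000
t = np.arange(256) / fs
s = np.sin(2 * np.pi * 440 * t)
h = np.ones(8) / 8.0   # crude moving-average low-pass, for illustration only
x = downsample(s, h, 2)
```

With M = 2 the output has half as many samples per second, i.e. half the sampling rate, as the description states.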
After being input to the bandwidth extension module 591, the original input audio signal is analyzed over the whole frequency band; the spectral envelope of the high-frequency part and the parameters characterizing its relation to the low-frequency part are extracted and output to the bitstream multiplexing module 55 as bandwidth extension control information.
The basic principle of bandwidth extension is that, for most audio signals, the characteristics of the high-frequency part are strongly correlated with those of the low-frequency part, so the high-frequency part of the audio signal can be effectively reconstructed from its low-frequency part and need not be transmitted. To ensure correct reconstruction of the high-frequency part, only a small amount of bandwidth extension control information needs to be carried in the compressed audio bitstream.
The bandwidth extension module 591 comprises a parameter extraction module and a spectral envelope extraction module. The input signal enters the parameter extraction module, where parameters representing the spectral characteristics of the input signal in different time-frequency regions are extracted; the spectral envelope extraction module then estimates the spectral envelope of the high-frequency part of the signal at a certain time-frequency resolution. To ensure that the time-frequency resolution best matches the characteristics of the current input signal, the time-frequency resolution of the spectral envelope can be selected freely. The spectral-characteristic parameters and the high-frequency spectral envelope are output to the bitstream multiplexing module 55 to be multiplexed as bandwidth extension control information.
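A toy version of the spectral envelope extraction, assuming NumPy: the upper half of the magnitude spectrum of one frame is summarized as per-band mean magnitudes. The FFT front end, the band count and the function name are assumptions; the patent leaves the time-frequency transform and its resolution open.

```python
import numpy as np

def high_band_envelope(frame, n_bands=4):
    # Magnitude spectrum of one frame, keep the high-frequency half,
    # and describe it by the mean magnitude of n_bands equal sub-bands.
    spec = np.abs(np.fft.rfft(frame))
    hi = spec[len(spec) // 2:]
    return np.array([b.mean() for b in np.array_split(hi, n_bands)])

env = high_band_envelope(np.random.default_rng(0).standard_normal(512))
```

The n_bands per-band means play the role of a coarse high-frequency spectral envelope; a real encoder would choose the band layout per the selected time-frequency resolution.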
After receiving the common scale factor, the coded scale factors, the coded codebook sequence numbers and the losslessly coded quantized spectrum or codeword indices output by the quantization and entropy coding module 54, together with the bandwidth extension control information output by the bandwidth extension module 591, the bitstream multiplexing module 55 multiplexes them to obtain the compressed audio data stream.
The encoding method based on the encoding device shown in Figure 15 specifically comprises: analyzing the input audio signal over the whole frequency band and extracting its high-frequency spectral envelope and spectral-characteristic parameters as bandwidth extension control information; resampling the input audio signal and performing signal-type analysis; calculating the signal-to-mask ratio of the resampled signal; performing time-frequency mapping on the resampled signal to obtain the frequency-domain coefficients of the audio signal; quantizing and entropy coding the frequency-domain coefficients; and multiplexing the bandwidth extension control information with the coded audio stream to obtain the compressed audio bitstream. The resampling comprises two steps: limiting the bandwidth of the audio signal, and down-sampling the band-limited audio signal by a given factor.
Figure 16 is a schematic structural diagram of the fourth embodiment of the decoding device. On the basis of the decoding device shown in Figure 8, this embodiment adds a bandwidth extension module 68, which receives the bandwidth extension control information output by the bitstream demultiplexing module 60 and the low-frequency-band time-domain audio signal output by the frequency-time mapping module 64, reconstructs the high-frequency part of the signal by spectral shifting and high-frequency adjustment, and outputs the full-bandwidth audio signal.
The decoding method based on the decoding device shown in Figure 16 is essentially the same as the decoding method based on the decoding device shown in Figure 8, except that the following step is added: after the time-domain audio signal is obtained, the high-frequency part of the audio signal is reconstructed from the bandwidth extension control information and the time-domain audio signal to obtain the full-bandwidth audio signal.
Figures 17, 19 and 21 show the fifth to seventh embodiments of the encoding device, which add a resampling module 590 and a bandwidth extension module 591 to the encoding devices shown in Figures 11, 9 and 13, respectively. The connections, functions and principles of these two modules and of the other modules are the same as in Figure 15 and are not repeated here.
Figures 18, 20 and 22 show the fifth to seventh embodiments of the decoding device, which add a bandwidth extension module 68 to the decoding devices shown in Figures 12, 10 and 14, respectively. The module receives the bandwidth extension control information output by the bitstream demultiplexing module 60 and the low-frequency-band time-domain audio signal output by the frequency-time mapping module 64, reconstructs the high-frequency part of the signal by spectral shifting and high-frequency adjustment, and outputs the full-bandwidth audio signal.
Each of the seven embodiments of the encoding device described above may further comprise a gain control module, which receives the audio signal output by the signal characteristic analysis module 50 and controls the dynamic range of fast-varying signals to eliminate pre-echoes in audio processing; its output is connected to the time-frequency mapping module 52 and the psychoacoustic analysis module 51, and the gain adjustment amount is output to the bitstream multiplexing module 55.
According to the signal type of the audio signal, the gain control module processes only fast-varying signals; slowly-varying signals are not processed and are output directly. For a fast-varying signal, the gain control module adjusts the time-domain energy envelope of the signal, raising the gain of the portion preceding the transient onset so that the time-domain signal amplitudes before and after the onset become comparable. The time-domain signal with the adjusted energy envelope is then output to the time-frequency mapping module 52, and the gain adjustment amount is simultaneously output to the bitstream multiplexing module 55.
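The envelope adjustment above can be sketched as follows, assuming NumPy; the RMS-matching rule, the externally supplied attack position and the function name are illustrative assumptions. The returned gain g is the side information that the decoder's inverse gain control divides out to restore the original low-before, high-after envelope.

```python
import numpy as np

def gain_control(x, attack_idx):
    # Raise the portion before the transient onset so its RMS matches the
    # RMS after the onset; return the adjusted signal and the gain amount
    # (side information for the bitstream multiplexer).
    pre, post = x[:attack_idx], x[attack_idx:]
    g = np.sqrt(np.mean(post ** 2)) / (np.sqrt(np.mean(pre ** 2)) + 1e-12)
    y = x.copy()
    y[:attack_idx] *= g
    return y, g

x = np.concatenate([0.1 * np.ones(100), np.ones(100)])  # quiet, then loud
y, g = gain_control(x, 100)
```

Because the pre-onset portion is amplified before transform coding and attenuated again after decoding, the quantization noise in that portion is attenuated with it, which is how the pre-echo is suppressed.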
The encoding method based on this encoding device is essentially the same as the encoding methods described above, except that the following step is added: gain control is applied to the signal after signal-type analysis.
Each of the seven embodiments of the decoding device described above may further comprise an inverse gain control module, located after the output of the frequency-time mapping module 64, which receives the signal-type analysis result and the gain adjustment information output by the bitstream demultiplexing module 60 and is used to adjust the gain of the time-domain signal and control pre-echoes. After receiving the reconstructed time-domain signal output by the frequency-time mapping module 64, the inverse gain control module processes only fast-varying signals; slowly-varying signals are not processed. For a fast-varying signal, the inverse gain control module adjusts the energy envelope of the reconstructed time-domain signal according to the gain adjustment information, reducing the amplitude of the portion preceding the transient onset and restoring the energy envelope to its original low-before, high-after shape; the quantization noise preceding the onset is thereby attenuated together with the signal amplitude, so pre-echoes are controlled.
The decoding method based on this decoding device is essentially the same as the decoding methods described above, except that the following step is added: inverse gain control is applied to the reconstructed time-domain signal.
Finally, it should be noted that the above embodiments merely illustrate, without limiting, the technical solution of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that modifications and equivalent substitutions may be made to the technical solution of the present invention without departing from its spirit and scope, and all such modifications are intended to fall within the scope of the claims of the present invention.

Claims (20)

1. An enhanced audio encoding device, comprising a psychoacoustic analysis module, a time-frequency mapping module, a quantization and entropy coding module and a bitstream multiplexing module, characterized in that it further comprises a signal characteristic analysis module and a multiresolution analysis module; wherein the signal characteristic analysis module is configured to perform type analysis on the input audio signal, output the signal to the psychoacoustic analysis module and the time-frequency mapping module, and simultaneously output the signal-type analysis result to the bitstream multiplexing module;
the psychoacoustic analysis module is configured to calculate the masking threshold and the signal-to-mask ratio of the audio signal and output them to the quantization and entropy coding module;
the time-frequency mapping module is configured to transform the time-domain audio signal into frequency-domain coefficients and output them to the multiresolution analysis module;
the multiresolution analysis module is configured to perform, according to the signal-type analysis result output by the signal characteristic analysis module, multiresolution analysis on the frequency-domain coefficients of fast-varying signals, and output the result to the quantization and entropy coding module;
the quantization and entropy coding module is configured to quantize and entropy code the frequency-domain coefficients under the control of the signal-to-mask ratio output by the psychoacoustic analysis module, and output the result to the bitstream multiplexing module;
the bitstream multiplexing module is configured to multiplex the received data to form the audio coded stream.
2. The enhanced audio encoding device according to claim 1, characterized in that the multiresolution analysis module comprises a frequency-domain coefficient transform module and a regrouping module, wherein the frequency-domain coefficient transform module is configured to transform the frequency-domain coefficients into time-frequency plane coefficients, and the regrouping module is configured to regroup the time-frequency plane coefficients according to a certain rule; the frequency-domain coefficient transform module is a frequency-domain wavelet transform filterbank or a frequency-domain MDCT filterbank.
3. The enhanced audio encoding device according to claim 1, characterized in that it further comprises a frequency-domain linear prediction and vector quantization module, located between the output of the multiresolution analysis module and the input of the quantization and entropy coding module; the frequency-domain linear prediction and vector quantization module specifically consists of a linear prediction analyzer, a linear prediction filter, a converter and a vector quantizer;
the linear prediction analyzer is configured to perform prediction analysis on the frequency-domain coefficients to obtain the prediction gain and the prediction coefficients, and to output frequency-domain coefficients satisfying a certain condition to the linear prediction filter; frequency-domain coefficients not satisfying the condition are output directly to the quantization and entropy coding module;
the linear prediction filter is configured to filter the frequency-domain coefficients to obtain a residual sequence, output the residual sequence to the quantization and entropy coding module, and output the prediction coefficients to the converter;
the converter is configured to convert the prediction coefficients into line spectral pair (LSP) frequency coefficients;
the vector quantizer is configured to perform multi-stage vector quantization on the LSP frequency coefficients and send the side information obtained by quantization to the bitstream multiplexing module.
4. The enhanced audio encoding device according to any one of claims 1-3, characterized in that it further comprises a sum/difference stereo coding module, located between the output of the frequency-domain linear prediction and vector quantization module and the input of the quantization and entropy coding module, or between the quantizer bank and the coder within the quantization and entropy coding module; the signal characteristic analysis module outputs the signal-type analysis result to it; the sum/difference stereo coding module is configured to convert the residual sequences/frequency-domain coefficients of the left and right channels into the residual sequences/frequency-domain coefficients of the sum and difference channels.
5. The enhanced audio encoding device according to any one of claims 1-4, characterized in that it further comprises a resampling module and a bandwidth extension module;
the resampling module is configured to resample the input audio signal, change its sampling rate, and output the audio signal with the changed sampling rate to the psychoacoustic analysis module and the signal characteristic analysis module; it specifically comprises a low-pass filter and a down-sampler, wherein the low-pass filter is configured to limit the bandwidth of the audio signal, and the down-sampler is configured to down-sample the band-limited audio signal to reduce its sampling rate;
the bandwidth extension module is configured to analyze the input audio signal over the whole frequency band, extract the spectral envelope of the high-frequency part and the parameters characterizing the correlation between the low- and high-frequency spectra, and output them to the bitstream multiplexing module; it specifically comprises a parameter extraction module and a spectral envelope extraction module, wherein the parameter extraction module is configured to extract the parameters representing the spectral characteristics of the input signal in different time-frequency regions, and the spectral envelope extraction module is configured to estimate the spectral envelope of the high-frequency part of the signal at a certain time-frequency resolution and then output the spectral-characteristic parameters and the high-frequency spectral envelope to the bitstream multiplexing module.
6. An enhanced audio encoding method, characterized by comprising the following steps:
step 1: performing type analysis on the input audio signal and taking the signal-type analysis result as part of the multiplexed information;
step 2: performing time-frequency mapping on the type-analyzed signal to obtain the frequency-domain coefficients of the audio signal, and simultaneously calculating the signal-to-mask ratio of the audio signal;
step 3: if the signal is a fast-varying signal, performing multiresolution analysis on the frequency-domain coefficients; if it is not a fast-varying signal, going to step 4;
step 4: quantizing and entropy coding the frequency-domain coefficients under the control of the signal-to-mask ratio;
step 5: multiplexing the coded audio signal to obtain the compressed audio bitstream.
7. The enhanced audio encoding method according to claim 6, characterized in that the quantization of step 4 is scalar quantization, specifically comprising: non-linearly compressing the frequency-domain coefficients in all scale factor bands; quantizing the frequency-domain coefficients of each subband with the scale factor of that subband to obtain a quantized spectrum in integer representation; selecting the first scale factor of each frame of signal as the common scale factor; and differentially coding each of the other scale factors against its preceding scale factor;
the entropy coding comprises: entropy coding the quantized spectrum and the differentially coded scale factors to obtain the codebook sequence numbers, the coded scale factors and the losslessly coded values of the quantized spectrum; and entropy coding the codebook sequence numbers to obtain the coded codebook sequence numbers.
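A minimal sketch of the scalar quantization and scale-factor difference coding of claim 7, in Python with NumPy. The function names, the 3/4-power compression exponent, the step size 2^(sf/4) and the rounding offset 0.4054 are assumptions borrowed from common AAC-style quantizers; the claim itself does not fix these constants.

```python
import numpy as np

def quantize_band(coeffs, scalefactor):
    # Non-linear 3/4-power compression, then scaling by the band's scale
    # factor (assumed AAC-style step 2^(sf/4)) and rounding to integers.
    mag = np.abs(coeffs) ** 0.75
    q = np.sign(coeffs) * np.floor(mag * 2.0 ** (-scalefactor / 4.0) + 0.4054)
    return q.astype(int)

def diff_code_scalefactors(sfs):
    # The first scale factor of the frame is the common scale factor;
    # every other one is coded as a difference from its predecessor.
    common = sfs[0]
    diffs = [b - a for a, b in zip(sfs, sfs[1:])]
    return common, diffs

common, diffs = diff_code_scalefactors([10, 12, 11])
q = quantize_band(np.array([1.0, -1.0, 0.0]), 0)
```

The integer quantized spectrum and the scale-factor differences are what the entropy coder of claim 7 then compresses.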
8. The enhanced audio encoding method according to claim 6, characterized in that the multiresolution analysis of step 3 comprises: performing an MDCT on the frequency-domain coefficients to obtain time-frequency plane coefficients; and regrouping the time-frequency plane coefficients according to a certain rule, wherein the regrouping method comprises: first organizing the time-frequency plane coefficients in the frequency direction, organizing the coefficients within each frequency band in the time direction, and then arranging the organized coefficients in order of sub-window and scale factor band.
9. The enhanced audio encoding method according to any one of claims 6-8, characterized in that, between step 3 and step 4, it further comprises: performing standard linear prediction analysis on the frequency-domain coefficients to obtain the prediction gain and the prediction coefficients; judging whether the prediction gain exceeds a preset threshold; if it does, performing frequency-domain linear prediction error filtering on the frequency-domain coefficients according to the prediction coefficients to obtain the linear prediction residual sequence of the frequency-domain coefficients, converting the prediction coefficients into LSP frequency coefficients, performing multi-stage vector quantization on the LSP frequency coefficients to obtain side information, and quantizing and entropy coding the residual sequence; if the prediction gain does not exceed the preset threshold, quantizing and entropy coding the frequency-domain coefficients.
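The prediction analysis of claim 9 can be sketched with the autocorrelation method and the Levinson-Durbin recursion; the function name, the model order and the use of the ratio r(0)/err as the prediction gain are illustrative assumptions, not details fixed by the claim.

```python
import numpy as np

def lpc_levinson(x, order):
    # Levinson-Durbin recursion on the autocorrelation of x.
    # Returns prediction coefficients a (x[n] ~ sum_j a[j] * x[n-1-j])
    # and the final prediction error power.
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order)
    err = r[0]
    for i in range(order):
        acc = r[i + 1] - np.dot(a[:i], r[i:0:-1])
        k = acc / err
        a[:i + 1] = np.concatenate([a[:i] - k * a[:i][::-1], [k]])
        err *= 1.0 - k * k
    return a, err

# Prediction gain r(0)/err decides, via the threshold test of claim 9,
# whether the residual (rather than the raw coefficients) is quantized.
x = 0.9 ** np.arange(64)          # strongly predictable test sequence
a, err = lpc_levinson(x, 1)
gain = np.dot(x, x) / err
```

A large gain means filtering the coefficients with the prediction-error filter leaves a small residual, which is cheaper to quantize and entropy code.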
10. The enhanced audio encoding method according to any one of claims 6-9, characterized in that step 4 further comprises: quantizing the frequency-domain coefficients; judging whether the audio signal is a multi-channel signal; if it is a multi-channel signal, judging whether the signal types of the left- and right-channel signals are consistent; if the signal types are consistent, judging whether the corresponding scale factor bands of the two channels satisfy the sum/difference stereo coding condition; if they do, performing sum/difference stereo coding on the spectral coefficients in those scale factor bands to obtain the frequency-domain coefficients of the sum and difference channels; if they do not, leaving the spectral coefficients in those scale factor bands without sum/difference stereo coding; if the signal is a mono signal or a multi-channel signal with inconsistent signal types, leaving the frequency-domain coefficients unprocessed; and entropy coding the frequency-domain coefficients; wherein
the method of judging whether a scale factor band satisfies the coding condition is the Karhunen-Loeve transform (KLT), specifically: calculating the correlation matrix of the spectral coefficients of the left- and right-channel scale factor band, and performing the KLT on the correlation matrix; if the absolute value of the rotation angle α deviates only slightly from π/4, e.g. 3π/16 < |α| < 5π/16, the corresponding scale factor band can be sum/difference stereo coded; the sum/difference stereo coding is
(M̂, Ŝ)ᵀ = [1 0; 1 −1] (L̂, R̂)ᵀ,
wherein M̂ denotes the quantized sum-channel frequency-domain coefficients, Ŝ denotes the quantized difference-channel frequency-domain coefficients, L̂ denotes the quantized left-channel frequency-domain coefficients, and R̂ denotes the quantized right-channel frequency-domain coefficients.
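The sum/difference matrix appearing in claims 10 and 20 is its own inverse, which a short sketch can verify; the function names are illustrative. Note that with this particular matrix the "sum" channel equals the left channel and the "difference" channel is L − R.

```python
import numpy as np

MS = np.array([[1.0, 0.0], [1.0, -1.0]])  # matrix from claims 10 and 20

def ms_encode(L, R):
    # M = L, S = L - R
    return MS @ np.vstack([L, R])

def ms_decode(M, S):
    # The matrix is an involution (MS @ MS = I), so decoding reuses it:
    # l = m, r = m - s.
    return MS @ np.vstack([M, S])

L = np.array([1.0, 2.0, -0.5])
R = np.array([0.5, -1.0, 0.25])
M, S = ms_encode(L, R)
L2, R2 = ms_decode(M, S)
```

Because encoder and decoder apply the identical matrix, the round trip is exact (up to quantization, which the claims apply separately).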
11. The enhanced audio encoding method according to any one of claims 6-10, characterized in that it further comprises, before step 1, a resampling step and a bandwidth extension step;
the resampling step resamples the input audio signal to change its sampling rate;
the bandwidth extension step analyzes the input audio signal over the whole frequency band and extracts its high-frequency spectral envelope and spectral-characteristic parameters as part of the multiplexed information.
12. An enhanced audio decoding device, comprising a bitstream demultiplexing module, an entropy decoding module, an inverse quantizer bank and a frequency-time mapping module, characterized in that it further comprises a multiresolution synthesis module;
the bitstream demultiplexing module is configured to demultiplex the compressed audio data stream and output the corresponding data signals and control signals to the entropy decoding module and the multiresolution synthesis module;
the entropy decoding module is configured to decode the above signals, recover the quantized spectral values and output them to the inverse quantizer bank;
the inverse quantizer bank is configured to reconstruct the inverse-quantized spectrum and output it to the multiresolution synthesis module;
the multiresolution synthesis module is configured to perform multiresolution synthesis on the inverse-quantized spectrum and output the result to the frequency-time mapping module;
the frequency-time mapping module is configured to perform frequency-time mapping on the spectral coefficients and output the time-domain audio signal.
13. The enhanced audio decoding device according to claim 12, characterized in that the multiresolution synthesis module comprises a coefficient regrouping module and a coefficient transform module; the coefficient transform module is a frequency-domain inverse wavelet transform filterbank or a frequency-domain inverse modified discrete cosine transform (IMDCT) filterbank.
14. The enhanced audio decoding device according to claim 12 or 13, characterized in that it further comprises an inverse frequency-domain linear prediction and vector quantization module, located between the output of the inverse quantizer bank and the input of the multiresolution synthesis module; the inverse frequency-domain linear prediction and vector quantization module specifically comprises an inverse vector quantizer, an inverse converter and an inverse linear prediction filter; the inverse vector quantizer is configured to inverse-quantize the codeword indices to obtain the LSP frequency coefficients; the inverse converter is configured to convert the LSP frequency coefficients back into prediction coefficients; the inverse linear prediction filter is configured to inverse-filter the inverse-quantized spectrum according to the prediction coefficients to obtain the spectrum before prediction.
15. The enhanced audio decoding device according to any one of claims 12-14, characterized in that it further comprises a sum/difference stereo decoding module, located after the inverse quantizer bank or between the output of the entropy decoding module and the input of the inverse quantizer bank, which receives the sum/difference stereo control signal output by the bitstream demultiplexing module and is configured to convert, according to the sum/difference stereo control information, the inverse-quantized spectra/quantized spectral values of the sum and difference channels into the inverse-quantized spectra/quantized spectral values of the left and right channels.
16. An enhanced audio decoding method, characterized by comprising the following steps:
step 1: demultiplexing the compressed audio data stream to obtain data information and control information;
step 2: entropy decoding the above information to obtain the quantized spectral values;
step 3: inverse-quantizing the quantized spectral values to obtain the inverse-quantized spectrum;
step 4: performing multiresolution synthesis on the inverse-quantized spectrum;
step 5: performing frequency-time mapping to obtain the time-domain audio signal.
17. The enhanced audio decoding method according to claim 16, characterized in that the multiresolution synthesis of step 4 specifically comprises: arranging the inverse-quantized spectral coefficients in order of sub-window and scale factor band, regrouping them in frequency order, and then performing a plurality of inverse modified discrete cosine transforms on the regrouped coefficients to obtain the inverse-quantized spectrum before multiresolution analysis.
18. The enhanced audio decoding method according to claim 16, characterized in that step 5 may further comprise: performing the inverse modified discrete cosine transform to obtain the transformed time-domain signal; windowing the transformed time-domain signal in the time domain; and overlap-adding the windowed time-domain signal to obtain the time-domain audio signal; wherein the window function of the windowing is:
w(N+k) = cos(π/2 · ((k+0.5)/N − 0.94·sin(2π/N·(k+0.5))/(2π))), where k = 0...N−1; w(k) denotes the k-th coefficient of the window function, with w(k) = w(2N−1−k); and N denotes the number of samples of a coded frame.
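The window of claim 18 can be generated directly from the stated formula; this sketch fills the second half from the formula and the first half from the symmetry relation w(k) = w(2N−1−k). The function name is illustrative.

```python
import math

def codec_window(N):
    # w(N+k) = cos(pi/2 * ((k+0.5)/N - 0.94*sin(2*pi/N*(k+0.5)) / (2*pi)))
    # for k = 0..N-1; the first N coefficients follow from the symmetry
    # w(k) = w(2N-1-k).  N is the number of samples of a coded frame.
    w = [0.0] * (2 * N)
    for k in range(N):
        t = (k + 0.5) / N - 0.94 * math.sin(2 * math.pi / N * (k + 0.5)) / (2 * math.pi)
        w[N + k] = math.cos(math.pi / 2 * t)
    for k in range(N):
        w[k] = w[2 * N - 1 - k]
    return w

w = codec_window(128)
```

The resulting 2N-point window is symmetric about its midpoint, as required for the overlap-add step of claim 18.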
19. The enhanced audio decoding method according to claim 17 or 18, characterized in that, between step 3 and step 4, it further comprises: judging whether the control information indicates that the inverse-quantized spectrum requires inverse frequency-domain linear prediction and vector quantization; if it does, performing inverse vector quantization to obtain the prediction coefficients, performing linear prediction synthesis on the inverse-quantized spectrum using the prediction coefficients to obtain the spectrum before prediction, and performing frequency-time mapping on the spectrum before prediction; wherein the inverse vector quantization further comprises: obtaining from the control information the codeword indices produced by vector quantization of the prediction coefficients; obtaining the quantized LSP frequency coefficients from the codeword indices; and calculating the prediction coefficients therefrom.
20. The enhanced audio decoding method according to any one of claims 16-19, characterized in that, between step 2 and step 3, it further comprises: if the signal-type analysis result indicates consistent signal types, judging, according to the sum/difference stereo control signal, whether sum/difference stereo decoding of the quantized spectral values is required; if so, judging from the flag bit of each scale factor band whether that scale factor band requires sum/difference stereo decoding, and if it does, converting the quantized spectral values of the sum and difference channels in that scale factor band into the quantized spectral values of the left and right channels before going to step 3; if the signal types are inconsistent, or sum/difference stereo decoding is not required, leaving the quantized spectral values unprocessed and going to step 3;
wherein the sum/difference stereo decoding is
(l̂, r̂)ᵀ = [1 0; 1 −1] (m̂, ŝ)ᵀ,
wherein m̂ denotes the quantized sum-channel frequency-domain coefficients, ŝ denotes the quantized difference-channel frequency-domain coefficients, l̂ denotes the quantized left-channel frequency-domain coefficients, and r̂ denotes the quantized right-channel frequency-domain coefficients.
CNA200410046332XA 2004-04-01 2004-06-03 Intensified audio-frequency coding-decoding device and method Pending CN1677492A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA200410046332XA CN1677492A (en) 2004-04-01 2004-06-03 Intensified audio-frequency coding-decoding device and method

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN200410030946 2004-04-01
CN200410030946.9 2004-04-01
CNA200410046332XA CN1677492A (en) 2004-04-01 2004-06-03 Intensified audio-frequency coding-decoding device and method

Publications (1)

Publication Number Publication Date
CN1677492A true CN1677492A (en) 2005-10-05

Family

ID=35049970

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA200410046332XA Pending CN1677492A (en) 2004-04-01 2004-06-03 Intensified audio-frequency coding-decoding device and method

Country Status (1)

Country Link
CN (1) CN1677492A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007107046A1 (en) * 2006-03-23 2007-09-27 Beijing Ori-Reu Technology Co., Ltd A coding/decoding method of rapidly-changing audio-frequency signals
CN102194457A (en) * 2010-03-02 2011-09-21 中兴通讯股份有限公司 Audio encoding and decoding method, system and noise level estimation method
CN106165013A (en) * 2014-04-17 2016-11-23 沃伊斯亚吉公司 The linear predictive coding of the acoustical signal when transition between each frame with different sampling rate and the method for decoding, encoder
CN108028046A (en) * 2015-06-16 2018-05-11 弗劳恩霍夫应用研究促进协会 Reduction decoding
CN108269584A (en) * 2013-04-05 2018-07-10 杜比实验室特许公司 The companding device and method of quantizing noise are reduced using advanced spectrum continuation
US10319384B2 (en) 2008-07-11 2019-06-11 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Low bitrate audio encoding/decoding scheme having cascaded switches
CN109935236A (en) * 2013-04-05 2019-06-25 杜比国际公司 Audio coder and decoder
CN110459229A (en) * 2014-06-27 2019-11-15 杜比国际公司 The method indicated for decoded voice or the high-order ambisonics (HOA) of sound field
CN111179963A (en) * 2013-07-22 2020-05-19 弗劳恩霍夫应用研究促进协会 Audio signal decoding and encoding apparatus and method with adaptive spectral tile selection
CN111710342A (en) * 2014-03-31 2020-09-25 弗朗霍弗应用研究促进协会 Encoding device, decoding device, encoding method, decoding method, and program
WO2021052293A1 (en) * 2019-09-18 2021-03-25 华为技术有限公司 Audio coding method and apparatus
US11875803B2 (en) 2014-06-27 2024-01-16 Dolby Laboratories Licensing Corporation Methods and apparatus for determining for decoding a compressed HOA sound representation
CN117476024A (en) * 2023-11-29 2024-01-30 腾讯科技(深圳)有限公司 Audio encoding method, audio decoding method, apparatus, and readable storage medium
WO2024021729A1 (en) * 2022-07-27 2024-02-01 华为技术有限公司 Quantization method and dequantization method, and apparatuses therefor

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007107046A1 (en) * 2006-03-23 2007-09-27 Beijing Ori-Reu Technology Co., Ltd A coding/decoding method of rapidly-changing audio-frequency signals
US10621996B2 (en) 2008-07-11 2020-04-14 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Low bitrate audio encoding/decoding scheme having cascaded switches
US11475902B2 (en) 2008-07-11 2022-10-18 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Low bitrate audio encoding/decoding scheme having cascaded switches
US11676611B2 (en) 2008-07-11 2023-06-13 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio decoding device and method with decoding branches for decoding audio signal encoded in a plurality of domains
US11682404B2 (en) 2008-07-11 2023-06-20 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio decoding device and method with decoding branches for decoding audio signal encoded in a plurality of domains
US10319384B2 (en) 2008-07-11 2019-06-11 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Low bitrate audio encoding/decoding scheme having cascaded switches
US11823690B2 (en) 2008-07-11 2023-11-21 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Low bitrate audio encoding/decoding scheme having cascaded switches
CN102194457A (en) * 2010-03-02 2011-09-21 中兴通讯股份有限公司 Audio encoding and decoding method, system and noise level estimation method
CN102194457B (en) * 2010-03-02 2013-02-27 中兴通讯股份有限公司 Audio encoding and decoding method, system and noise level estimation method
CN108269584A (en) * 2013-04-05 2018-07-10 Dolby Laboratories Licensing Corporation Companding apparatus and method for reducing quantization noise using advanced spectral extension
CN109935236A (en) * 2013-04-05 2019-06-25 Dolby International AB Audio encoder and decoder
CN109935236B (en) * 2013-04-05 2023-05-30 杜比国际公司 Audio encoder and decoder
US11676622B2 (en) 2013-04-05 2023-06-13 Dolby International Ab Method, apparatus and systems for audio decoding and encoding
US11423923B2 (en) 2013-04-05 2022-08-23 Dolby Laboratories Licensing Corporation Companding system and method to reduce quantization noise using advanced spectral extension
CN111179963A (en) * 2013-07-22 2020-05-19 Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung e.V. Audio signal decoding and encoding apparatus and method with adaptive spectral tile selection
US11922956B2 (en) 2013-07-22 2024-03-05 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for encoding or decoding an audio signal with intelligent gap filling in the spectral domain
US11769512B2 (en) 2013-07-22 2023-09-26 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for decoding and encoding an audio signal using adaptive spectral tile selection
US11769513B2 (en) 2013-07-22 2023-09-26 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for decoding or encoding an audio signal using energy information values for a reconstruction band
US11996106B2 (en) 2013-07-22 2024-05-28 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E. V. Apparatus and method for encoding and decoding an encoded audio signal using temporal noise/patch shaping
CN111710342A (en) * 2014-03-31 2020-09-25 Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung e.V. Encoding device, decoding device, encoding method, decoding method, and program
CN111710342B (en) * 2014-03-31 2024-04-16 弗朗霍弗应用研究促进协会 Encoding device, decoding device, encoding method, decoding method, and program
US11721349B2 (en) 2014-04-17 2023-08-08 Voiceage Evs Llc Methods, encoder and decoder for linear predictive encoding and decoding of sound signals upon transition between frames having different sampling rates
CN106165013B (en) * 2014-04-17 2021-05-04 声代Evs有限公司 Method, apparatus and memory for use in a sound signal encoder and decoder
CN106165013A (en) * 2014-04-17 2016-11-23 VoiceAge Corporation Methods, encoder and decoder for linear predictive encoding and decoding of sound signals upon transition between frames having different sampling rates
US11282530B2 (en) 2014-04-17 2022-03-22 Voiceage Evs Llc Methods, encoder and decoder for linear predictive encoding and decoding of sound signals upon transition between frames having different sampling rates
CN110459229A (en) * 2014-06-27 2019-11-15 Dolby International AB Method for decoding a Higher Order Ambisonics (HOA) representation of a sound or sound field
CN110459229B (en) * 2014-06-27 2023-01-10 杜比国际公司 Method for decoding a Higher Order Ambisonics (HOA) representation of a sound or sound field
US11875803B2 (en) 2014-06-27 2024-01-16 Dolby Laboratories Licensing Corporation Methods and apparatus for determining for decoding a compressed HOA sound representation
US11341978B2 (en) 2015-06-16 2022-05-24 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Downscaled decoding
US11062719B2 (en) 2015-06-16 2021-07-13 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Downscaled decoding
US11341979B2 (en) 2015-06-16 2022-05-24 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Downscaled decoding
US11341980B2 (en) 2015-06-16 2022-05-24 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Downscaled decoding
CN108028046A (en) * 2015-06-16 2018-05-11 Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung e.V. Downscaled decoding
US11670312B2 (en) 2015-06-16 2023-06-06 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Downscaled decoding
WO2021052293A1 (en) * 2019-09-18 2021-03-25 华为技术有限公司 Audio coding method and apparatus
WO2024021729A1 (en) * 2022-07-27 2024-02-01 华为技术有限公司 Quantization method and dequantization method, and apparatuses therefor
CN117476024A (en) * 2023-11-29 2024-01-30 腾讯科技(深圳)有限公司 Audio encoding method, audio decoding method, apparatus, and readable storage medium

Similar Documents

Publication Publication Date Title
CN1677493A (en) Intensified audio-frequency coding-decoding device and method
CN1677490A (en) Intensified audio-frequency coding-decoding device and method
CN1096148C (en) Signal encoding method and apparatus
JP6407928B2 (en) Audio processing system
RU2608878C1 (en) Level adjustment in time domain for decoding or encoding audio signals
CN1308916C (en) Source coding enhancement using spectral-band replication
JP5096468B2 (en) Free shaping of temporal noise envelope without side information
CN1233163C (en) Apparatus and method for compression encoding and decoding of multi-channel digital audio signals
CN1677491A (en) Intensified audio-frequency coding-decoding device and method
CN1689069A (en) Sound encoding apparatus and sound encoding method
AU2012297805A1 (en) Encoding device and method, decoding device and method, and program
CN1910655A (en) Apparatus and method for constructing a multi-channel output signal or for generating a downmix signal
CN1702974A (en) Method and apparatus for encoding/decoding a digital signal
CN1816847A (en) Fidelity-optimised variable frame length encoding
CN1781141A (en) Improved audio coding systems and methods using spectral component coupling and spectral component regeneration
JP4685165B2 (en) Interchannel level difference quantization and inverse quantization method based on virtual sound source position information
CN1677492A (en) Intensified audio-frequency coding-decoding device and method
EP1905034A1 (en) Virtual source location information based channel level difference quantization and dequantization method
KR101216098B1 (en) A method and an apparatus for processing a signal
WO2012004998A1 (en) Device and method for efficiently encoding quantization parameters of spectral coefficient coding
KR102204136B1 (en) Apparatus and method for encoding audio signal, apparatus and method for decoding audio signal
JP2004246038A (en) Speech or musical sound signal encoding method, decoding method, encoding device, decoding device, encoding program, and decoding program
JP4721355B2 (en) Coding rule conversion method and apparatus for coded data
CN105336334B (en) Multi-channel sound signal coding method, decoding method and device
CN1783726A (en) Decoder for decoding and reconstructing a multi-channel audio signal from an audio data stream

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20051005