EP1873753A1 - Enhanced audio encoding/decoding device and method - Google Patents
- Publication number
- EP1873753A1 (application EP05742018A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- frequency
- module
- coefficients
- signal
- domain
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/032—Quantisation or dequantisation of spectral components
Definitions
- The invention relates to audio encoding and decoding, and in particular to an enhanced audio encoding/decoding device and method based on a perceptual model.
- Digital audio signals need to be encoded, i.e. compressed, for storage and transmission.
- The goal of audio encoding is a transparent representation using as few bits as possible; that is, the decoded output audio signal is nearly indistinguishable from the original input.
- The advent of the CD demonstrated the many advantages of representing audio signals digitally, such as high fidelity, large dynamic range and great robustness.
- However, all these advantages come at the cost of a very high data rate.
- Digitizing a CD-quality stereo signal requires a sampling rate of 44.1 kHz, with each sample uniformly quantized to 16 bits, so the uncompressed data rate reaches 1.41 Mb/s. This is very inconvenient for data transmission and storage, which are limited by bandwidth and cost, especially in multimedia and wireless transmission applications.
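The 1.41 Mb/s figure follows directly from the CD parameters. The short check below assumes the standard CD format (44.1 kHz sampling, 16-bit samples, two channels); the function name is illustrative only.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Return the uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# CD-quality stereo: 44.1 kHz sampling, 16-bit samples, 2 channels.
cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)                 # 1411200 bits/s
print(f"{cd_rate / 1e6:.2f}")  # 1.41 (Mb/s)
```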
- The data rate of digital audio in new network and wireless multimedia systems must therefore be reduced without degrading audio quality.
- MPEG-1 and MPEG-2 BC are high-quality encoding techniques used mainly for mono and stereo audio signals.
- Because MPEG-2 BC emphasizes backward compatibility with MPEG-1, it cannot achieve high-quality encoding of five channels at bit rates below 540 kbps.
- MPEG-2 AAC was therefore proposed; it can encode five-channel signals with high quality at 320 kbps.
- Fig. 1 is a block diagram of the MPEG-2 AAC encoder.
- Said encoder comprises a gain controller 101, a filter bank 102, a time-domain noise shaping module 103, an intensity/coupling module 104, a psychoacoustical model, a second-order backward adaptive predictor 105, a sum-difference stereo module 106, a bit allocation and quantization encoding module 107, and a bit stream multiplexing module 108. The bit allocation and quantization encoding module 107 further comprises a compression ratio/distortion processing controller, a scale factor module, a non-uniform quantizer, and an entropy encoding module.
- The filter bank 102 uses a modified discrete cosine transform (MDCT) with signal-adaptive resolution: a 2048-point MDCT is used for steady-state signals and a 256-point MDCT for transient signals. For a signal sampled at 48 kHz, this gives a finest frequency resolution of about 23 Hz and a finest time resolution of about 2.6 ms.
- Both a sine window and a Kaiser-Bessel-derived (KBD) window can be used in the filter bank 102: the sine window is used when the harmonic spacing of the input signal is less than 140 Hz, and the KBD window when strong components of the input signal are spaced more than 220 Hz apart.
- Time-domain noise shaping (module 103) performs linear prediction analysis on the spectral coefficients across frequency and uses the result to shape the quantization noise in time, thereby controlling pre-echo.
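The core of time-domain noise shaping is linear prediction applied along the frequency axis rather than along time. The sketch below only illustrates that idea under stated assumptions: a single order-2 predictor fitted over the whole spectrum with the autocorrelation method and Levinson-Durbin recursion, with no coefficient quantization, filter ranges, or direction flags as a real AAC encoder would use. All function names are hypothetical.

```python
import math


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Prediction-error filter A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # remaining prediction error
    return a


def tns_analysis(spectrum, order=2):
    """Encoder side: filter the spectrum along frequency with A(z)."""
    r = autocorrelation(spectrum, order)
    if r[0] == 0.0:
        return list(spectrum), [1.0] + [0.0] * order
    a = levinson_durbin(r, order)
    residual = [sum(a[j] * spectrum[k - j] for j in range(order + 1) if k >= j)
                for k in range(len(spectrum))]
    return residual, a


def tns_synthesis(residual, a):
    """Decoder side: undo the filtering with the all-pole filter 1/A(z)."""
    out = []
    for k, e in enumerate(residual):
        out.append(e - sum(a[j] * out[k - j] for j in range(1, len(a)) if k >= j))
    return out


# A spectrum that oscillates smoothly across frequency (as a sharp attack would produce).
spectrum = [0.95 ** k * math.cos(0.5 * k) for k in range(64)]
residual, coeffs = tns_analysis(spectrum)
gain = sum(x * x for x in spectrum) / sum(e * e for e in residual)
print(f"prediction gain: {gain:.1f}")
```

Quantization noise added to the residual is shaped by 1/A(z) at the decoder, which is what concentrates the noise in time where the signal energy is.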
- The intensity/coupling module 104 performs intensity stereo encoding.
- The perceived direction of a sound depends on changes in signal intensity (the signal envelope) rather than on the waveform itself, so a signal with a constant envelope carries no directional cue. This property, together with the correlation among channels, allows several channels to be combined into one common channel for encoding, which is the basis of the intensity/coupling technique.
- The second-order backward adaptive predictor 105 removes redundancy from steady-state signals and improves encoding efficiency.
- The sum-difference stereo (M/S) module 106 operates on channel pairs.
- A channel pair is the left/right pair or the left/right surround pair in, for example, two-channel or multichannel signals.
- The M/S module 106 exploits the correlation between the two channels of a pair to reduce the bit rate and improve encoding efficiency.
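A minimal sketch of the sum/difference transform on one channel pair. It assumes the common convention M = (L + R)/2, S = (L - R)/2, with the decoder computing L = M + S, R = M - S; the function names are illustrative. When the two channels are highly correlated, the side signal is small and therefore cheap to encode.

```python
def ms_encode(left, right):
    """Convert a left/right channel pair into mid (sum) and side (difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right from mid/side: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [1.0, 2.0, 3.0]
right = [1.0, 2.0, 2.5]
mid, side = ms_encode(left, right)
print(mid, side)  # [1.0, 2.0, 2.75] [0.0, 0.0, 0.25]
```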
- The bit allocation and quantization encoding module 107 is implemented as a nested loop in which the non-uniform quantizer performs lossy encoding and the entropy encoding module performs lossless encoding, removing redundancy and reducing correlation.
- The nested loop consists of an inner loop and an outer loop: the inner loop adjusts the step size of the non-uniform quantizer until the quantized spectrum fits within the available bits, and the outer loop estimates the encoding quality of the signal from the ratio between the quantization noise and the masking threshold.
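The two loops can be sketched as follows. This is a simplified illustration, not the AAC reference procedure: it assumes a power-law quantizer similar in form to AAC's (q = round((|x|/step)^0.75), step = 2^(exponent/4)) with plain rounding, a hypothetical bit-cost function in place of real Huffman tables, and an outer-loop rule that simply refines every band whose noise exceeds its masking threshold.

```python
import math


def quantize(coeffs, step_exp):
    """Power-law quantizer: q = round((|x| / step) ** 0.75), step = 2 ** (step_exp / 4)."""
    step = 2.0 ** (step_exp / 4.0)
    return [int(math.copysign(round((abs(c) / step) ** 0.75), c)) for c in coeffs]


def dequantize(q, step_exp):
    """Companding inverse: x = sign(q) * |q| ** (4/3) * step."""
    step = 2.0 ** (step_exp / 4.0)
    return [math.copysign(abs(v) ** (4.0 / 3.0) * step, v) for v in q]


def estimate_bits(q):
    """Hypothetical bit-cost model standing in for real Huffman codebook lookups."""
    return sum(2 * abs(v).bit_length() + 1 for v in q)


def inner_loop(bands, sf, bit_budget, g_min=-40, g_max=200):
    """Coarsen the global step until the quantized spectrum fits the bit budget."""
    for g in range(g_min, g_max + 1):
        qs = [quantize(b, g - s) for b, s in zip(bands, sf)]
        if sum(estimate_bits(q) for q in qs) <= bit_budget:
            return g, qs
    return g, qs  # budget unreachable: return the coarsest attempt


def outer_loop(bands, thresholds, bit_budget, max_iter=30):
    """Refine bands whose quantization noise exceeds their masking threshold."""
    sf = [0] * len(bands)  # per-band amplification: larger means a finer step
    for _ in range(max_iter):
        g, qs = inner_loop(bands, sf, bit_budget)
        noise = [sum((c - d) ** 2 for c, d in zip(b, dequantize(q, g - s)))
                 for b, q, s in zip(bands, qs, sf)]
        too_noisy = [i for i, (n, t) in enumerate(zip(noise, thresholds)) if n > t]
        if not too_noisy:
            break
        for i in too_noisy:
            sf[i] += 1
    return g, sf, qs, noise


bands = [[10.0, -8.0, 5.0, 3.0], [1.0, 0.5, -0.25, 0.1]]
g, sf, qs, noise = outer_loop(bands, thresholds=[1.0, 0.01], bit_budget=60)
print(g, sf, qs)
```

The inner loop always honors the bit budget; the outer loop then trades bits between bands by shifting their relative step sizes, stopping when the noise is masked everywhere or after a fixed number of iterations.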
- Finally, the bit stream multiplexing module 108 assembles the encoded signals into the output encoded audio stream.
- Fig. 2 is a schematic block diagram of the corresponding MPEG-2 AAC decoder.
- Said decoder comprises a bit stream demultiplexing module 201, a lossless decoding module 202, an inverse quantizer 203, a scale factor module 204, a sum-difference stereo (M/S) module 205, a prediction module 206, an intensity/coupling module 207, a time-domain noise shaping module 208, a filter bank 209 and a gain control module 210.
- the encoded audio stream is demultiplexed by the bit stream demultiplexing module 201 to obtain the corresponding data stream and control stream.
- The inverse quantizer 203 is a bank of non-uniform quantizers implemented with a companding function; it transforms the integer quantized values into a reconstructed spectrum.
- The scale factor module in the encoder computes the difference between each scale factor and the previous one and Huffman-encodes these differences; the scale factor module 204 in the decoder therefore Huffman-decodes the differences and recovers the actual scale factors from them.
- The M/S module 205 converts the sum/difference channels back into left/right channels as directed by the side information.
- The prediction module 206 performs prediction decoding.
- The intensity/coupling module 207 performs intensity/coupling decoding as directed by the side information and passes the result to the time-domain noise shaping module 208 for time-domain noise shaping decoding; finally, the filter bank 209 performs synthesis filtering using the inverse modified discrete cosine transform (IMDCT).
- The gain control module 210 can discard the high-frequency PQF (polyphase quadrature filter) bands to obtain a signal at a lower sampling rate.
- The MPEG-2 AAC technique is suitable for audio signals at medium and high bit rates, but its encoding quality is poor at low and very low bit rates. It also involves many encoding/decoding modules, so it is complex to implement and difficult to run in real time.
- Fig. 3 is a schematic drawing of the structure of an encoder using the Dolby AC-3 technique, which comprises a transient signal detection module 301, an MDCT filter bank 302, a spectral envelope/exponent encoding module 303, a mantissa encoding module 304, a forward-backward adaptive perceptual model 305, a parametric bit allocation module 306, and a bit stream multiplexing module 307.
- The transient signal detection module 301 determines whether the audio signal is a steady-state or a transient signal. Meanwhile, the signal-adaptive MDCT filter bank 302 maps the time-domain data to frequency-domain data, applying a 512-point long window to steady-state signals and a pair of short windows to transient signals.
- The spectral envelope/exponent encoding module 303 encodes the exponent portion of the signal in one of three modes, D15, D25 or D45, according to the bit rate and frequency resolution requirements.
- AC-3 encodes the spectral envelope differentially across frequency, because at most an increment of ±2 is needed between adjacent exponents, each increment representing a level change of 6 dB.
- The first (DC) exponent is encoded as an absolute value, and the remaining exponents are encoded differentially.
- In D15 mode, each exponent requires about 2.33 bits, because three differentials are grouped and encoded in one 7-bit word.
- D15 mode provides fine frequency resolution at the cost of time resolution.
- D15 is transmitted only occasionally; typically the spectral envelope is transmitted once every 6 audio blocks (one data frame).
- Otherwise, the spectral envelope is generally encoded at lower frequency resolution using the D25 or D45 mode.
- D25 mode offers moderate frequency and time resolution: one differential is encoded for every two frequency coefficients, so each exponent needs about 1.15 bits. D25 can be used when the spectrum is steady over two to three blocks and then changes abruptly.
- D45 mode encodes one differential for every four frequency coefficients, so each exponent needs about 0.58 bits.
- D45 provides very high time resolution but low frequency resolution, so it is generally used for transient signals.
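The per-exponent bit costs come from packing three differentials into one 7-bit word. The sketch below assumes the base-5 packing used for AC-3 exponent groups (each differential in [-2, 2] is offset to 0..4, then combined as 25·m1 + 5·m2 + m3, giving at most 124), and assumes that D25 and D45 share each differential across two and four coefficients respectively. Function names are illustrative.

```python
def group_exponent_deltas(d1, d2, d3):
    """Pack three exponent differentials, each in [-2, 2], into one 7-bit word."""
    for d in (d1, d2, d3):
        if not -2 <= d <= 2:
            raise ValueError("exponent differentials must lie in [-2, 2]")
    m1, m2, m3 = d1 + 2, d2 + 2, d3 + 2   # map each differential to 0..4
    return 25 * m1 + 5 * m2 + m3          # 0..124, which fits in 7 bits


def ungroup_exponent_deltas(word):
    """Inverse of group_exponent_deltas."""
    return word // 25 - 2, (word // 5) % 5 - 2, word % 5 - 2


# 7 bits carry 3 differentials; D25/D45 share each one across 2/4 coefficients.
BITS_PER_EXPONENT = {"D15": 7 / 3, "D25": 7 / 3 / 2, "D45": 7 / 3 / 4}

word = group_exponent_deltas(-1, 0, 2)
print(word, ungroup_exponent_deltas(word))                # 39 (-1, 0, 2)
print({k: round(v, 2) for k, v in BITS_PER_EXPONENT.items()})
# {'D15': 2.33, 'D25': 1.17, 'D45': 0.58}
```

Since 5³ = 125 ≤ 2⁷ = 128, three five-valued differentials fit exactly into 7 bits, which is where the 7/3 ≈ 2.33 bits per exponent comes from.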
- The forward-backward adaptive perceptual model 305 estimates the masking threshold of each frame. The forward-adaptive part is used only in the encoder: under the bit rate constraint, it estimates a set of optimal perceptual model parameters in an iterative loop and passes them to the backward-adaptive part, which estimates the masking threshold of each frame.
- The backward-adaptive part is used in both the encoder and the decoder.
- The parametric bit allocation module 306 analyzes the spectral envelope of the audio signal according to the masking rule to determine the number of bits allocated to each mantissa. Module 306 performs a global bit allocation across all channels using a bit reservoir.
- Bits are repeatedly drawn from the bit pool and allocated to all channels.
- Mantissa quantization is adjusted according to the number of bits available.
- The AC-3 encoder also uses high-frequency coupling: the high-frequency portion of the coupled signal is divided into 18 sub-bands according to the critical bandwidths of the human ear, and selected channels are coupled starting from a given sub-band. Finally, the bit stream multiplexing module 307 forms the output AC-3 audio stream.
- Fig. 4 is a schematic drawing of the Dolby AC-3 decoding flow.
- The bit stream produced by the AC-3 encoder is input, and frame synchronization and error detection are performed on it. If a data error is detected, error concealment or muting is applied. The bit stream is then unpacked into the main information and the side information, and exponent decoding is performed.
- Exponent decoding requires two pieces of side information: the number of packed exponents and the exponent strategy used (D15, D25 or D45).
- The decoded exponents and the bit allocation side information are used to rerun the bit allocation, which yields a set of bit allocation pointers, one for each encoded mantissa, indicating how many bits each packed mantissa uses.
- The bit allocation pointers identify the quantizer used for each mantissa and the number of bits each mantissa occupies in the bit stream.
- Each encoded mantissa value is dequantized; a mantissa allocated zero bits is restored either as zero or as a random dither value, depending on the dither flag.
- Decoupling then recovers the high-frequency portion of each coupled channel, including exponents and mantissas, from the common coupling channel and the coupling factors.
- If matrixing was applied to a sub-band in the encoder, the decoder converts the sum and difference channel values of that sub-band back into left/right channel values.
- The bit stream includes a dynamic range control value for each audio block, which is applied to scale the coefficients (exponents and mantissas) for dynamic range compression.
- The frequency-domain coefficients are inverse-transformed into time-domain samples, which are then windowed, and adjacent blocks are overlap-added to reconstruct the PCM audio signal.
- Finally, a downmixing process is applied to the audio signal before the PCM stream is output.
- Dolby AC-3 is aimed mainly at high-bit-rate multichannel surround sound; when the bit rate for 5.1 channels falls below 384 kbps, its encoding quality is poor, and its encoding efficiency for mono and two-channel stereo is also low.
- In summary, existing encoding and decoding techniques cannot guarantee encoding quality across very low, low and high bit rates, or for mono and two-channel signals, and their implementation is complex.
- The technical problem solved by this invention is to provide an enhanced audio encoding/decoding device and method that overcomes the low encoding efficiency and poor encoding quality of the prior art for low-bit-rate audio signals.
- the enhanced audio encoding device of the invention comprises a psychoacoustical analyzing module, a time-frequency mapping module, a quantization and entropy encoding module, a bit-stream multiplexing module, a signal characteristic analyzing module and a multi-resolution analyzing module.
- the signal characteristic analyzing module is configured to analyze the signal type of the input audio signal, output the audio signal to the psychoacoustical analyzing module and the time-frequency mapping module, and output the signal type analysis result to the bit-stream multiplexing module;
- the psychoacoustical analyzing module is configured to calculate a masking threshold and a signal-to-masking ratio of the audio signal, and output them to said quantization and entropy encoding module;
- the time-frequency mapping module is configured to convert the time-domain audio signal into frequency-domain coefficients and output them to the multi-resolution analyzing module;
- the multi-resolution analyzing module is configured to perform a multi-resolution analysis on the frequency-domain coefficients of signals of a fast varying type based on the signal type analysis result output from the signal characteristic analyzing module, and to output them to the quantization and entropy encoding module;
- the quantization and entropy encoding module is configured to perform quantization and entropy encoding on the frequency
- the enhanced audio decoding device of the invention comprises a bit-stream demultiplexing module, an entropy decoding module, an inverse quantizer bank, a frequency-time mapping module, and a multi-resolution integration module.
- the bit-stream demultiplexing module is configured to demultiplex the compressed audio data stream and output the corresponding data signals and control signals to the entropy decoding module and the multi-resolution integration module;
- the entropy decoding module is configured to decode said signals and recover the quantized spectral values, which it outputs to the inverse quantizer bank;
- the inverse quantizer bank is configured to reconstruct the inverse-quantized spectrum and output it to the multi-resolution integration module;
- the multi-resolution integration module is configured to perform multi-resolution integration on the inverse-quantized spectrum and output it to the frequency-time mapping module; and the frequency-time mapping module is configured to perform frequency-time mapping on the spectral coefficients to output the time-domain audio signal.
- The invention is applicable to high-fidelity compression encoding of audio signals in configurations with multiple sampling rates and channels, supporting sampling rates from 8 kHz to 192 kHz. It also supports all possible channel configurations and audio encoding/decoding over a wide range of target bit rates.
- Figs. 1-4 are schematic drawings of prior-art encoder and decoder structures, which were introduced in the background art and are not elaborated here.
- the audio encoding device of the present invention comprises a signal characteristic analyzing module 50, a psychoacoustical analyzing module 51, a time-frequency mapping module 52, a multi-resolution analyzing module 53, a quantization and entropy encoding module 54, and a bit-stream multiplexing module 55.
- the signal characteristic analyzing module 50 is configured to analyze the signal type of the input audio signal and output the audio signal to the psychoacoustical analyzing module 51 and time-frequency mapping module 52, and to output the result of signal type analysis to the bit-stream multiplexing module 55;
- the psychoacoustical analyzing module 51 is configured to calculate a masking threshold and a signal-to-masking ratio of the input audio signal, and output them to the quantization and entropy encoding module 54;
- the time-frequency mapping module 52 is configured to convert the time-domain audio signal into frequency-domain coefficients and output them to the multi-resolution analyzing module 53;
- the multi-resolution analyzing module 53 is configured to perform a multi-resolution analysis on the frequency-domain coefficients of fast-varying signals based on the signal type analysis result output from the signal characteristic analyzing module 50, and to output them to the quantization and entropy encoding module 54;
- the quantization and entropy encoding module 54 is
- The signal characteristic analyzing module 50 analyzes the signal type of the digital audio signal and outputs the type information to the bit stream multiplexing module 55; meanwhile, the audio signal is output to the psychoacoustical analyzing module 51 and the time-frequency mapping module 52.
- The psychoacoustical analyzing module 51 calculates the masking threshold and signal-to-masking ratio of the current frame and passes the signal-to-masking ratio as a control signal to the quantization and entropy encoding module 54. In parallel, the time-frequency mapping module 52 converts the time-domain audio signal into frequency-domain coefficients, and the multi-resolution analyzing module 53 performs a multi-resolution analysis on the frequency-domain coefficients of fast-varying signals to increase their time resolution, outputting the result to the quantization and entropy encoding module 54. Under the control of the signal-to-masking ratio from the psychoacoustical analyzing module 51, the quantization and entropy encoding module 54 performs quantization and entropy encoding, and the bit-stream multiplexing module 55 then multiplexes the encoded data and control signals into an enhanced audio encoding bit stream.
- the signal characteristic analyzing module 50 is configured to analyze the signal type of the input audio signal and output the type information of the audio signal to the bit-stream multiplexing module 55, and to output the audio signal to the psychoacoustical analyzing module 51 and time-frequency mapping module 52 at the same time.
- The signal characteristic analyzing module 50 determines whether the signal is slowly varying or fast-varying by analyzing forward and backward masking effects, based on an adaptive threshold and waveform prediction. If the signal is fast-varying, the module then calculates the parameters of the abrupt component, such as the position where the abrupt change occurs and its intensity.
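The text does not specify the masking analysis or the waveform predictor, so the sketch below swaps in a deliberately simple energy-ratio detector to show the kind of output the module produces: a fast/slow decision plus the position and intensity of the abrupt component. The adaptive threshold here is the mean energy of the preceding sub-blocks; the function name, sub-block count and ratio threshold are all hypothetical.

```python
def detect_transient(frame, n_sub=8, ratio_threshold=10.0, eps=1e-12):
    """Flag a frame as fast-varying if a sub-block's energy jumps above an
    adaptive threshold (the mean energy of the preceding sub-blocks)."""
    sub_len = len(frame) // n_sub
    energies = [sum(x * x for x in frame[i * sub_len:(i + 1) * sub_len])
                for i in range(n_sub)]
    for i in range(1, n_sub):
        reference = sum(energies[:i]) / i      # adaptive reference level
        ratio = energies[i] / (reference + eps)
        if ratio > ratio_threshold:
            return {"fast_varying": True, "position": i * sub_len, "intensity": ratio}
    return {"fast_varying": False, "position": None, "intensity": None}


frame = [0.01] * 48 + [1.0] * 16   # quiet, then a sudden onset at sample 48
print(detect_transient(frame)["position"])  # 48
```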
- The psychoacoustical analyzing module 51 mainly calculates the masking threshold, signal-to-masking ratio and perceptual entropy of the input audio signal.
- Based on the perceptual entropy calculated by the psychoacoustical analyzing module 51, the number of bits needed for transparent encoding of the current frame can be analyzed dynamically, allowing the bit allocation among frames to be adjusted.
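Perceptual entropy estimates how many bits a frame needs for transparent coding. The sketch below uses a simplified, rate-distortion style approximation, not Johnston's original formula and not necessarily the computation used in the invention: each band of `width` coefficients whose energy E exceeds its masking threshold T is charged 0.5 · width · log2(E/T) bits, and masked bands cost nothing. Names are illustrative.

```python
import math


def perceptual_entropy(band_energies, band_thresholds, band_widths):
    """Approximate bits needed for transparent coding of one frame."""
    pe = 0.0
    for energy, threshold, width in zip(band_energies, band_thresholds, band_widths):
        if threshold > 0 and energy > threshold:
            pe += 0.5 * width * math.log2(energy / threshold)
    return pe


# Two bands: the first is about 12 dB above its mask, the second is fully masked.
print(perceptual_entropy([16.0, 1.0], [1.0, 4.0], [4, 8]))  # 8.0
```

An encoder could, for example, scale each frame's bit target in proportion to its perceptual entropy relative to a running average; that policy is an assumption here, not something the text states.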
- the psychoacoustical analyzing module 51 outputs the signal-to-masking ratio of each sub-band to the quantization and entropy encoding module 54 to control it.
- The time-frequency mapping module 52 converts the audio signal from the time domain into frequency-domain coefficients. It consists of a filter bank, which may be, for example, a discrete Fourier transform (DFT), discrete cosine transform (DCT), modified discrete cosine transform (MDCT), cosine-modulated, or wavelet transform filter bank.
- the encoding device of the present invention increases the time resolution for the encoded fast varying signals by means of the multi-resolution analyzing module 53.
- the frequency-domain coefficients output from the time-frequency mapping module 52 are input to the multi-resolution analyzing module 53.
- If the signal is fast-varying, a frequency-domain wavelet transform or frequency-domain modified discrete cosine transform (MDCT) is applied to obtain a multi-resolution representation of the frequency-domain coefficients, which is output to the quantization and entropy encoding module 54; if the signal is slowly varying, the frequency-domain coefficients are output directly to the quantization and entropy encoding module 54 without further processing.
- the multi-resolution analyzing module 53 comprises a frequency-domain coefficient transformation module and a reorganization module, wherein the frequency-domain coefficient transformation module is used for transforming the frequency-domain coefficients into time-frequency plane coefficients; and the reorganization module is used for reorganizing the time-frequency plane coefficients according to a certain rule.
- the frequency-domain coefficients transformation module can use the filter bank of frequency-domain wavelet transformation, the filter bank of frequency-domain MDCT transformation, etc.
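Applying a second transform across groups of adjacent frequency coefficients trades frequency resolution for time resolution within the frame. The sketch below is illustrative only: it substitutes an orthonormal DCT-II for the frequency-domain wavelet or MDCT filter bank named in the text, and the "reorganization rule" (interleaving so that slot 0 of every group comes first, then slot 1, and so on) is a hypothetical choice, since the text does not state the actual rule.

```python
import math


def _scale(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct2(x):
    """Orthonormal DCT-II."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def idct2(X):
    """Inverse of the orthonormal DCT-II (an orthonormal DCT-III)."""
    n = len(X)
    return [sum(_scale(k, n) * X[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]


def multires_analysis(spectrum, group_len):
    """Transform each group of adjacent frequency coefficients, then reorganize."""
    if len(spectrum) % group_len:
        raise ValueError("spectrum length must be a multiple of group_len")
    groups = [dct2(spectrum[i:i + group_len]) for i in range(0, len(spectrum), group_len)]
    # Hypothetical reorganization: slot 0 of every group, then slot 1, and so on.
    return [g[t] for t in range(group_len) for g in groups]


def multires_synthesis(tf_coeffs, group_len):
    """Undo the reorganization and the per-group transform (decoder side)."""
    n_groups = len(tf_coeffs) // group_len
    groups = [[tf_coeffs[t * n_groups + g] for t in range(group_len)] for g in range(n_groups)]
    return [v for grp in groups for v in idct2(grp)]


spectrum = [math.cos(0.7 * k) * (1 + 0.1 * k) for k in range(16)]
tf = multires_analysis(spectrum, group_len=4)
print([round(v, 3) for v in tf])
```

Because the per-group transform is orthonormal, the time-frequency plane coefficients keep the spectrum's energy, and the decoder's multi-resolution integration is an exact inverse.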
- the quantization and entropy encoding module 54 further comprises a non-linear quantizer bank and an encoder, wherein the quantizer can be either a scalar quantizer or a vector quantizer.
- the vector quantizer can be further divided into the two categories of memoryless vector quantizer and memory vector quantizer.
- In a memoryless vector quantizer, each input vector is quantized independently of previous vectors, whereas a memory vector quantizer takes previous vectors into account, i.e. it exploits the correlation among vectors.
- The main memoryless vector quantizers include the full-search, tree-search, multi-stage, gain/shape and mean-separated vector quantizers; the main memory vector quantizers include the predictive and finite-state vector quantizers.
- the non-linear quantizer bank further comprises M sub-band quantizers.
- Quantization is performed mainly using scale factors: all frequency-domain coefficients in each of the M scale-factor sub-bands are non-linearly compressed and then quantized using that sub-band's scale factor, yielding an integer-valued quantized spectrum that is output to the encoder. The first scale factor of each frame is output directly to the bit-stream multiplexing module 55 as the common scale factor, and each remaining scale factor is differentially encoded with respect to its preceding scale factor before being output to the encoder.
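The common-scale-factor plus differential scheme can be sketched as follows. Differences between neighbouring scale factors are typically small and cluster around zero, which makes them well suited to the subsequent entropy encoding (omitted here). Function names and the example values are illustrative.

```python
def encode_scale_factors(scale_factors):
    """Split a frame's scale factors into the common (first) value and differences."""
    common = scale_factors[0]
    diffs = [cur - prev for prev, cur in zip(scale_factors, scale_factors[1:])]
    return common, diffs


def decode_scale_factors(common, diffs):
    """Rebuild the scale factors by accumulating the differences."""
    out = [common]
    for d in diffs:
        out.append(out[-1] + d)
    return out


common, diffs = encode_scale_factors([60, 62, 61, 61, 58])
print(common, diffs)  # 60 [2, -1, 0, -3]
```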
- The scale factors are not fixed; they are adjusted according to the bit allocation strategy.
- The present invention provides a global perceptual bit allocation strategy that minimizes distortion, as follows:
- First, each sub-band quantizer is initialized with a scale factor chosen so that the quantized values of the spectral coefficients in all sub-bands are zero.
- The quantization noise of each sub-band then equals its energy, and the noise-to-masking ratio (NMR) of each sub-band equals its signal-to-masking ratio (SMR).
- The number of bits consumed by quantization is zero, and the number of remaining bits B_1 equals the number of target bits B.
- Next, the sub-band with the largest NMR is found. If that NMR is not greater than 1, the scale factors remain unchanged, the allocation result is output, and bit allocation ends; otherwise, the scale factor of that sub-band's quantizer is reduced by one unit and the number of additional bits ΔB_i(Q_i) needed for that sub-band is calculated.
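The greedy allocation can be sketched as below. The text stops after computing ΔB_i(Q_i), so the continuation is an assumption: accept the step if ΔB_i fits in the remaining bits, subtract it, update that band's NMR and repeat; otherwise stop. The uniform quantizer with step 2^(sf/4) and the bit-cost model (zero bits for an all-zero band, matching the initial state) are hypothetical stand-ins.

```python
import math


def quantize_band(band, sf):
    """Uniform quantizer with step 2 ** (sf / 4) (an assumed step law)."""
    step = 2.0 ** (sf / 4.0)
    return [round(x / step) for x in band], step


def band_noise(band, sf):
    q, step = quantize_band(band, sf)
    return sum((x - v * step) ** 2 for x, v in zip(band, q))


def band_bits(band, sf):
    q, _ = quantize_band(band, sf)
    return sum(2 * abs(v).bit_length() for v in q)  # hypothetical cost model


def allocate_bits(bands, thresholds, target_bits, max_iter=10_000):
    """Greedy minimum-NMR bit allocation; thresholds must be positive."""
    # Initialize each band with a scale factor that quantizes everything to zero.
    sf = []
    for band in bands:
        peak = max(abs(x) for x in band)
        sf.append(math.floor(4 * math.log2(2 * peak)) + 1 if peak > 0 else 0)
    remaining = target_bits                                   # B_1 = B
    nmr = [band_noise(b, s) / t for b, s, t in zip(bands, sf, thresholds)]  # NMR = SMR
    for _ in range(max_iter):
        i = max(range(len(bands)), key=lambda j: nmr[j])     # worst band
        if nmr[i] <= 1.0:
            break                                             # noise masked everywhere
        delta = band_bits(bands[i], sf[i] - 1) - band_bits(bands[i], sf[i])
        if delta > remaining:
            break                                             # assumed stop rule
        sf[i] -= 1
        remaining -= delta
        nmr[i] = band_noise(bands[i], sf[i]) / thresholds[i]
    return sf, target_bits - remaining, nmr


bands = [[8.0, -6.0, 4.0], [0.5, 0.25, -0.1]]
sf, used, nmr = allocate_bits(bands, thresholds=[0.5, 0.05], target_bits=40)
print(sf, used, [round(v, 2) for v in nmr])
```

Always refining the band with the largest NMR spends each additional bit where the audible distortion is currently worst, which is the intent of a minimum-distortion perceptual allocation.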
- the frequency-domain coefficients form a plurality of M-dimensional vectors to be input to the non-linear quantizer bank.
- Each M-dimensional vector is spectrally smoothed according to a smoothing factor, which reduces its dynamic range. The vector quantizer then uses the subjective perceptual distance criterion to find the code word in the code book with the shortest distance to the vector, and passes the corresponding code word index to the encoder.
- The smoothing factor is adjusted according to the bit allocation strategy of vector quantization. That strategy is in turn controlled by the perceptual priority of the different sub-bands.
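- The spectrum smoothing and code word search can be sketched as below. Both concrete choices are assumptions. Smoothing is modelled as a power-law compression $\operatorname{sign}(x)|x|^{\alpha}$, with $\alpha$ as the smoothing factor. A per-dimension weighted squared error stands in for the unspecified subjective perception distance measure.

```python
import numpy as np

def smooth(vec, alpha):
    """Reduce the dynamic range of a vector (assumed power-law smoothing)."""
    v = np.asarray(vec, dtype=float)
    return np.sign(v) * np.abs(v) ** alpha

def vq_search(vec, codebook, weights, alpha=0.5):
    """Return the index of the code word closest to the smoothed vector."""
    target = smooth(vec, alpha)
    cb = np.asarray(codebook, dtype=float)
    w = np.asarray(weights, dtype=float)          # hypothetical perceptual weights
    dists = np.sum(w * (cb - target) ** 2, axis=1)
    return int(np.argmin(dists))
```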
- the entropy encoding technique is used to further remove the statistical redundancy of the quantized coefficients and the side information.
- Entropy encoding is a source coding technique. Its basic idea is to assign shorter code words to symbols with a higher probability of occurrence and longer code words to symbols with a lower probability of occurrence, so that the average code word length is minimized.
- Entropy encoding methods mainly include Huffman coding, arithmetic coding and run-length coding.
- The entropy encoding in the present invention can use any of these methods.
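- The principle that more probable symbols get shorter code words can be illustrated with a textbook Huffman code construction. This is a generic sketch. The code books actually used by the invention are not given in this text.

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build a prefix code {symbol: bit string} from symbol frequencies."""
    if len(freqs) == 1:
        return {sym: "0" for sym in freqs}
    tie = count()   # tie-breaker so heapq never compares the dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]
```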
- In the encoder, the quantization spectrum output by the scalar quantizer and the differentially processed scale factors are entropy encoded. This yields the code book sequence numbers, the encoded values of the scale factors and the losslessly encoded quantization spectrum. The code book sequence numbers are then themselves entropy encoded to obtain their encoded values. Finally, the encoded values of the scale factors, the encoded values of the code book sequence numbers and the losslessly encoded quantization spectrum are output to the bit-stream multiplexing module 55.
- When vector quantization is used, the code word indexes output by the vector quantizer are entropy encoded in the encoder, either one-dimensionally or multi-dimensionally. The resulting encoded values of the code word indexes are output to the bit-stream multiplexing module 55.
- the encoding method based on said encoder as described above includes analyzing the signal type of the input audio signal; calculating the signal-to-masking ratio of the audio signal; performing a time-frequency mapping on the audio signal to obtain the frequency-domain coefficients of the audio signal; performing multi-resolution analysis, quantization and entropy encoding on the frequency-domain coefficients; and multiplexing the result of signal type analysis and the encoded audio code stream to obtain the compressed audio code stream.
- The signal type is determined by forward and backward masking effect analysis based on an adaptive threshold and waveform prediction. The specific steps are: dividing the input audio data into frames; dividing each input frame into a plurality of sub-frames and searching for the local extrema of the absolute values of the PCM data in each sub-frame; selecting the sub-frame peak value from the local extrema of each sub-frame; for a given sub-frame peak value, using a plurality of (typically three) sub-frame peak values preceding that sub-frame to predict the typical sample value of a plurality of (typically four) sub-frames that are forward delayed with respect to it; calculating the difference and the ratio between that sub-frame peak value and the predicted typical sample value; if both the difference and the ratio exceed their predetermined thresholds, determining that the sub-frame contains a jump signal and confirming that it has a local extremum capable of backward masking the pre-echo, if there is a sub
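- The jump detection above can be sketched roughly as follows. The sketch simplifies the described procedure. The sub-frame peak is taken as the maximum absolute PCM value, and the prediction is assumed to be the mean of the three preceding sub-frame peaks. The sub-frame count and both thresholds are hypothetical values, and the frame must contain at least `n_sub` samples.

```python
import numpy as np

def detect_jumps(frame, n_sub=8, history=3, diff_thr=0.1, ratio_thr=4.0):
    """Return the indices of sub-frames whose peak jumps well above its prediction."""
    x = np.abs(np.asarray(frame, dtype=float))
    peaks = [sub.max() for sub in np.array_split(x, n_sub)]   # sub-frame peak values
    jumps = []
    for i in range(history, n_sub):
        predicted = np.mean(peaks[i - history:i])   # assumption: mean of preceding peaks
        diff = peaks[i] - predicted
        ratio = peaks[i] / max(predicted, 1e-12)
        if diff > diff_thr and ratio > ratio_thr:   # both tests must pass
            jumps.append(i)
    return jumps
```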
- DFT: discrete Fourier transformation
- DCT: discrete cosine transformation
- MDCT: modified discrete cosine transformation
- cosine modulation filter bank
- wavelet transformation, etc.
- modified discrete cosine transformation MDCT and cosine modulation filtering are taken as examples to illustrate the process of time-frequency mapping.
- For the MDCT, the M time-domain samples of the previous frame and the M time-domain samples of the present frame are selected first. A windowing operation is then applied to these 2M samples of the two frames. Finally, the MDCT is applied to the windowed signal to obtain M frequency-domain coefficients.
- A sine window can be used as the window function.
- The restriction on the window function can be relaxed by using a biorthogonal transformation with specific analysis and synthesis filters.
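- A direct O(M²) sketch of the MDCT step with a sine window, using the common MDCT definition. A practical codec would use an FFT-based algorithm. The sine window satisfies $w^2(n) + w^2(n+M) = 1$, which is presumably the window restriction referred to above.

```python
import numpy as np

def sine_window(M):
    """Sine window of length 2M."""
    n = np.arange(2 * M)
    return np.sin(np.pi * (n + 0.5) / (2 * M))

def mdct(prev_block, cur_block, window):
    """M previous + M current samples -> windowed -> M MDCT coefficients."""
    x = np.concatenate([prev_block, cur_block]) * window
    M = len(cur_block)
    n = np.arange(2 * M)
    k = np.arange(M)[:, None]
    basis = np.cos(np.pi / M * (n + 0.5 + M / 2) * (k + 0.5))   # shape (M, 2M)
    return basis @ x
```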
- For cosine modulation filtering, the M time-domain samples of the previous frame and the M time-domain samples of the present frame are selected first. A windowing operation is then applied to these 2M samples of the two frames. Finally, cosine modulation filtering is applied to the windowed signal to obtain M frequency-domain coefficients.
- The impulse response length of the analysis window (analysis prototype filter) $P_a(n)$ of the M-sub-band cosine modulation filter bank is $N_a$. The impulse response length of the synthesis window (synthesis prototype filter) $P_s(n)$ is $N_s$.
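- The source does not give the modulation formula for the $M$-sub-band cosine modulation filter bank. The sketch below shows a common textbook (pseudo-QMF) choice, assuming a symmetric prototype of length $N$. With separate prototypes $P_a(n)$ and $P_s(n)$ of lengths $N_a$ and $N_s$, the function would be called once per prototype.

```python
import numpy as np

def modulated_filters(prototype, M, analysis=True):
    """Build M band filters by cosine-modulating one prototype low-pass filter.

    Assumed textbook form:
    h_k(n) = 2 p(n) cos(pi/M (k + 1/2)(n - (N-1)/2) +/- (-1)^k pi/4)
    """
    p = np.asarray(prototype, dtype=float)
    N = len(p)
    n = np.arange(N)
    k = np.arange(M)[:, None]
    phase = (-1.0) ** k * np.pi / 4          # alternating phase term
    if not analysis:
        phase = -phase                        # synthesis filters use the opposite phase
    arg = np.pi / M * (k + 0.5) * (n - (N - 1) / 2) + phase
    return 2 * p * np.cos(arg)                # shape (M, N)
```

With a symmetric prototype, each synthesis filter is the time reverse of the corresponding analysis filter.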
- the calculation of the masking threshold and signal-to-masking ratio of the re-sampled signal includes the following steps:
- The multi-resolution analyzing module 53 reorganizes the input frequency-domain data in the time-frequency domain. This improves the time resolution of the data at the cost of frequency precision. The encoder thereby adapts automatically to the time-frequency characteristics of fast-varying signals and suppresses pre-echo without changing the form of the filter bank in the time-frequency mapping module 52.
- the multi-resolution analysis includes the two steps of frequency-domain coefficient transformation and reorganization, wherein the frequency-domain coefficients are transformed into time-frequency plane coefficients through frequency-domain coefficient transformation, and the time-frequency plane coefficients are grouped by reorganization according to a certain rule.
- The wavelet basis of the frequency-domain wavelet or wavelet packet transformation may be either fixed or adaptive.
- The scaling coefficients of the Haar wavelet basis are $\left[\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right]$ and its wavelet coefficients are $\left[\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right]$.
- Fig. 6 shows the filtering structure that performs the wavelet transformation with the Haar wavelet basis. $H_0$ denotes low-pass filtering (filter coefficients $\left[\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right]$), $H_1$ denotes high-pass filtering (filter coefficients $\left[\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right]$), and "$\downarrow 2$" denotes downsampling by two.
- The Haar wavelet transformation is applied to the high-frequency portion of the frequency-domain coefficients to obtain the coefficients $X_2(k)$, $X_3(k)$, $X_4(k)$, $X_5(k)$, $X_6(k)$ and $X_7(k)$ of different time-frequency intervals. The corresponding division of the time-frequency plane is shown in Fig. 7.
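- A sketch of the Haar filtering structure applied along the frequency axis. Fig. 7 is not reproduced here. The split point between the untouched low-frequency coefficients and the transformed high-frequency portion is a hypothetical parameter, and so is the number of levels. The transformed segment length must be divisible by $2^{\text{levels}}$.

```python
import numpy as np

def haar_step(x):
    """One stage: H0 / H1 filtering followed by downsampling by two."""
    x = np.asarray(x, dtype=float)
    low = (x[0::2] + x[1::2]) / np.sqrt(2)
    high = (x[0::2] - x[1::2]) / np.sqrt(2)
    return low, high

def haar_split(coeffs, split, levels):
    """Leave coeffs[:split] unchanged; apply `levels` Haar stages to coeffs[split:].

    Returns [untouched, approximation, coarsest detail, ..., finest detail].
    """
    coeffs = np.asarray(coeffs, dtype=float)
    rest = coeffs[split:]
    details = []
    for _ in range(levels):
        rest, high = haar_step(rest)      # iterate on the low-pass branch
        details.append(high)
    return [coeffs[:split], rest] + details[::-1]
```

Because the Haar stage is orthonormal, the total energy of the output parts equals that of the input.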
- Different wavelet transformation structures can be used to obtain other, similar divisions of the time-frequency plane. The time-frequency plane division used in signal analysis can therefore be adjusted freely to meet different requirements on time and frequency resolution.
- The time-frequency plane coefficients are reorganized in the reorganization module according to a certain rule. For example, the time-frequency plane coefficients can first be organized in the frequency direction, with the coefficients in each frequency band organized in the time direction. The organized coefficients are then arranged in the order of sub-window and scale factor band.
- Frequency-domain MDCT transformations of different lengths are used in different frequency-domain ranges, thereby to obtain different time-frequency plane divisions, i.e. different time and frequency precision.
- the reorganization module reorganizes the time-frequency domain data output from the filter bank of the frequency-domain MDCT transformation.
- One way of reorganization is to organize the time-frequency plane coefficients in the frequency direction first, and the coefficients in each frequency band are organized in the time direction at the same time, then the organized coefficients are arranged in the order of sub-window and scale factor band.
- Quantization and entropy encoding further include the two steps of non-linear quantization and entropy encoding, wherein the quantization can be scalar quantization or vector quantization.
- The scalar quantization comprises the following steps: non-linearly compressing the frequency-domain coefficients in all the scale factor bands; using the scale factor of each sub-band to quantize the frequency-domain coefficients of that sub-band to obtain an integer-valued quantization spectrum; selecting the first scale factor of each frame as the common scale factor; and differentially coding each remaining scale factor with respect to its preceding scale factor.
- The vector quantization comprises the following steps: forming a plurality of multi-dimensional vectors from the frequency-domain coefficients; performing spectrum smoothing on each M-dimensional vector according to the smoothing factor; and searching the code book for the code word with the shortest distance to the vector to be quantized, according to the subjective perception distance measure criterion, to obtain the code word index.
- The entropy encoding step comprises entropy encoding the quantization spectrum and the differentially coded scale factors to obtain the code book sequence numbers, the encoded values of the scale factors and the losslessly encoded quantization spectrum. The code book sequence numbers are then entropy encoded to obtain their encoded values.
- a one-dimensional or multi-dimensional entropy encoding is performed on the code word indexes to obtain the encoded values of the code word indexes.
- Said entropy encoding method can be any one of the existing Huffman encoding, arithmetic encoding or run length encoding method.
- the encoded audio code stream is obtained, which is multiplexed together with the common scale factor and the result of signal type analysis to obtain the compressed audio code stream.
- Fig. 8 is a schematic drawing of the structure of the audio decoding device according to the present invention.
- the audio decoding device comprises a bit-stream demultiplexing module 60, an entropy decoding module 61, an inverse quantizer bank 62, a multi-resolution integration module 63 and a frequency-time mapping module 64.
- the compressed audio code stream is demultiplexed by the bit-stream demultiplexing module 60 to obtain the corresponding data signal and control signal which are output to the entropy decoding module 61 and the multi-resolution integration module 63; the data signal and control signal are decoded in the entropy decoding module 61 to recover the quantized values of the spectrum.
- The quantized values are reconstructed in the inverse quantizer bank 62 to obtain the inverse quantization spectrum, which is output to the multi-resolution integration module 63. After multi-resolution integration, the spectrum is output to the frequency-time mapping module 64, and frequency-time mapping produces the time-domain audio signal.
- the bit-stream demultiplexing module 60 decomposes the compressed audio code stream to obtain the corresponding data signal and control signal and to provide the corresponding decoding information for other modules.
- The compressed audio data stream is demultiplexed. The signals output to the entropy decoding module 61 include either the common scale factor, the encoded scale factor values, the encoded values of the code book sequence numbers and the losslessly encoded quantized spectrum, or the encoded values of the code word indexes. The signal type information is output to the multi-resolution integration module 63.
- If the quantization and entropy encoding module 54 uses the scalar quantizer, the entropy decoding module 61 in the decoding device receives the following from the bit-stream demultiplexing module 60: the common scale factor, the encoded scale factor values, the encoded values of the code book sequence numbers and the losslessly encoded quantized spectrum. It performs code book sequence number decoding, spectrum coefficient decoding and scale factor decoding on them to reconstruct the quantized spectrum. It then outputs the integer representation of the scale factors and the quantized values of the spectrum to the inverse quantizer bank 62.
- the decoding method used by the entropy decoding module 61 corresponds to the encoding method used by entropy encoding in the encoding device, which is, for example, Huffman decoding, arithmetic decoding or run length decoding, etc.
- Upon receipt of the quantized values of the spectrum and the integer representation of the scale factors, the inverse quantizer bank 62 inversely quantizes the quantized values into the unscaled reconstructed spectrum (inverse quantization spectrum). It outputs this spectrum to the multi-resolution integration module 63.
- the inverse quantizer bank 62 can be either a uniform quantizer bank or a non-uniform quantizer bank realized by a companding function. In the encoding device, the quantizer bank uses the scalar quantizer, so in the decoding device, the inverse quantizer bank 62 also uses the scalar inverse quantizer. In the scalar inverse quantizer, the quantized values of the spectrum are non-linearly expanded first, then all the spectrum coefficients (inverse quantization spectrum) in the corresponding scale factor band are obtained by using each scale factor.
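- A sketch of scalar inverse quantization: non-linear expansion followed by rescaling with the band's scale factor. The $|q|^{4/3}$ expansion and the $2^{sf/4}$ scaling are assumptions borrowed from AAC-style codecs. This text does not state those values.

```python
import numpy as np

def dequantize_band(q, scale_factor):
    """Expand integer quantized values and rescale them with one scale factor."""
    q = np.asarray(q, dtype=float)
    expanded = np.sign(q) * np.abs(q) ** (4.0 / 3.0)   # assumed non-linear expansion
    return expanded * 2.0 ** (scale_factor / 4.0)      # assumed scale-factor step
```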
- the entropy decoding module 61 receives the encoded values of the code word indexes output from the bit-stream demultiplexing module 60, and decodes the encoded values of the code word indexes by the entropy decoding method corresponding to the entropy encoding method used in entropy encoding, thereby obtaining the corresponding code word index.
- the code word indexes are output to the inverse quantizer bank 62, and by looking up the code book, the quantized values (inverse quantization spectrum) are obtained and are output to the multi-resolution integration module 63.
- In this case, the inverse quantizer bank 62 uses an inverse vector quantizer. After multi-resolution integration, the frequency-time mapping module 64 maps the inverse quantization spectrum to obtain the time-domain audio signal.
- the frequency-time mapping module 64 can be a filter bank of inverse discrete cosine transformation (IDCT), a filter bank of inverse discrete Fourier transformation (IDFT), a filter bank of inverse modified discrete cosine transformation (IMDCT), a filter bank of inverse wavelet transformation, and a cosine modulation filter bank, etc.
- the decoding method based on the above-mentioned decoder comprises: demultiplexing the compressed audio code stream to obtain the data information and control information; entropy decoding said information to obtain the quantized values of the spectrum; inversely quantizing the quantized values of the spectrum to obtain the inverse quantization spectrum; multi-resolution integrating the inverse quantization spectrum and then performing a frequency-time mapping thereon to obtain the time-domain audio signal.
- the entropy decoding steps include: decoding the encoded values of the code book sequence numbers to obtain the code book sequence numbers of all the scale factor bands; decoding the quantization coefficients of all the scale factor bands according to the code book corresponding to the code book sequence numbers; and decoding the scale factors of all the scale factor bands to reconstruct the quantization spectrum.
- the entropy decoding method used in said process corresponds to the entropy encoding method used in the encoding method, which is, for example, run length decoding method, Huffman decoding method, or arithmetic decoding method, etc.
- the entropy decoding process is described below by using as examples the decoding of the code book sequence number by the run length decoding method, the decoding of the quantization coefficients by the Huffman decoding method, and the decoding of the scale factor by the Huffman decoding method.
- the code book sequence numbers of all the scale factor bands are obtained through the run length decoding method.
- The decoded code book sequence numbers are integers within a certain range. Suppose the range is [0, 11]. Only the code book sequence numbers within the valid range, i.e. 1 to 11, then correspond to a Huffman code book of the spectrum coefficients.
- A scale factor band whose quantization coefficients are all zero can be represented by a particular code book sequence number. Typically, 0 is selected.
- The Huffman code book of spectrum coefficients corresponding to each code book number is used to decode the quantization coefficients of all the scale factor bands. If the code book number of a scale factor band is within the valid range (1 to 11 in this embodiment), it corresponds to a spectrum coefficient code book. That code book is used to decode the quantization spectrum to obtain the code word indexes of the quantization coefficients of the band, and the code word indexes are then unpacked to obtain the quantization coefficients. If the code book number of the band is not between 1 and 11, it does not correspond to any spectrum coefficient code book. In that case the quantization coefficients of the band need not be decoded and are all set directly to zero.
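- The code book handling above can be sketched as follows. The (value, run length) pair layout for the run-length coded code book numbers is hypothetical. So is the `decode_band` callback, which stands in for Huffman decoding plus unpacking. The valid range 1 to 11 follows the embodiment.

```python
def run_length_decode(pairs):
    """Expand hypothetical (code book number, run length) pairs into per-band numbers."""
    out = []
    for value, run in pairs:
        out.extend([value] * run)
    return out

def decode_bands(codebooks, band_widths, decode_band, valid=range(1, 12)):
    """Decode each scale factor band, or zero-fill it if its code book is not valid."""
    spectrum = []
    for cb, width in zip(codebooks, band_widths):
        if cb in valid:
            spectrum.extend(decode_band(cb, width))   # hypothetical Huffman decode + unpack
        else:
            spectrum.extend([0] * width)              # no spectrum code book: all zeros
    return spectrum
```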
- The scale factors are used to reconstruct the spectrum values from the inverse quantization spectrum coefficients. If the code book number of a scale factor band is within the valid range, each code book number corresponds to one scale factor.
- The bits occupied by the first scale factor are read first. The remaining scale factors are then Huffman decoded to obtain the difference between each scale factor and its preceding scale factor. Each difference is added to the value of the preceding scale factor to obtain the corresponding scale factor. If the quantization coefficients of the present sub-band are all zero, its scale factor does not need to be decoded.
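- A sketch of scale factor reconstruction by accumulating differences and skipping all-zero bands. Two points are assumptions that the text does not spell out: the first coded band carries the common scale factor, and each difference is taken with respect to the previous coded band.

```python
def decode_scale_factors(common, diffs, codebooks, valid=range(1, 12)):
    """Rebuild per-band scale factors; bands with an invalid code book get None."""
    diffs = iter(diffs)
    current = None
    out = []
    for cb in codebooks:
        if cb not in valid:
            out.append(None)          # all-zero band: no scale factor transmitted
        elif current is None:
            current = common          # first coded band: the common scale factor
            out.append(current)
        else:
            current += next(diffs)    # add difference to the previous scale factor
            out.append(current)
    return out
```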
- the quantized values of the spectrum and the integer representation of the scale factors are obtained, then the quantized values of the spectrum are inversely quantized to obtain the inverse quantization spectrum.
- the inverse quantization processing includes non-linear expanding the quantized values of the spectrum, and obtaining all the spectrum coefficients (inverse quantization spectrum) in the corresponding scale factor band according to each scale factor.
- the entropy decoding steps include: decoding the encoded values of the code word indexes by means of the entropy decoding method corresponding to the entropy encoding method used in the encoding device so as to obtain the code word indexes, then inversely quantizing the code word indexes to obtain the inverse quantization spectrum.
- If the signal is of the fast-varying type, the frequency-domain coefficients are multi-resolution analyzed, and their multi-resolution representation is then quantized and entropy encoded. If it is not a fast-varying signal, the frequency-domain coefficients are quantized and entropy encoded directly.
- the multi-resolution integration can use frequency-domain wavelet transformation method or frequency-domain MDCT transformation method.
- The frequency-domain wavelet integration method includes reorganizing the time-frequency plane coefficients according to a certain rule. An inverse wavelet transformation is then performed on the time-frequency plane coefficients to obtain the frequency-domain coefficients.
- The frequency-domain MDCT integration method includes reorganizing the time-frequency plane coefficients according to a certain rule. Several inverse MDCT transformations are then performed on the time-frequency plane coefficients to obtain the frequency-domain coefficients.
- the reorganization method includes: organizing the time-frequency plane coefficients in the frequency direction, and the coefficients in each frequency band are organized in the time direction, then the organized coefficients are arranged in the order of sub-window and scale factor band.
- the method of performing a frequency-time mapping on the frequency-domain coefficients corresponds to the time-frequency mapping method in the encoding method, which can be inverse discrete cosine transformation (IDCT), inverse discrete Fourier transformation (IDFT), inverse modified discrete cosine transformation (IMDCT), and inverse wavelet transformation, etc.
- IDCT: inverse discrete cosine transformation
- IDFT: inverse discrete Fourier transformation
- IMDCT: inverse modified discrete cosine transformation
- inverse wavelet transformation, etc.
- the frequency-time mapping process is illustrated below by taking inverse modified discrete cosine transformation IMDCT as an example.
- The frequency-time mapping process includes three steps: the IMDCT, time-domain windowing and time-domain overlap-add.
- The IMDCT is performed on the spectrum before prediction, or on the inverse quantization spectrum, to obtain the transformed time-domain signal $x_{i,n}$.
- The time-domain signal obtained from the IMDCT is windowed in the time domain.
- Typical window functions include the sine window and the Kaiser-Bessel window.
- The restriction on the window function can be relaxed by using a biorthogonal transformation with a specific analysis filter and synthesis filter.
- The windowed time-domain signals are overlap-added to obtain the time-domain audio signal.
- Fig. 9 is a schematic drawing of the first embodiment of the encoding device of the present invention.
- this embodiment has a frequency-domain linear prediction and vector quantization module 56 added between the output of the multi-resolution analyzing module 53 and the input of the quantization and entropy encoding module 54 for outputting the residual sequence to the quantization and entropy encoding module 54, and for outputting the quantized code indexes as the side information to the bit-stream multiplexing module 55.
- frequency-domain linear prediction and vector quantization module 56 needs to perform linear prediction and multi-stage vector quantization for the frequency-domain coefficients at each time interval.
- The frequency-domain coefficients output from the multi-resolution analyzing module 53 are transmitted to the frequency-domain linear prediction and vector quantization module 56.
- Standard linear prediction analysis is performed on the frequency-domain coefficients of each time interval. If the prediction gain meets the given condition, linear prediction error filtering is applied to the frequency-domain coefficients, and the resulting prediction coefficients are transformed into line spectrum frequency (LSF) coefficients. The optimal distortion measure criterion is then used to search for and calculate the code word indexes for the respective code books. These indexes are passed as side information to the bit-stream multiplexing module 55. The residual sequence obtained through prediction analysis is output to the quantization and entropy encoding module 54.
- the frequency-domain linear prediction and vector quantization module 56 consists of a linear prediction analyzer, a linear prediction filter, a transformer, and a vector quantizer. Frequency-domain coefficients are input to the linear prediction analyzer for prediction analysis to obtain the prediction gain and prediction coefficients. The frequency-domain coefficients that meet a certain condition are output to the linear prediction filter to be filtered, and a residual sequence is obtained thereby; the residual sequence is directly output to the quantization and entropy encoding module 54, while the prediction coefficients are transformed into line spectrum frequency LSF coefficients through the transformer, then the LSF parameters are sent to the vector quantizer for a multi-stage vector quantization, and the quantized signals are transmitted to the bit-stream multiplexing module 55.
- Performing a frequency-domain linear prediction processing on the audio signals can effectively suppress the pre-echo and obtain greater encoding gain.
- Let the real signal be $x(t)$, and let $C(f)$ be the one-sided spectrum corresponding to the positive-frequency component of $x(t)$. The Hilbert envelope of the signal is then related to the autocorrelation function of this signal spectrum.
- Correspondingly, $\mathrm{PSD}(f) = \mathcal{F}\left\{\int x(\tau)\, x^{*}(\tau - t)\, d\tau\right\}$. The squared Hilbert envelope of the signal in the time domain and the power spectral density of the signal in the frequency domain therefore correspond to each other.
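- The duality between the squared Hilbert envelope and the power spectral density can be checked numerically on a discrete signal. The sketch below assumes circular (DFT) correlations and an even signal length. It verifies two facts. First, the DFT of the signal autocorrelation equals $|X(k)|^2$. Second, the DFT of the squared Hilbert envelope equals the scaled autocorrelation of the one-sided spectrum $C(k)$. This is the usual motivation for applying linear prediction along the frequency axis.

```python
import numpy as np

def power_spectrum_via_autocorr(x):
    """DFT of the circular autocorrelation r(t) = sum_tau x(tau) x*(tau - t)."""
    x = np.asarray(x, dtype=complex)
    r = np.array([np.sum(x * np.conj(np.roll(x, t))) for t in range(len(x))])
    return np.fft.fft(r)

def analytic_signal(x):
    """Analytic signal and its one-sided spectrum C(k); assumes an even length."""
    N = len(x)
    h = np.zeros(N)
    h[0] = h[N // 2] = 1.0
    h[1:N // 2] = 2.0                  # keep DC and Nyquist, double positive bins
    C = np.fft.fft(x) * h
    return np.fft.ifft(C), C

def squared_envelope_via_spectral_autocorr(x):
    """Inverse DFT of the circular autocorrelation of C(k), scaled by 1/N."""
    _, C = analytic_signal(x)
    N = len(C)
    R = np.array([np.sum(C * np.conj(np.roll(C, k))) for k in range(N)]) / N
    return np.fft.ifft(R)
```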
- The encoding method based on the encoding device shown in Fig. 9 is substantially the same as that based on the encoding device shown in Fig. 5. The difference is that the former adds the following steps. After multi-resolution analysis of the frequency-domain coefficients, standard linear prediction analysis is performed on the frequency-domain coefficients of each time interval to obtain the prediction gain and the prediction coefficients. It is then determined whether the prediction gain exceeds a predetermined threshold. If it does, the following are performed: frequency-domain linear prediction error filtering on the frequency-domain coefficients, based on the prediction coefficients, to obtain the residual sequence; transforming the prediction coefficients into line spectrum pair frequency coefficients; multi-stage vector quantization of these coefficients to obtain the side information; and quantizing and entropy encoding the residual sequence. If the prediction gain does not exceed the predetermined threshold, the frequency-domain coefficients are quantized and entropy encoded.
- a standard linear prediction analysis is performed on the frequency-domain coefficients at each time interval, including calculating the autocorrelation matrix, obtaining the prediction gain and the prediction coefficients by recursively executing the Levinson-Durbin algorithm. Then it is determined whether the calculated prediction gain exceeds a predetermined threshold, if it does, a linear prediction error filtering is performed on the frequency-domain coefficients based on the prediction coefficients, otherwise, the frequency-domain coefficients are not processed and the next step is executed to quantize and entropy encode the frequency-domain coefficients.
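- The autocorrelation, Levinson-Durbin recursion and prediction-gain test can be sketched as follows. The prediction order and the gain threshold of 1.4 are hypothetical values. The prediction gain is taken as the ratio of the input energy $r(0)$ to the final prediction error energy.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve for a_1..a_p such that X(k) is predicted by sum_i a_i X(k-i)."""
    a = np.zeros(order)
    err = r[0]
    for m in range(order):
        acc = r[m + 1] - np.dot(a[:m], r[m:0:-1])
        k = acc / err                          # reflection coefficient
        a_prev = a[:m].copy()
        a[m] = k
        a[:m] = a_prev - k * a_prev[::-1]
        err *= (1.0 - k * k)                   # updated prediction error energy
    return a, err

def lpc_analysis(X, order=8, gain_threshold=1.4):
    """Return (prediction coefficients, prediction gain, apply-filtering flag)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    r = np.array([np.dot(X[:n - lag], X[lag:]) for lag in range(order + 1)])
    if r[0] <= 0:
        return np.zeros(order), 1.0, False     # silent interval: nothing to predict
    a, err = levinson_durbin(r, order)
    gain = r[0] / err
    return a, gain, gain > gain_threshold
```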
- Linear prediction includes forward prediction and backward prediction.
- Forward prediction predicts the current value from the values before a given point.
- Backward prediction predicts the current value from the values after a given point.
- Forward prediction is used below as an example to explain linear prediction error filtering.
- The frequency-domain coefficients $X(k)$ output after the time-frequency transformation can be represented by the residual sequence $E(k)$ and a group of prediction coefficients $a_i$.
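- With forward prediction of order $p$, the error filter is $E(k) = X(k) - \sum_{i=1}^{p} a_i X(k-i)$. A minimal sketch, assuming that coefficients before $k = 0$ are zero:

```python
import numpy as np

def prediction_error_filter(X, a):
    """Residual E(k) = X(k) - sum_i a_i X(k - i), with X(k) = 0 for k < 0."""
    X = np.asarray(X, dtype=float)
    E = X.copy()
    for i, ai in enumerate(a, start=1):
        E[i:] -= ai * X[:-i]          # subtract the i-th lag contribution
    return E
```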
- The group of prediction coefficients $a_i$ is transformed into line spectrum frequency (LSF) coefficients, and multi-stage vector quantization is performed on them.
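- The text does not detail the conversion from prediction coefficients to LSF coefficients. A common approach, sketched below, uses $A(z) = 1 - \sum_i a_i z^{-i}$ to form the sum and difference polynomials $P(z) = A(z) + z^{-(p+1)}A(z^{-1})$ and $Q(z) = A(z) - z^{-(p+1)}A(z^{-1})$. The LSFs are the angles of their unit-circle roots. For brevity the sketch finds roots with `numpy.roots`, not the faster Chebyshev-based searches used in practice. A stable $A(z)$ is assumed.

```python
import numpy as np

def lpc_to_lsf(a):
    """Prediction coefficients a_1..a_p -> p line spectrum frequencies in (0, pi)."""
    A = np.concatenate([[1.0], -np.asarray(a, dtype=float), [0.0]])  # A(z), padded to order p+1
    P = A + A[::-1]          # A(z) + z^-(p+1) A(1/z)
    Q = A - A[::-1]          # A(z) - z^-(p+1) A(1/z)
    angles = np.angle(np.concatenate([np.roots(P), np.roots(Q)]))
    eps = 1e-6               # drop the trivial roots at z = 1 and z = -1
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])
```

For a stable predictor, the roots of $P$ and $Q$ interlace on the unit circle, so the returned frequencies are strictly increasing.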
- The vector quantization uses the optimal distortion measure criterion (e.g. the nearest-neighbor criterion) to search for and calculate the code word indexes of the respective code book stages. This determines the code words corresponding to the prediction coefficients, and the code word indexes are output as side information.
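- A sketch of multi-stage vector quantization with a nearest-neighbor (squared Euclidean) criterion. Each stage quantizes the residual left by the previous stages, and the per-stage indexes form the side information. The codebooks are hypothetical placeholders.

```python
import numpy as np

def msvq_encode(vec, codebooks):
    """Return one nearest-neighbour index per stage."""
    residual = np.asarray(vec, dtype=float)
    indices = []
    for cb in codebooks:
        cb = np.asarray(cb, dtype=float)
        idx = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))
        indices.append(idx)
        residual = residual - cb[idx]     # the next stage quantizes what is left
    return indices

def msvq_decode(indices, codebooks):
    """Sum the selected code words of all stages."""
    return sum(np.asarray(cb, dtype=float)[i] for i, cb in zip(indices, codebooks))
```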
- The residual sequence $E(k)$ is quantized and entropy encoded.
- Fig. 10 is a schematic drawing of embodiment one of the decoding device.
- Said decoding device has an inverse frequency-domain linear prediction and vector quantization module 65 added on the basis of the decoding device as shown in Fig. 8.
- Said inverse frequency-domain linear prediction and vector quantization module 65 is between the output of the inverse quantizer bank 62 and the input of the multi-resolution integration module 63, and the bit-stream demultiplexing module 60 outputs control information of inverse frequency-domain linear prediction vector quantization thereto for inverse quantizing and inverse linear prediction filtering the inverse quantization spectrum (residual spectrum), thereby obtaining the spectrum before prediction and outputting it to the multi-resolution integration module 63.
- The encoder uses frequency-domain linear prediction vector quantization to suppress pre-echo and obtain greater encoding gain. The decoder therefore feeds the inverse quantization spectrum to the inverse frequency-domain linear prediction and vector quantization module 65 to recover the spectrum before linear prediction. It also feeds that module the control information for inverse frequency-domain linear prediction vector quantization output by the bit-stream demultiplexing module 60.
- the inverse frequency-domain linear prediction and vector quantization module 65 comprises an inverse vector quantizer, an inverse transformer, and an inverse linear prediction filter, wherein the inverse vector quantizer is used for inversely quantizing the code word indexes to obtain the line spectrum pair frequency (LSF) coefficients, the inverse transformer is used for inverse transforming the line spectrum frequency (LSF) coefficients into prediction coefficients, and the inverse linear prediction filter is used for inverse filtering the inverse quantization spectrum based on the prediction coefficients to obtain the spectrum before prediction and output it to the multi-resolution integration module 63.
- the decoding method of the decoding device as shown in Fig. 10 is substantially the same as the decoding method of the decoding device as shown in Fig. 8, and the difference is that the former further includes the steps of after obtaining the inverse quantization spectrum, determining if the control information contains information concerning that the inverse quantization spectrum needs to undergo the inverse frequency-domain linear prediction vector quantization, if it does, performing the inverse vector quantization to obtain the prediction coefficients, and performing a linear prediction synthesizing on the inverse quantization spectrum according to the prediction coefficients to obtain the spectrum before prediction; and multi-resolution integrating the spectrum before prediction.
- the residual sequence E(k) and the calculated prediction coefficients a_i are synthesized by frequency-domain linear prediction to obtain the spectrum X(k) before prediction, which is then frequency-time mapped.
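Frequency-domain linear prediction runs the predictor along the frequency index k instead of along time, and synthesis is the all-pole inverse of the encoder-side prediction filter. A minimal sketch, assuming (the source does not fix a sign convention) that the a_i are predictor weights, so the encoder forms E(k) = X(k) - Σ_i a_i X(k - i) with X(k) = 0 for k < 0, and the decoder inverts it as X(k) = E(k) + Σ_i a_i X(k - i). Quantization of E(k) is left out, and the function names are hypothetical.

```python
def fdlp_analyze(spectrum, a):
    """Encoder side: residual E(k) = X(k) - sum_i a_i * X(k - i) along frequency."""
    residual = []
    for k, xk in enumerate(spectrum):
        pred = sum(ai * spectrum[k - i] for i, ai in enumerate(a, start=1) if k - i >= 0)
        residual.append(xk - pred)
    return residual


def fdlp_synthesize(residual, a):
    """Decoder side: X(k) = E(k) + sum_i a_i * X(k - i), i.e. all-pole inverse filtering."""
    spectrum = []
    for k, ek in enumerate(residual):
        # Uses already reconstructed lower-frequency coefficients X(k - i)
        pred = sum(ai * spectrum[k - i] for i, ai in enumerate(a, start=1) if k - i >= 0)
        spectrum.append(ek + pred)
    return spectrum
```

Without quantization the round trip is exact up to rounding: `fdlp_synthesize(fdlp_analyze(X, a), a)` returns `X`.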
- if the control information indicates that said signal frame has not undergone frequency-domain linear prediction vector quantization, the inverse frequency-domain linear prediction vector quantization is not performed, and the inverse quantization spectrum is frequency-time mapped directly.
- Fig. 11 is the schematic drawing of the second embodiment of the encoding device of the present invention.
- said embodiment has a sum-difference stereo (M/S) encoding module 57 added between the output of the multi-resolution analyzing module 53 and the input of the quantization and entropy encoding module 54.
- the psychoacoustical analyzing module 51 calculates not only the mono masking threshold of the audio signal, but also the masking threshold of the sum-difference sound channel to be output to the quantization and entropy encoding module 54.
- the sum-difference stereo module 57 can also be located between the quantizer bank and the encoder in the quantization and entropy encoding module 54.
- the sum-difference stereo module 57 exploits the correlation between the two sound channels of a sound channel pair to convert the frequency-domain coefficients/residual sequences of the left-right sound channels into the frequency-domain coefficients/residual sequences of the sum-difference sound channels, thereby reducing the code rate and improving the encoding efficiency. It is therefore applied only to multi-channel signals whose channels are of the same signal type; for mono signals or multi-channel signals of different signal types, sum-difference stereo encoding is not performed.
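The source does not give the sum-difference formulas. A common choice, used by MPEG-2 AAC, is M = (L + R) / 2 and S = (L - R) / 2, inverted by L = M + S and R = M - S. A minimal per-coefficient sketch under that assumption (function names are hypothetical); when the two channels are strongly correlated, S is close to zero and cheap to code.

```python
def ms_encode(left, right):
    """Left/right coefficients -> sum (mid) and difference (side) coefficients."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Sum/difference coefficients -> left/right coefficients (exact inverse of ms_encode)."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```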
- the encoding method of the encoding device as shown in Fig. 11 is substantially the same as that of the encoding device as shown in Fig. 5, and the difference is that the former further includes the following steps before quantizing and entropy encoding the frequency-domain coefficients: determining whether the audio signals are multi-channel signals; if so, determining whether the signals of the left and right sound channels are of the same type; if so, determining whether the corresponding scale factor bands of the two sound channels meet the conditions for sum-difference stereo encoding; if they do, performing sum-difference stereo encoding to obtain the frequency-domain coefficients of the sum-difference sound channels, and otherwise not performing it. If the signals are mono signals or multi-channel signals of different types, the frequency-domain coefficients are not processed.
- the sum-difference stereo encoding can be applied not only before quantization but also after quantization and before entropy encoding. In that case, after the frequency-domain coefficients are quantized, it is determined whether the audio signals are multi-channel signals; if so, whether the signals of the left and right sound channels are of the same type; and if so, whether the corresponding scale factor bands of the two sound channels meet the conditions for sum-difference stereo encoding. Sum-difference stereo encoding is performed on the bands that meet the conditions and not on those that do not. If the signals are mono signals or multi-channel signals of different types, sum-difference stereo encoding is not performed on the frequency-domain coefficients.
- Fig. 12 is a schematic drawing of embodiment two of the decoding device.
- said decoding device has a sum-difference stereo decoding module 66 added between the output of the inverse quantizer bank 62 and the input of the multi-resolution integration module 63 to receive the result of signal type analysis and the sum-difference stereo control signal output from the bit-stream demultiplexing module 60, and to transform the inverse quantization spectrum of the sum-difference sound channels into the inverse quantization spectrum of the left-right sound channels according to said control information.
- the sum-difference stereo control signal contains a flag bit indicating whether the present sound channel pair needs sum-difference stereo decoding; if it does, there is also a flag bit for each scale factor band indicating whether that band needs sum-difference stereo decoding, and the sum-difference stereo decoding module 66 uses these scale factor band flags to determine in which scale factor bands the inverse quantization spectrum must be sum-difference stereo decoded. If sum-difference stereo encoding was performed in the encoding device, sum-difference stereo decoding must be performed on the inverse quantization spectrum in the decoding device.
- the sum-difference stereo decoding module 66 can also be located between the output of the entropy decoding module 61 and the input of the inverse quantizer bank 62 to receive the sum-difference stereo control signal and the result of signal type analysis output from the bit-stream demultiplexing module 60.
- the decoding method of the decoding device as shown in Fig. 12 is substantially the same as that of the decoding device as shown in Fig. 8, and the difference is that the former further includes the following steps after obtaining the inverse quantization spectrum: if the result of signal type analysis shows that the signal types are the same, determining from the sum-difference stereo control signal whether sum-difference stereo decoding of the inverse quantization spectrum is necessary; if it is, determining from the flag bit of each scale factor band whether that band needs sum-difference stereo decoding, and if it does, transforming the inverse quantization spectrum of the sum-difference sound channels in that band into the inverse quantization spectrum of the left-right sound channels before the subsequent processing. If the signal types are not the same, or sum-difference stereo decoding is unnecessary, the inverse quantization spectrum is passed to the subsequent processing unchanged.
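The per-band decision logic above can be sketched for one channel pair as follows. Assumptions not taken from the source: the flags arrive already parsed as booleans, scale factor band b covers spectral indices `band_offsets[b]` to `band_offsets[b + 1] - 1`, and the inverse M/S rule is L = M + S, R = M - S. The same function applies unchanged to quantized spectral values when decoding happens before inverse quantization. All names are hypothetical.

```python
def ms_decode_bands(ch0, ch1, band_offsets, same_type, pair_flag, band_flags):
    """Sum-difference stereo decoding of one channel pair, band by band.

    ch0/ch1 hold M/S values in bands flagged for M/S and L/R values elsewhere.
    """
    left, right = list(ch0), list(ch1)
    if not (same_type and pair_flag):
        return left, right  # pass the spectrum to the next stage unchanged
    for band, flag in enumerate(band_flags):
        if flag:  # this scale factor band was sum-difference encoded
            for k in range(band_offsets[band], band_offsets[band + 1]):
                mid, side = ch0[k], ch1[k]
                left[k], right[k] = mid + side, mid - side  # assumes L = M + S, R = M - S
    return left, right
```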
- the sum-difference stereo decoding can also be performed after entropy decoding and before inverse quantization. In that case, after the quantized values of the spectrum are obtained, if the result of signal type analysis shows that the signal types are the same, the sum-difference stereo control signal is used to determine whether sum-difference stereo decoding of the quantized values is necessary; if it is, the flag bit of each scale factor band determines whether that band needs sum-difference stereo decoding, and if it does, the quantized spectral values of the sum-difference sound channels in that band are transformed into the quantized spectral values of the left-right sound channels before the subsequent processing. If the signal types are not the same, or sum-difference stereo decoding is unnecessary, the quantized values of the spectrum are passed to the subsequent processing unchanged.
- Fig. 13 is a schematic drawing of the structure of the third embodiment of the encoding device of the present invention.
- said embodiment has a sum-difference stereo encoding module 57 added between the output of the frequency-domain linear prediction and vector quantization module 56 and the input of the quantization and entropy encoding module 54.
- the psychoacoustical analyzing module 51 outputs the masking threshold of the sum-difference sound channels to the quantization and entropy encoding module 54.
- the sum-difference stereo encoding module 57 can also be located between the quantizer bank and the encoder in the quantization and entropy encoding module 54 to receive the result of signal type analysis output from the psychoacoustical analyzing module 51.
- the encoding method of the encoding device as shown in Fig. 13 is substantially the same as that of the encoding device as shown in Fig. 9, and the difference is that the former further includes the following steps before quantizing and entropy encoding the frequency-domain coefficients: determining whether the audio signals are multi-channel signals; if so, determining whether the signals of the left and right sound channels are of the same type; if so, determining whether the scale factor bands meet the encoding conditions, and performing sum-difference stereo encoding on the scale factor bands that meet them but not on those that do not. If the signals are mono signals or multi-channel signals of different types, sum-difference stereo encoding is not performed.
- the sum-difference stereo encoding can be applied not only before quantization but also after quantization and before entropy encoding. In that case, after the frequency-domain coefficients are quantized, it is determined whether the audio signals are multi-channel signals; if so, whether the signals of the left and right sound channels are of the same type; and if so, whether the scale factor bands meet the encoding conditions. Sum-difference stereo encoding is performed on the bands that meet the conditions and not on those that do not. If the signals are mono signals or multi-channel signals of different types, sum-difference stereo encoding is not performed.
- Fig. 14 is a schematic drawing of the structure of embodiment three of the decoding device of the present invention.
- said decoding device has a sum-difference stereo decoding module 66 added between the output of the inverse quantizer bank 62 and the input of the inverse frequency-domain linear prediction and vector quantization module 65, and the bit-stream demultiplexing module 60 outputs sum-difference stereo control signal thereto.
- the sum-difference stereo decoding module 66 can also be located between the output of the entropy decoding module 61 and the input of the inverse quantizer bank 62 to receive the sum-difference stereo control signal output from the bit-stream demultiplexing module 60.
- the function and the operating principle of the sum-difference stereo decoding module 66 are the same as those shown in Fig. 12, so they will not be elaborated again.
- the decoding method of the decoding device as shown in Fig. 14 is substantially the same as that of the decoding device as shown in Fig. 10, and the difference is that the former further includes the following steps after obtaining the inverse quantization spectrum: if the result of signal type analysis shows that the signal types are the same, determining from the sum-difference stereo control signal whether sum-difference stereo decoding of the inverse quantization spectrum is necessary; if it is, determining from the flag bit of each scale factor band whether that band needs sum-difference stereo decoding, and if it does, transforming the inverse quantization spectrum of the sum-difference sound channels in that band into the inverse quantization spectrum of the left-right sound channels before the subsequent processing. If the signal types are not the same, or sum-difference stereo decoding is unnecessary, the inverse quantization spectrum is passed to the subsequent processing unchanged.
- the sum-difference stereo decoding can also be performed before inverse quantization. In that case, after the quantized values of the spectrum are obtained, if the result of signal type analysis shows that the signal types are the same, the sum-difference stereo control signal is used to determine whether sum-difference stereo decoding of the quantized values is necessary; if it is, the flag bit of each scale factor band determines whether that band needs sum-difference stereo decoding, and if it does, the quantized spectral values of the sum-difference sound channels in that band are transformed into the quantized spectral values of the left-right sound channels before the subsequent processing. If the signal types are not the same, or sum-difference stereo decoding is unnecessary, the quantized values of the spectrum are passed to the subsequent processing unchanged.
- Fig. 15 is the schematic drawing of the fourth embodiment of the encoding device of the present invention.
- this embodiment has a re-sampling module 590 and a frequency band spreading module 591 added, wherein the re-sampling module 590 re-samples the input audio signals to change the sampling rate thereof, and then outputs the audio signals with a changed sampling rate to the signal characteristic analyzing module 50; the frequency band spreading module 591 is used for analyzing the input audio signals on the entire frequency band to extract the spectrum envelope of the high frequency portion and the characteristics of its relationship with the low frequency portion, and to output them to the bit-stream multiplexing module 55.
- the re-sampling module 590 is used for re-sampling the input audio signals.
- the re-sampling includes up-sampling and down-sampling.
- the re-sampling is described below using down-sampling as an example.
- the re-sampling module 590 comprises a low-pass filter and a down-sampler, wherein the low-pass filter limits the frequency band of the audio signals to prevent the aliasing that down-sampling would otherwise cause.
- the input audio signal is down-sampled after being low-pass filtered.
- the input audio signal is s(n); after being filtered by the low-pass filter with impulse response h(n), it is output as v(n) = Σ_k h(k) s(n - k)
- down-sampling v(n) by a factor of M gives the sequence x(n) = v(Mn)
- the sampling rate of the re-sampled audio signal x(n) is therefore 1/M of the sampling rate of the originally input audio signal s(n).
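The source does not specify the low-pass filter design. A minimal sketch of the two re-sampling steps, assuming a Hamming-windowed-sinc FIR for h(n) with its cutoff at the new Nyquist frequency (0.5/M of the input rate); the design, tap count and function names are illustrative choices, not the patent's.

```python
import math


def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed-sinc low-pass FIR; cutoff is normalized to the input rate (0..0.5)."""
    mid = (num_taps - 1) / 2
    h = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        h.append(ideal * window)
    return h


def downsample(s, M, num_taps=63):
    """Band-limit s(n) with h(n), then keep every M-th sample: x(n) = v(Mn)."""
    h = lowpass_fir(num_taps, 0.5 / M)
    # v(n) = sum_k h(k) s(n - k), treating samples outside the signal as zero
    v = [sum(h[k] * s[n - k] for k in range(len(h)) if 0 <= n - k < len(s))
         for n in range(len(s))]
    return v[::M]  # note: the FIR adds a delay of (num_taps - 1) / 2 input samples
```

A constant input passes with roughly unit gain, while an alternating input at the original Nyquist frequency is almost entirely removed before decimation, which is the aliasing the low-pass filter is there to prevent.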
- after being input to the frequency band spreading module 591, the original audio signals are analyzed over the entire frequency band to extract the spectrum envelope of the high frequency portion and the characteristics of its relationship with the low frequency portion, which are output to the bit-stream multiplexing module 55 as the frequency band spreading control information.
- the basic principle of frequency band spreading is that, for most audio signals, the characteristics of the high frequency portion are strongly correlated with those of the low frequency portion, so the high frequency portion can be effectively reconstructed from the low frequency portion and need not be transmitted. To ensure a correct reconstruction of the high frequency portion, only a small amount of frequency band spreading control information needs to be transmitted in the compressed audio code stream.
- the frequency band spreading module 591 comprises a parameter extracting module and a spectrum envelope extracting module. The input signal enters the parameter extracting module, which extracts parameters representing the spectral characteristics of the input signal in different time-frequency regions; the spectrum envelope extracting module then estimates the spectrum envelope of the high frequency portion of the signal at a certain time-frequency resolution. So that the time-frequency resolution best suits the characteristics of the current input signal, the time-frequency resolution of the spectrum envelope can be chosen freely.
- the parameters of the spectrum characteristics of the input signals and the spectrum envelope of the high frequency portion are used as the control signal for frequency band spreading to be output to the bit-stream multiplexing module 55 for multiplexing.
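The source does not define how the high-band envelope is computed. As a hypothetical illustration, the function below takes one frame of spectral coefficients, treats everything from index `crossover` upward as the high frequency portion, and returns the mean energy in `num_bands` equal-width bands (the last band absorbs any remainder). Changing `num_bands`, and the frame length fed in, is one way the time-frequency resolution of the envelope could be varied.

```python
def hf_envelope(spectrum, crossover, num_bands):
    """Mean energy per band of the coefficients at or above index `crossover`."""
    hf = spectrum[crossover:]
    width = len(hf) // num_bands
    env = []
    for b in range(num_bands):
        start = b * width
        end = len(hf) if b == num_bands - 1 else start + width
        band = hf[start:end]
        env.append(sum(c * c for c in band) / len(band))
    return env
```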
- the bit-stream multiplexing module 55 receives the code stream output from the quantization and entropy encoding module 54, including the common scale factor, the encoded values of the scale factors, the encoded values of the code book sequence numbers, and the losslessly encoded quantization spectrum or the encoded values of the code word indexes, together with the frequency band spreading control signal output from the frequency band spreading module 591, and multiplexes them to obtain the compressed audio data stream.
- the encoding method based on the encoding device as shown in Fig. 15 specifically includes: analyzing the input audio signal over the entire frequency band, and extracting the high frequency spectrum envelope and the parameters of the signal spectrum characteristics as the frequency band spreading control signal; re-sampling the input audio signal and analyzing the signal type; calculating the signal-to-masking ratio of the re-sampled signal; time-frequency mapping the re-sampled signal to obtain the frequency-domain coefficients of the audio signal; quantizing and entropy encoding the frequency-domain coefficients; and multiplexing the frequency band spreading control signal and the encoded audio code stream to obtain the compressed audio code stream, wherein the re-sampling includes two steps: limiting the frequency band of the audio signal, and down-sampling the band-limited audio signal by a factor of M.
- Fig. 16 is a schematic drawing of the structure of embodiment four of the decoding device.
- said embodiment has a frequency band spreading module 68 added, which receives the frequency band spreading control information output from the bit-stream demultiplexing module 60 and the low-frequency time-domain audio signal output from the frequency-time mapping module 64, and reconstructs the high frequency portion of the signal through spectrum shifting and high frequency adjustment to output the wide band audio signal.
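A minimal frequency-domain sketch of "spectrum shift and high frequency adjustment". Assumptions not in the source: the low-band signal has already been transformed into spectral coefficients (the transform is not shown), the high band is as wide as the low band, and the transmitted envelope is a list of target mean energies for equal-width high bands. The patching scheme and names are hypothetical.

```python
import math


def reconstruct_hf(low_spec, envelope):
    """Copy the low band upward, then scale each high band to its target mean energy."""
    n = len(low_spec)
    width = n // len(envelope)
    high = []
    for b, target in enumerate(envelope):
        start = b * width
        end = n if b == len(envelope) - 1 else start + width
        patch = low_spec[start:end]  # spectrum shift: reuse low-band coefficients
        energy = sum(c * c for c in patch) / len(patch)
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        high.extend(c * gain for c in patch)  # high frequency adjustment to the envelope
    return list(low_spec) + high  # wide band spectrum: low band followed by rebuilt high band
```

For example, `reconstruct_hf([1.0, -1.0, 2.0, -2.0], [4.0, 1.0])` yields `[1.0, -1.0, 2.0, -2.0, 2.0, -2.0, 1.0, -1.0]`: each patched band is rescaled so that its mean energy matches the transmitted value.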
- the decoding method based on the decoding device as shown in Fig. 16 is substantially the same as that based on the decoding device as shown in Fig. 8, and the difference lies in that the former further includes, after obtaining the time-domain audio signal, the step of reconstructing the high frequency portion of the audio signal from the frequency band spreading control information and the time-domain audio signal, thereby obtaining the wide band audio signal.
- Figs. 17, 19 and 21 are the fifth to the seventh embodiments of the encoding device, which respectively have a re-sampling module 590 and a frequency band spreading module 591 added thereto on the basis of the encoding devices as shown in Figs. 11, 9 and 13.
- the connection of these two modules with other modules, and the function and principle of these two modules are the same as those shown in Fig. 15, so they will not be elaborated herein.
- Figs. 18, 20 and 22 are the fifth to the seventh embodiments of the decoding device, which respectively have a frequency band spreading module 68 added on the basis of the decoding devices as shown in Figs. 12, 10 and 14. The module receives the frequency band spreading control information output from the bit-stream demultiplexing module 60 and the low-frequency time-domain audio signals output from the frequency-time mapping module 64, and reconstructs the high frequency portion of the signal through spectrum shifting and high frequency adjustment to output wide band audio signals.
- the seven embodiments of the encoding device as described above may also include a gain control module which receives the audio signals output from the signal characteristic analyzing module 50, controls the dynamic range of the fast varying type signals, and eliminates the pre-echo in audio processing. The output thereof is connected to the time-frequency mapping module 52 and the psychoacoustical analyzing module 51, meanwhile, the amount of gain adjustment is output to the bit-stream multiplexing module 55.
- the gain control module controls only the fast varying type signals, while the slowly varying signals are directly output without being processed.
- the gain control module adjusts the time-domain energy envelope of the signal by increasing the gain of the signal before the fast varying point, so that the amplitudes of the time-domain signal before and after the fast varying point are close to each other; the time-domain signal with the adjusted energy envelope is then output to the time-frequency mapping module 52, and the amount of gain adjustment is output to the bit-stream multiplexing module 55.
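A minimal sketch of the encoder-side gain control, assuming a single uniform gain for the whole segment before the fast varying point, chosen so that the RMS level before it matches the RMS level after it. Real systems typically use smoothly varying gains; this simplification, the RMS criterion, and the names are hypothetical. A `transient_index` of `None` stands for a slowly varying signal, which passes through unprocessed.

```python
import math


def rms(seg):
    return math.sqrt(sum(v * v for v in seg) / len(seg)) if seg else 0.0


def gain_control(x, transient_index):
    """Boost the samples before the fast varying point; return (signal, gain to transmit)."""
    if transient_index is None:
        return list(x), 1.0  # slowly varying signal: output directly
    before, after = x[:transient_index], x[transient_index:]
    e_before, e_after = rms(before), rms(after)
    if e_before == 0:
        return list(x), 1.0  # nothing to boost
    g = e_after / e_before  # makes the pre-transient amplitude close to the post-transient one
    return [v * g for v in before] + list(after), g
```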
- the encoding method based on said encoding device is substantially the same as the encoding method based on the above described encoding device, and the difference lies in that the former further includes the step of performing a gain control on the signal whose signal type has been analyzed.
- the seven embodiments of the decoding device as described above may also include an inverse gain control module, located after the output of the frequency-time mapping module 64, which receives the result of signal type analysis and the information on the amount of gain adjustment output from the bit-stream demultiplexing module 60, thereby adjusting the gain of the time-domain signal and controlling the pre-echo.
- the inverse gain control module controls the fast varying type signals but leaves the slowly varying type signals unprocessed.
- the inverse gain control module adjusts the energy envelope of the reconstructed time-domain signal according to the information on the amount of gain adjustment, reducing the amplitude of the signal before the fast varying point and thereby restoring the energy envelope to its original shape, low before the fast varying point and high after it.
- the amplitude of the quantization noise before the fast varying point is reduced together with the amplitude of the signal, thereby controlling the pre-echo.
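The decoder-side counterpart can be sketched under the same simplifying assumption of one uniform gain g transmitted for the segment before the fast varying point (hypothetical names). Dividing by g restores the original envelope, and any coding noise e added before the fast varying point is scaled to e / g along with the signal, which is why the pre-echo is suppressed.

```python
def inverse_gain_control(y, transient_index, gain):
    """Undo encoder gain control: divide the samples before the fast varying point by `gain`."""
    if transient_index is None or gain == 1.0:
        return list(y)  # slowly varying signal: left unprocessed
    return [v / gain for v in y[:transient_index]] + list(y[transient_index:])
```

For example, with g = 4, a decoded pre-transient sample 1.5 (signal 1.0 plus noise 0.5) becomes 0.375, that is, signal 0.25 plus noise 0.125.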
- the decoding method based on said decoding device is substantially the same as the decoding method based on the above described decoding device, and the difference lies in that the former further includes the step of performing an inverse gain control on the reconstructed time-domain signals.
Description
- The invention relates to audio encoding and decoding, and in particular to an enhanced audio encoding/decoding device and method based on a perceptual model.
- To obtain Hi-Fi digital audio, digital audio signals need to be encoded (compressed) for storage and transmission. The goal of audio encoding is a transparent representation of the audio signal using as few bits as possible, that is, the decoded output audio signal is almost indistinguishable from the original input.
- In the early 1980s, the CD came into existence, demonstrating many advantages of representing audio signals digitally, such as high fidelity, large dynamic range and great robustness. However, all these advantages come at the cost of a very high data rate. For example, digitizing a stereo signal at CD quality requires a sampling rate of 44.1 kHz, with each sample uniformly quantized to 16 bits, so the uncompressed data rate reaches 1.41 Mb/s. This is very inconvenient for data transmission and storage, which are constrained by bandwidth and cost, especially in multimedia and wireless transmission applications. To maintain high audio quality, the data rate in new network and wireless multimedia digital audio systems must be reduced without degrading the audio quality. To address this problem, various audio compression techniques have been proposed that achieve both a high compression ratio and hi-fi audio; typical examples are the MPEG-1/-2/-4 techniques of ISO/IEC, the AC-2/AC-3 techniques of Dolby, the ATRAC/MiniDisc/SDDS techniques of Sony, and the PAC/EPAC/MPAC techniques of Lucent Technologies. The MPEG-2 AAC technique and the Dolby AC-3 technique are described in detail below.
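The uncompressed CD data rate is simply sampling rate × bits per sample × number of channels. A quick check, assuming 16-bit samples and two channels:

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd_rate = pcm_bit_rate(44_100, 16, 2)  # 1_411_200 bit/s, i.e. about 1.41 Mb/s
```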
- MPEG-1 and MPEG-2 BC are high-quality encoding techniques mainly used for mono and stereo audio signals. With increasing demand for multi-channel audio encoding that achieves high quality at relatively low code rates, the limits of MPEG-2 BC became apparent: because it emphasizes backward compatibility with MPEG-1, it cannot achieve high-quality encoding of five sound channels at a code rate below 540 kbps. To overcome this shortcoming, the MPEG-2 AAC technique was proposed, which can achieve high-quality encoding of five-channel signals at 320 kbps.
- Fig. 1 is a block diagram of the MPEG-2 AAC encoder. Said encoder comprises a gain controller 101, a filter bank 102, a time-domain noise shaping module 103, an intensity/coupling module 104, a psychoacoustical model, a second order backward adaptive predictor 105, a sum-difference stereo module 106, a bit allocation and quantization encoding module 107, and a bitstream multiplexing module 108, wherein the bit allocation and quantization encoding module 107 further comprises a compression ratio/distortion processing controller, a scale factor module, a non-uniform quantizer, and an entropy encoding module.
- The filter bank 102 uses a modified discrete cosine transform (MDCT) whose resolution is signal-adaptive: a 2048-point MDCT is used for steady-state signals, while a 256-point MDCT is used for transient signals. Thus, for a signal sampled at 48 kHz, the maximum frequency resolution is 23 Hz and the maximum time resolution is 2.6 ms. Both a sine window and a Kaiser-Bessel window can be used in the filter bank 102: the sine window is used when the harmonic spacing of the input signal is less than 140 Hz, while the Kaiser-Bessel window is used when the spacing of the strong components in the input signal is greater than 220 Hz.
- Audio signals enter the filter bank 102 through the gain controller 101 and are filtered according to the signal type; the time-domain noise shaping module 103 then processes the spectral coefficients output by the filter bank 102. The time-domain noise shaping technique performs linear prediction analysis on the spectral coefficients in the frequency domain and uses the result to control the shape of the quantization noise, thereby controlling the pre-echo.
- The intensity/coupling module 104 is used for intensity stereo encoding. For high-frequency signals (above 2 kHz), the perceived direction of a sound depends on changes in the signal intensity (signal envelope) but not on the waveform of the signal; that is, a signal with a constant envelope has no influence on the perceived direction. This property, together with the correlation among multiple sound channels, can be exploited to combine several sound channels into one common channel for encoding, which is the intensity/coupling technique.
- The second order backward adaptive predictor 105 is used for removing the redundancy of steady-state signals and improving the encoding efficiency. The sum-difference stereo (M/S) module 106 operates on sound channel pairs. A sound channel pair refers to two channels, such as the left and right channels or the left and right surround channels, in a two-channel or multi-channel signal. The M/S module 106 reduces the code rate and improves encoding efficiency by exploiting the correlation between the two channels of the pair. The bit allocation and quantization encoding module 107 is implemented as a nested loop, in which the non-uniform quantizer performs lossy encoding and the entropy encoding module performs lossless encoding, thus removing redundancy and reducing correlation. The nested loop comprises an inner loop and an outer loop: the inner loop adjusts the step size of the non-uniform quantizer until the available bits are used up, and the outer loop estimates the encoding quality of the signal using the ratio of the quantization noise to the masking threshold. Finally, the encoded signals are formed into an encoded audio stream by the bitstream multiplexing module 108 and output.
- In the scalable sampling rate mode, the input signal is split by a four-band polyphase quadrature filter bank (PQF) into four frequency bands of equal bandwidth, and each band yields 256 spectral coefficients via MDCT, giving 1024 spectral coefficients in total. The gain controller 101 is used in each frequency band. The decoder can ignore the high-frequency PQF bands to obtain a signal with a lower sampling rate.
- Fig. 2 is a schematic block diagram of the corresponding MPEG-2 AAC decoder. Said decoder comprises a bitstream demultiplexing module 201, a lossless decoding module 202, an inverse quantizer 203, a scale factor module 204, a sum-difference stereo (M/S) module 205, a prediction module 206, an intensity/coupling module 207, a time-domain noise shaping module 208, a filter bank 209 and a gain control module 210. The encoded audio stream is demultiplexed by the bitstream demultiplexing module 201 to obtain the corresponding data stream and control stream. These are then decoded by the lossless decoding module 202 to obtain the integer representations of the scale factors and the quantized values of the signal spectrum. The inverse quantizer 203 is a non-uniform quantizer bank implemented with a companding function, which transforms the integer quantized values into a reconstructed spectrum. The scale factor module in the encoder computes the difference between each scale factor and the previous one and Huffman-encodes the differences, so the scale factor module 204 in the decoder obtains the differences through Huffman decoding and recovers the actual scale factors from them. The M/S module 205 converts the sum-difference channels into left-right channels under the control of the side information. Since the second order backward adaptive predictor 105 is used in the encoder to remove the redundancy of steady-state signals and improve the encoding efficiency, the prediction module 206 is used in the decoder for prediction decoding. The intensity/coupling module 207 performs intensity/coupling decoding under the control of the side information and outputs the result to the time-domain noise shaping module 208 for time-domain noise shaping decoding; finally, synthesis filtering is performed by the filter bank 209, which uses an inverse modified discrete cosine transform (IMDCT).
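The decoder steps of recovering scale factors from their differences and inverse quantizing through a companding function can be sketched as follows. The source does not spell out the companding law; this sketch assumes the MPEG-2/-4 AAC conventions, in which an integer q maps to sign(q)·|q|^(4/3) scaled by 2^((sf - 100)/4), and the scale factors are recovered by running summation from a starting global gain. Huffman decoding of the differences is omitted, and the function names are hypothetical.

```python
import math


def decode_scale_factors(global_gain, diffs):
    """Undo differential coding: each scale factor = previous value + decoded difference."""
    sf, prev = [], global_gain
    for d in diffs:
        prev += d
        sf.append(prev)
    return sf


def inverse_quantize(q, sf, sf_offset=100):
    """AAC-style inverse quantizer for one band: sign(q) * |q|^(4/3) * 2^((sf - offset) / 4)."""
    scale = 2.0 ** ((sf - sf_offset) / 4)
    return [math.copysign(abs(v) ** (4 / 3), v) * scale for v in q]
```

For example, a scale factor of 104 gives a band gain of 2, so the quantized values `[8, -1, 0]` are reconstructed as approximately `[32.0, -2.0, 0.0]`.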
- In the scalable sampling rate (SSR) case, the high-frequency PQF bands can be discarded through the gain control module 210 to obtain signals with a lower sampling rate.
- The MPEG-2 AAC encoding/decoding technique is suitable for audio signals at medium and high code rates, but its encoding quality is poor at low and very low code rates. It also involves many encoding/decoding modules, so its implementation is highly complex and hard to run in real time.
- Fig. 3 is a schematic drawing of the structure of the encoder using the Dolby AC-3 technique. It comprises a transient signal detection module 301, a modified discrete cosine transform (MDCT) filter bank 302, a frequency spectrum envelope/exponent encoding module 303, a mantissa encoding module 304, a forward-backward adaptive perceptual model 305, a parametric bit allocation module 306, and a bitstream multiplexing module 307.
- The transient signal detection module 301 determines whether the audio signal is a steady-state signal or a transient signal. Meanwhile, the signal-adaptive MDCT filter bank 302 maps the time-domain data to frequency-domain data. A long window of 512 points is applied to steady-state signals, and a pair of short windows is applied to transient signals.
- The frequency spectrum envelope/
exponent encoding module 303 encodes the exponent portion of the signal in one of three modes (D15, D25 or D45), chosen according to the code rate and frequency resolution requirements. AC-3 encodes the spectral envelope differentially across frequency because the difference between adjacent exponents is limited to ±2, with each increment representing a level change of 6 dB. The first (DC) exponent is encoded as an absolute value, and the remaining exponents are encoded differentially. In D15 encoding, three differentials are grouped into one 7-bit word, so each exponent requires about 2.33 bits. The D15 mode sacrifices time resolution to provide fine frequency resolution. Only relatively steady signals need fine frequency resolution, and their spectra stay relatively constant over many blocks. For steady-state signals, D15 exponents are therefore transmitted only occasionally, usually once every 6 audio blocks (one data frame). When the signal spectrum is not steady, the spectral estimate must be updated frequently. The estimate is then encoded with lower frequency resolution, generally in the D25 or D45 mode. The D25 mode balances frequency resolution and time resolution: one exponent is shared by every two frequency coefficients, so each exponent needs about 1.15 bits. The D25 mode can be used when the spectrum is steady over two to three blocks and then changes abruptly. The D45 mode shares one exponent among every four frequency coefficients, so each exponent needs about 0.58 bits. The D45 mode provides very high time resolution but low frequency resolution, so it is generally used for transient signals. - The forward-backward
adaptive perceptual model 305 estimates the masking threshold of each frame of the signal. Its forward adaptive portion is used only in the encoder. This portion estimates a group of optimal perceptual model parameters through an iterative loop under the code rate constraint and passes them to the backward adaptive portion, which estimates the masking threshold of each frame. The backward adaptive portion is used in both the encoder and the decoder. - The parametric
bit allocation module 306 analyzes the spectral envelope of the audio signal according to the masking rules to determine how many bits each mantissa receives. Module 306 performs a global bit allocation for all channels using a bit reservoir. When the mantissa encoding module 304 encodes, bits are taken iteratively from this bit pool and allocated to all the channels, and the quantization of each mantissa is adjusted to the number of bits available. To achieve further compression, the AC-3 encoder also uses high-frequency coupling. The high-frequency portion of the coupled signal is divided into 18 sub-bands according to the critical bands of the human ear, and selected channels are coupled starting from a certain sub-band. Finally, the bitstream multiplexing module 307 forms and outputs the AC-3 audio stream. - Fig. 4 is a schematic drawing of the decoding flow of Dolby AC-3. First, the bit stream produced by the AC-3 encoder is input, and frame synchronization and error detection are performed on it. If a data error is detected, error concealment or muting is applied. The bit stream is then unpacked to obtain the primary information and the side information, and exponent decoding is performed. Exponent decoding requires two pieces of side information: the number of packed exponents and the exponent strategy used (D15, D25 or D45). The bit allocation is then repeated using the decoded exponents and the bit allocation side information, which determines the number of bits used by each packed mantissa. This yields a set of bit allocation pointers, one for each encoded mantissa. The bit allocation pointers indicate the quantizer used for each mantissa and the number of bits that mantissa occupies in the code stream.
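The D15 exponent grouping described above (differences limited to ±2, three differences per 7-bit word) can be illustrated with the following Python sketch. It assumes the AC-3 convention of mapping each difference to the range 0..4 by adding 2 and packing three mapped values as $25m_1 + 5m_2 + m_3$. Bit-level packing, the exponent sharing of D25/D45 and the choice of exponent strategy are omitted, and the function names are hypothetical.

```python
from typing import List, Tuple


def group_d15(exponents: List[int]) -> Tuple[int, List[int]]:
    """Sketch of AC-3 style D15 exponent coding.

    The first (DC) exponent is kept as an absolute value. Every later exponent
    is represented by its difference from the previous one, limited to +/-2.
    Each difference is mapped to 0..4 (diff + 2), and three of them are packed
    into one 7-bit group as 25*m1 + 5*m2 + m3 (maximum 124 < 128).
    """
    diffs = [b - a for a, b in zip(exponents, exponents[1:])]
    if any(abs(d) > 2 for d in diffs):
        raise ValueError("exponent differences must lie in [-2, 2]")
    diffs += [0] * (-len(diffs) % 3)  # pad with zero differences to a multiple of three
    groups = []
    for i in range(0, len(diffs), 3):
        m1, m2, m3 = (d + 2 for d in diffs[i:i + 3])
        groups.append(25 * m1 + 5 * m2 + m3)
    return exponents[0], groups


def ungroup_d15(first: int, groups: List[int], count: int) -> List[int]:
    """Inverse of group_d15: unpack each group and accumulate the differences."""
    exponents = [first]
    for g in groups:
        for m in (g // 25, (g % 25) // 5, g % 5):
            exponents.append(exponents[-1] + m - 2)
    return exponents[:count]
```

Because the 125 possible combinations fit into 7 bits, each exponent costs $7/3 \approx 2.33$ bits. Sharing one exponent across two or four coefficients, as D25 and D45 do, reduces the cost per coefficient by a factor of about two or four.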
Each encoded mantissa value is dequantized. A mantissa that was allocated zero bits is reconstructed as zero or, under the control of the dither flag, replaced by a random dither value. Decoupling is then carried out: the high-frequency portion of each coupled channel, including exponents and mantissas, is recovered from the common coupling channel and the coupling coordinates. If the encoder used the 2/0 mode and applied rematrixing to a sub-band, the decoder must convert the sum and difference channel values of that sub-band back into left and right channel values. The code stream contains a dynamic range control value for each audio block, and dynamic range compression is applied with this value by scaling the amplitude of the coefficients, including exponents and mantissas. The frequency-domain coefficients are then inverse transformed into time-domain samples. These samples are windowed, and adjacent blocks are overlap-added to reconstruct the PCM audio signal. If fewer channels are output than are present in the encoded bit stream, the audio signal is down-mixed before the final PCM stream is output.
- The Dolby AC-3 encoding technique is mainly intended for multichannel surround sound at high bit rates. When the bit rate for 5.1 channels falls below 384 kbps, the encoding quality is poor, and the encoding efficiency for mono and two-channel stereo signals is also low.
- In summary, existing encoding and decoding techniques cannot guarantee encoding and decoding quality across very low, low and high code rates, or for mono and two-channel signals, and they are complex to implement.
- The technical problem solved by this invention is to provide an enhanced audio encoding/decoding device and method that overcome the low encoding efficiency and poor encoding quality of the prior art for low code rate audio signals.
- The enhanced audio encoding device of the invention comprises a psychoacoustical analyzing module, a time-frequency mapping module, a quantization and entropy encoding module, a bit-stream multiplexing module, a signal characteristic analyzing module and a multi-resolution analyzing module. The signal characteristic analyzing module is configured to analyze the signal type of the input audio signal, to output the audio signal to the psychoacoustical analyzing module and the time-frequency mapping module, and to output the result of the signal type analysis to the bit-stream multiplexing module. The psychoacoustical analyzing module is configured to calculate a masking threshold and a signal-to-masking ratio of the audio signal and to output them to the quantization and entropy encoding module. The time-frequency mapping module is configured to convert the time-domain audio signal into frequency-domain coefficients and to output them to the multi-resolution analyzing module. The multi-resolution analyzing module is configured to perform a multi-resolution analysis on the frequency-domain coefficients of fast varying signals, based on the signal type analysis result from the signal characteristic analyzing module, and to output the result to the quantization and entropy encoding module. The quantization and entropy encoding module is configured to quantize and entropy encode the frequency-domain coefficients under the control of the signal-to-masking ratio from the psychoacoustical analyzing module and to output the result to the bit-stream multiplexing module. The bit-stream multiplexing module is configured to multiplex the received data to form the audio encoding code stream.
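The module wiring described above can be summarized as a small Python sketch of how one frame passes through the encoder. The stage functions are hypothetical placeholders supplied by the caller, so no concrete algorithm is implied, and the multiplexed output is shown as a plain dictionary rather than a real bit-stream syntax. The sketch shows the control flow: the signal-type decision is multiplexed into the stream and also decides whether the multi-resolution analysis runs, while the signal-to-masking ratio only steers quantization.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class EncoderStages:
    """Hypothetical stand-ins for the modules of the encoding device."""
    analyze_signal_type: Callable[[Sequence[float]], str]      # returns "fast" or "slow"
    psychoacoustic_smr: Callable[[Sequence[float]], List[float]]
    time_to_frequency: Callable[[Sequence[float]], List[float]]
    multi_resolution: Callable[[List[float]], List[float]]
    quantize_and_encode: Callable[[List[float], List[float]], bytes]


def encode_frame(frame: Sequence[float], stages: EncoderStages) -> Dict[str, object]:
    signal_type = stages.analyze_signal_type(frame)    # signal characteristic analysis
    smr = stages.psychoacoustic_smr(frame)             # psychoacoustical analysis
    coeffs = stages.time_to_frequency(frame)           # time-frequency mapping
    if signal_type == "fast":                          # only fast varying frames get
        coeffs = stages.multi_resolution(coeffs)       # the multi-resolution analysis
    payload = stages.quantize_and_encode(coeffs, smr)  # SMR-controlled quantization and entropy coding
    # bit-stream multiplexing, represented here as a dictionary
    return {"signal_type": signal_type, "payload": payload}
```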
- The enhanced audio decoding device of the invention comprises a bit-stream demultiplexing module, an entropy decoding module, an inverse quantizer bank, a frequency-time mapping module, and a multi-resolution integration module. The bit-stream demultiplexing module is configured to demultiplex the compressed audio data stream and to output the corresponding data signals and control signals to the entropy decoding module and the multi-resolution integration module. The entropy decoding module is configured to decode these signals and recover the quantized values of the spectrum, which it outputs to the inverse quantizer bank. The inverse quantizer bank is configured to reconstruct the inverse quantization spectrum and output it to the multi-resolution integration module. The multi-resolution integration module is configured to perform multi-resolution integration on the inverse quantization spectrum and output the result to the frequency-time mapping module. The frequency-time mapping module is configured to perform a frequency-time mapping on the spectrum coefficients to output the time-domain audio signals.
- The invention is applicable to high-fidelity (Hi-Fi) compression encoding of audio signals with various sampling rate and channel configurations, and it supports sampling rates from 8 kHz to 192 kHz. It also supports all possible channel configurations and audio encoding/decoding over a wide range of target code rates.
- Fig. 1 is a block diagram of the MPEG-2 AAC encoder;
- Fig. 2 is a block diagram of the MPEG-2 AAC decoder;
- Fig. 3 is a schematic drawing of the structure of the encoder using the Dolby AC-3 technique;
- Fig. 4 is a schematic drawing of the decoding flow using the Dolby AC-3 technique;
- Fig. 5 is a schematic drawing of the structure of the encoding device according to the present invention;
- Fig. 6 is a schematic drawing of the filtering structure using a wavelet transform with a Haar wavelet basis;
- Fig. 7 is a schematic drawing of the time-frequency division obtained by the wavelet transform with a Haar wavelet basis;
- Fig. 8 is a schematic drawing of the structure of the decoding device according to the present invention;
- Fig. 9 is a schematic drawing of the structure of embodiment one of the encoding device according to the present invention;
- Fig. 10 is a schematic drawing of the structure of embodiment one of the decoding device according to the present invention;
- Fig. 11 is a schematic drawing of the structure of embodiment two of the encoding device according to the present invention;
- Fig. 12 is a schematic drawing of the structure of embodiment two of the decoding device according to the present invention;
- Fig. 13 is a schematic drawing of the structure of embodiment three of the encoding device according to the present invention;
- Fig. 14 is a schematic drawing of the structure of embodiment three of the decoding device according to the present invention;
- Fig. 15 is a schematic drawing of the structure of embodiment four of the encoding device according to the present invention;
- Fig. 16 is a schematic drawing of the structure of embodiment four of the decoding device according to the present invention;
- Fig. 17 is a schematic drawing of the structure of embodiment five of the encoding device according to the present invention;
- Fig. 18 is a schematic drawing of the structure of embodiment five of the decoding device according to the present invention;
- Fig. 19 is a schematic drawing of the structure of embodiment six of the encoding device according to the present invention;
- Fig. 20 is a schematic drawing of the structure of embodiment six of the decoding device according to the present invention;
- Fig. 21 is a schematic drawing of the structure of embodiment seven of the encoding device according to the present invention;
- Fig. 22 is a schematic drawing of the structure of embodiment seven of the decoding device according to the present invention.
- Figs. 1-4 are schematic drawings of the prior-art encoders and decoders. They were introduced in the background art and are not described further here.
- Note that, for convenience and clarity, the following embodiments of the encoding device and decoding device are described in corresponding pairs, but the encoding device and decoding device need not correspond one-to-one.
- As shown in Fig. 5, the audio encoding device of the present invention comprises a signal
characteristic analyzing module 50, a psychoacoustical analyzing module 51, a time-frequency mapping module 52, a multi-resolution analyzing module 53, a quantization and entropy encoding module 54, and a bit-stream multiplexing module 55. The signal characteristic analyzing module 50 is configured to analyze the signal type of the input audio signal, to output the audio signal to the psychoacoustical analyzing module 51 and the time-frequency mapping module 52, and to output the result of the signal type analysis to the bit-stream multiplexing module 55. The psychoacoustical analyzing module 51 is configured to calculate a masking threshold and a signal-to-masking ratio of the input audio signal and to output them to the quantization and entropy encoding module 54. The time-frequency mapping module 52 is configured to convert the time-domain audio signal into frequency-domain coefficients and to output them to the multi-resolution analyzing module 53. The multi-resolution analyzing module 53 is configured to perform a multi-resolution analysis on the frequency-domain coefficients of fast varying signals, based on the signal type analysis result from the signal characteristic analyzing module 50, and to output the result to the quantization and entropy encoding module 54. The quantization and entropy encoding module 54 is configured to quantize and entropy encode the frequency-domain coefficients under the control of the signal-to-masking ratio from the psychoacoustical analyzing module 51 and to output the result to the bit-stream multiplexing module 55. The bit-stream multiplexing module 55 is configured to multiplex the received data to form the audio encoding code stream. - The signal type of the digital audio signal is analyzed in the signal
characteristic analyzing module 50, and the type information of the audio signal is output to the bit-stream multiplexing module 55. Meanwhile, the audio signal is output to the psychoacoustical analyzing module 51 and the time-frequency mapping module 52. On one path, the psychoacoustical analyzing module 51 calculates the masking threshold and the signal-to-masking ratio of the current frame and passes the signal-to-masking ratio as a control signal to the quantization and entropy encoding module 54. On the other path, the time-frequency mapping module 52 converts the time-domain audio signal into frequency-domain coefficients. The multi-resolution analyzing module 53 performs a multi-resolution analysis of the frequency-domain coefficients of fast varying signals to increase their time resolution, and it outputs the result to the quantization and entropy encoding module 54. Under the control of the signal-to-masking ratio from the psychoacoustical analyzing module 51, the quantization and entropy encoding module 54 performs quantization and entropy encoding. The bit-stream multiplexing module 55 then multiplexes the encoded data and control signals to form the enhanced audio encoding code stream. - The modules that make up the audio encoding device are described in detail below.
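The Python sketch below illustrates the kind of multi-resolution step that module 53 applies to fast varying frames. It applies one level of an orthonormal Haar transform across pairs of adjacent frequency-domain coefficients, which roughly trades frequency resolution for time resolution. Several details are assumptions made for illustration: the pairing of adjacent coefficients, the single decomposition level, and the reorganization rule (all sum outputs followed by all difference outputs). The source only states that the coefficients are transformed, for example with a Haar wavelet basis (Figs. 6 and 7), and then reorganized according to a certain rule. The inverse function shows what a decoder-side multi-resolution integration would have to undo. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_analysis(coeffs: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of an orthonormal Haar transform along the frequency axis.

    Each pair of adjacent coefficients (c[2i], c[2i+1]) becomes a scaled sum and difference.
    """
    if len(coeffs) % 2:
        raise ValueError("need an even number of coefficients")
    pairs = list(zip(coeffs[0::2], coeffs[1::2]))
    low = [(a + b) * INV_SQRT2 for a, b in pairs]
    high = [(a - b) * INV_SQRT2 for a, b in pairs]
    return low, high


def reorganize(low: List[float], high: List[float]) -> List[float]:
    """Hypothetical reorganization rule: all sum outputs, then all difference outputs."""
    return low + high


def haar_synthesis(reorganized: Sequence[float]) -> List[float]:
    """Undo reorganize() and haar_analysis(), as a decoder-side integration step would."""
    half = len(reorganized) // 2
    out: List[float] = []
    for lo, hi in zip(reorganized[:half], reorganized[half:]):
        out += [(lo + hi) * INV_SQRT2, (lo - hi) * INV_SQRT2]
    return out
```

Because the transform is orthonormal, it preserves the energy of the coefficients. This keeps quantization noise comparable before and after the multi-resolution step.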
- The signal
characteristic analyzing module 50 is configured to analyze the signal type of the input audio signal and output the type information to the bit-stream multiplexing module 55. At the same time, it outputs the audio signal to the psychoacoustical analyzing module 51 and the time-frequency mapping module 52. - The signal
characteristic analyzing module 50 determines whether the signal is slowly varying or fast varying by analyzing forward and backward masking effects based on an adaptive threshold and waveform prediction. If the signal is fast varying, the module then calculates the parameters of the abrupt (transient) component, such as its position and intensity. - The
psychoacoustical analyzing module 51 mainly calculates a masking threshold, a signal-to-masking ratio and the perceptual entropy of the input audio signal. The perceptual entropy calculated by this module is used to estimate dynamically the number of bits needed for transparent encoding of the current frame, which allows the bit allocation among frames to be adjusted. The psychoacoustical analyzing module 51 outputs the signal-to-masking ratio of each sub-band to the quantization and entropy encoding module 54 to control it. - The time-frequency mapping module 52 converts the audio signal from the time domain into frequency-domain coefficients. It is formed of a filter bank, which may be a discrete Fourier transform (DFT) filter bank, a discrete cosine transform (DCT) filter bank, a modified discrete cosine transform (MDCT) filter bank, a cosine-modulated filter bank, a wavelet transform filter bank, or the like. The frequency-domain coefficients obtained from the time-frequency mapping are passed, via the multi-resolution analyzing module 53, to the quantization and entropy encoding module 54 to be quantized and encoded. - For fast varying signals, the encoding device of the present invention effectively suppresses the pre-echo produced during encoding and improves encoding quality by increasing the time resolution of these signals by means of the
multi-resolution analyzing module 53. The frequency-domain coefficients output by the time-frequency mapping module 52 are input to the multi-resolution analyzing module 53. If the signal is fast varying, a frequency-domain wavelet transform or frequency-domain modified discrete cosine transform (MDCT) produces a multi-resolution representation of the frequency-domain coefficients, which is output to the quantization and entropy encoding module 54. If the signal is slowly varying, the frequency-domain coefficients are output directly to the quantization and entropy encoding module 54 without further processing. - The
multi-resolution analyzing module 53 comprises a frequency-domain coefficient transformation module and a reorganization module. The frequency-domain coefficient transformation module transforms the frequency-domain coefficients into time-frequency plane coefficients, and the reorganization module reorganizes the time-frequency plane coefficients according to a certain rule. The frequency-domain coefficient transformation module may use a frequency-domain wavelet transform filter bank, a frequency-domain MDCT filter bank, or the like. - The quantization and
entropy encoding module 54 further comprises a non-linear quantizer bank and an encoder. The quantizer can be a scalar quantizer or a vector quantizer. Vector quantizers fall into two categories: memoryless and memory vector quantizers. A memoryless vector quantizer quantizes each input vector independently of the previous vectors. A memory vector quantizer takes the previous vectors into account, that is, it exploits the correlation among vectors. The main memoryless vector quantizers are the full-search, tree-search, multi-stage, gain/shape and mean-separated vector quantizers. The main memory vector quantizers are the predictive vector quantizer and the finite-state vector quantizer. - If a scalar quantizer is used, the non-linear quantizer bank further comprises M sub-band quantizers, each of which quantizes using a scale factor. Specifically, all the frequency-domain coefficients of the M scale-factor sub-bands are first non-linearly compressed. The coefficients of each sub-band are then quantized using that sub-band's scale factor, which gives an integer-valued quantized spectrum that is output to the encoder. The first scale factor of each frame is output to the bit-stream multiplexing module 55 as the common scale factor. Each remaining scale factor is differenced against its preceding scale factor and output to the encoder. - The scale factors in this step vary continually and are adjusted according to the bit allocation strategy. The present invention provides a global perceptual bit allocation strategy that minimizes distortion, as follows:
- First, each sub-band quantizer is initialized with a scale factor chosen so that the quantized values of the spectral coefficients in all sub-bands are zero. The quantization noise of each sub-band then equals its energy, so the noise-to-masking ratio NMR of each sub-band equals its signal-to-masking ratio SMR. The number of bits consumed by quantization is zero, and the number of remaining bits $B_l$ equals the target number of bits $B$.
- Second, the sub-band with the largest noise-to-masking ratio NMR is found. If that NMR is not greater than 1, the scale factors remain unchanged, the allocation result is output, and the bit allocation ends. Otherwise, the scale factor of the corresponding sub-band quantizer is reduced by one unit, and the number of additional bits $\Delta B_i(Q_i)$ required by that sub-band is calculated. If the number of remaining bits satisfies $B_l \ge \Delta B_i(Q_i)$, the new scale factor is confirmed and $\Delta B_i(Q_i)$ is subtracted from $B_l$. The NMR of that sub-band is then recalculated, and the process repeats from the search for the sub-band with the largest NMR. If $B_l < \Delta B_i(Q_i)$, the change is cancelled, the previous scale factor and number of remaining bits are kept, the allocation result is output, and the bit allocation ends.
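The greedy allocation loop above can be sketched in Python as follows. The sketch makes several assumptions of its own. NMR is tracked in dB, so "NMR not greater than 1" becomes "at most 0 dB". Each one-unit scale factor step is assumed to lower the NMR by a fixed `step_db` (1.5 dB, as for an AAC-like quantizer step of $2^{1/4}$ in amplitude). The bit cost of one step in each band is a fixed hypothetical value `step_cost[b]` that stands in for $\Delta B_i(Q_i)$. A real coder would recompute both the noise and the bit cost from the actual quantized spectrum.

```python
from typing import List, Sequence, Tuple


def allocate_bits(smr_db: Sequence[float], step_cost: Sequence[int], target_bits: int,
                  step_db: float = 1.5) -> Tuple[List[int], int]:
    """Greedy minimum-NMR bit allocation (sketch).

    smr_db[b]     signal-to-mask ratio of sub-band b in dB. With every coefficient
                  quantized to zero, the noise equals the signal, so NMR starts at SMR.
    step_cost[b]  hypothetical extra bits that one finer scale factor step costs in
                  band b (stands in for Delta B_i(Q_i)).
    step_db       assumed NMR reduction per scale factor step.
    Returns the number of refinement steps per band and the remaining bits.
    """
    nmr_db = list(smr_db)
    steps = [0] * len(nmr_db)
    remaining = target_bits                                    # B_l = B initially
    while True:
        band = max(range(len(nmr_db)), key=nmr_db.__getitem__)  # largest NMR
        if nmr_db[band] <= 0.0:                                # NMR <= 1: noise masked everywhere
            break
        if remaining < step_cost[band]:                        # cannot afford: cancel and stop
            break
        steps[band] += 1                                       # confirm the finer scale factor
        remaining -= step_cost[band]
        nmr_db[band] -= step_db
    return steps, remaining
```

For example, `allocate_bits([3.0, 1.0], [2, 1], 100)` refines the first band twice and the second band once, then stops with 95 bits left because every NMR has reached 0 dB or less.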
- If a vector quantizer is used, the frequency-domain coefficients are grouped into M-dimensional vectors that are input to the non-linear quantizer bank. Each M-dimensional vector is spectrally smoothed according to a smoothing factor, which reduces its dynamic range. The vector quantizer then searches the code book, using a subjective (perceptual) distance measure, for the code word closest to the vector and passes the corresponding code word index to the encoder. The smoothing factor is adjusted according to the bit allocation strategy of the vector quantization, and that strategy is in turn controlled by the perceptual priority of the different sub-bands.
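The following Python sketch illustrates the two vector-quantization steps just described: spectral smoothing and a full-search lookup of the nearest code word under a weighted distance. Three elements are assumptions, since the source does not specify them: the power-law form of the smoothing ($|x|^{\alpha}$ with $0 < \alpha \le 1$), the weighted squared error standing in for the subjective distance measure, and the function names.

```python
import math
from typing import List, Sequence


def smooth(vector: Sequence[float], alpha: float) -> List[float]:
    """Hypothetical spectrum smoothing: sign-preserving |x|**alpha, 0 < alpha <= 1."""
    return [math.copysign(abs(x) ** alpha, x) for x in vector]


def nearest_codeword(vector: Sequence[float], codebook: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> int:
    """Full-search VQ: index of the code word with the smallest weighted squared error."""
    def distance(codeword: Sequence[float]) -> float:
        return sum(w * (x - c) ** 2 for x, c, w in zip(vector, codeword, weights))
    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))
```

Raising a weight makes errors in that dimension more costly. In a perceptual coder, such weights would typically be larger where the masking threshold is low.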
- After quantization, entropy encoding further removes the statistical redundancy of the quantized coefficients and the side information. Entropy encoding is a source encoding technique whose basic idea is to assign shorter code words to more probable symbols and longer code words to less probable symbols, so that the average code word length is minimized. According to Shannon's noiseless coding theorem, if the $N$ transmitted source symbols are mutually independent and a suitable variable-length code is used, the average code word length $\bar{n}$ per source symbol satisfies
$$H(X) \le \bar{n} < H(X) + \frac{1}{N},$$
where $H(X)$ is the source entropy in bits and a binary code is assumed.
- In the encoder, entropy encoding is applied to the quantized spectrum output by the scalar quantizer and to the differentially processed scale factors. This produces the code book sequence numbers, the encoded values of the scale factors, and the losslessly encoded quantized spectrum. The code book sequence numbers are then entropy encoded as well. The encoded values of the scale factors, the encoded values of the code book sequence numbers, and the losslessly encoded quantized spectrum are output to the bit-stream multiplexing module 55.
- The code word indexes produced by the vector quantizer are entropy encoded in the encoder, one-dimensionally or multi-dimensionally, and the resulting encoded values are output to the bit-stream multiplexing module 55.
- The encoding method based on the encoder described above includes the following steps: analyzing the signal type of the input audio signal; calculating the signal-to-masking ratio of the audio signal; performing a time-frequency mapping on the audio signal to obtain its frequency-domain coefficients; performing multi-resolution analysis, quantization and entropy encoding on the frequency-domain coefficients; and multiplexing the result of the signal type analysis with the encoded audio data to obtain the compressed audio code stream.
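The relation between entropy and average code word length can be checked with a small Python sketch. It builds a binary Huffman code for a sequence of quantized symbols and compares the code's average length with the empirical entropy. Huffman coding is one common entropy coder, chosen here only as an example, since the passage above does not name the entropy coder or its code books. The sketch codes single symbols ($N = 1$), for which the bound reads $H(X) \le \bar{n} < H(X) + 1$. The function names are hypothetical.

```python
import heapq
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def entropy_bits(symbols: Sequence[Hashable]) -> float:
    """Empirical entropy H(X) in bits per symbol."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())


def huffman_lengths(symbols: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Code word length of each symbol in a binary Huffman code built from its counts."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # heap entries: (weight, unique tie-breaker, {symbol: depth so far})
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def average_length(symbols: Sequence[Hashable]) -> float:
    """Average Huffman code word length in bits per symbol."""
    lengths = huffman_lengths(symbols)
    return sum(lengths[s] for s in symbols) / len(symbols)
```

For the sequence `"aaaabbc"`, the most frequent symbol gets a 1-bit code word and the other two get 2-bit code words. The average length of 10/7 ≈ 1.43 bits lies between the entropy (about 1.38 bits) and the entropy plus one.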
- The signal type is determined by analyzing forward and backward masking effects based on an adaptive threshold and waveform prediction. The steps are as follows. Divide the input audio data into frames. Divide each frame into a plurality of sub-frames and search each sub-frame for the local maxima of the absolute values of the PCM data. Select the sub-frame peak value from the local maxima of each sub-frame. For a given sub-frame peak value, use a plurality of (typically three) preceding sub-frame peak values to predict the typical sample value of a plurality of (typically four) sub-frames that are forward-delayed with respect to that sub-frame. Calculate the difference and the ratio between the sub-frame peak value and the predicted typical sample value. If both the difference and the ratio exceed their predetermined thresholds, the sub-frame is judged to contain a jump (attack) signal, and its local maximum is judged capable of backward-masking the pre-echo. In that case, if a sub-frame with a sufficiently small peak value lies between the front end of this sub-frame and the position 2.5 ms before the masking peak, the frame is determined to be a fast varying signal. If the difference and ratio do not both exceed the thresholds, repeat the above steps until the frame is determined to be fast varying or the last sub-frame is reached. If the frame has not been determined to be fast varying by the last sub-frame, it is a slowly varying signal.
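The detection procedure can be sketched in Python as follows. This is a simplified illustration that makes several assumptions. The prediction is taken to be the mean of the three preceding sub-frame peaks, and only the current sub-frame is tested against it rather than a block of four. The numeric thresholds (`diff_thr`, `ratio_thr`, `quiet_ratio`) are hypothetical example values, not values given in the source. The frame is assumed to have already been cut from the input stream, and the function names are hypothetical.

```python
from typing import List, Sequence


def subframe_peaks(frame: Sequence[float], sub_len: int) -> List[float]:
    """Peak absolute PCM value of each complete sub-frame."""
    return [max(abs(x) for x in frame[i:i + sub_len])
            for i in range(0, len(frame) - sub_len + 1, sub_len)]


def is_fast_varying(frame: Sequence[float], fs: float, sub_len: int,
                    history: int = 3, diff_thr: float = 0.1, ratio_thr: float = 4.0,
                    quiet_ratio: float = 0.1, premask_s: float = 0.0025) -> bool:
    peaks = subframe_peaks(frame, sub_len)
    # number of sub-frames covering the 2.5 ms before an attack
    lookback = max(1, round(premask_s * fs / sub_len))
    for i in range(history, len(peaks)):
        predicted = sum(peaks[i - history:i]) / history  # assumed predictor: mean of earlier peaks
        diff = peaks[i] - predicted
        ratio = peaks[i] / predicted if predicted > 0 else float("inf")
        if diff > diff_thr and ratio > ratio_thr:
            # jump found: fast varying only if a quiet sub-frame precedes it within 2.5 ms
            window = peaks[max(0, i - lookback):i]
            if any(p <= quiet_ratio * peaks[i] for p in window):
                return True
    return False  # last sub-frame reached without a decision: slowly varying
```

At 48 kHz with 60-sample sub-frames (1.25 ms each), the 2.5 ms pre-masking window covers the two sub-frames just before the attack.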
- There are many methods for transforming time-domain audio signals into the frequency domain, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the modified discrete cosine transform (MDCT), the cosine-modulated filter bank and the wavelet transform. The MDCT and cosine modulation filtering are used below as examples to illustrate the time-frequency mapping process.
- When the MDCT is used for the time-frequency transformation, the $M$ time-domain samples of the previous frame and the $M$ time-domain samples of the current frame are selected first. A window is then applied to these $2M$ samples, and the MDCT of the windowed signal gives $M$ frequency-domain coefficients.
- In practice, a sine window can be used as the window function. The above constraint on the window function can also be relaxed by using a biorthogonal transform with specific analysis and synthesis filters.
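The MDCT step can be sketched in Python with a sine window, using the common definition $X[k] = \sum_{n=0}^{2M-1} w[n]\,x[n]\cos\!\big(\tfrac{\pi}{M}(n + \tfrac12 + \tfrac M2)(k + \tfrac12)\big)$. This formula and the $2/M$ scaling of the inverse are assumed textbook conventions. A sine window satisfies $w^2[n] + w^2[n+M] = 1$, so overlap-adding the windowed inverse transforms of consecutive $2M$-sample blocks cancels the time-domain aliasing. The direct $O(M^2)$ sums are used for clarity, not speed, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def sine_window(M: int) -> List[float]:
    """Sine window of length 2M; w[n]^2 + w[n+M]^2 = 1 and w[2M-1-n] = w[n]."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]


def _basis(M: int, n: int, k: int) -> float:
    return math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))


def mdct(block: Sequence[float], M: int) -> List[float]:
    """Windowed MDCT: 2M samples (previous + current frame) -> M coefficients."""
    w = sine_window(M)
    return [sum(w[n] * block[n] * _basis(M, n, k) for n in range(2 * M)) for k in range(M)]


def imdct(coeffs: Sequence[float], M: int) -> List[float]:
    """Inverse MDCT with 2/M scaling, followed by the same sine window for synthesis."""
    w = sine_window(M)
    return [w[n] * (2.0 / M) * sum(coeffs[k] * _basis(M, n, k) for k in range(M))
            for n in range(2 * M)]
```

For example, with $M = 4$, transform the blocks `x[0:8]`, `x[4:12]` and `x[8:16]`, then add the overlapping halves of their inverses. The result reproduces `x[4:12]` to within rounding error.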
- When cosine modulation filtering is used for the time-frequency transformation, the $M$ time-domain samples of the previous frame and the $M$ time-domain samples of the current frame are selected first. A window is then applied to these $2M$ samples, and cosine modulation filtering of the windowed signal gives $M$ frequency-domain coefficients.
- Suppose that the analysis window (analysis prototype filter) $P_a(n)$ of the $M$-sub-band cosine-modulated filter bank has an impulse response of length $N_a$, and that the synthesis window (synthesis prototype filter) $P_s(n)$ has an impulse response of length $N_s$. When the analysis window equals the synthesis window, i.e. $P_a(n) = P_s(n)$ and $N_a = N_s$, the cosine-modulated filter bank represented by the above two formulae is an orthogonal filter bank. In that case the matrices $H$ and $F$, with $[H]_{n,k} = h_k(n)$ and $[F]_{n,k} = f_k(n)$, are orthogonal transformation matrices. To obtain a linear-phase filter bank, the window is further required to be symmetric: $P_a(2KM-1-n) = P_a(n)$. To ensure perfect reconstruction in the orthogonal and biorthogonal systems, the window function must satisfy further conditions, which are given in P. P. Vaidyanathan, "Multirate Systems and Filter Banks", Prentice Hall, Englewood Cliffs, NJ, 1993.
- The calculation of the masking threshold and signal-to-masking ratio of the re-sampled signal includes the following steps:
- Step 1: mapping the signal from the time domain to the frequency domain. A fast Fourier transform with a Hanning window can be used to transform the time-domain data into frequency-domain coefficients $X[k]$, each represented by an amplitude $r[k]$ and a phase $\phi[k]$ as $X[k] = r[k]e^{j\phi[k]}$. The energy $e[b]$ of each sub-band is then the sum of the energies of all the spectral lines within that sub-band:
$$e[b] = \sum_{k=k_l(b)}^{k_h(b)} r^2[k],$$
where $k_l(b)$ and $k_h(b)$ are the lowest and highest spectral lines of sub-band $b$.
- Step 2: determining the tonal and non-tonal components of the signal. The tonality of the signal is estimated by performing inter-frame prediction on each spectral line. The Euclidean distance between the predicted value and the actual value of each spectral line is mapped to an unpredictability measure: a highly predictable spectral component is considered strongly tonal, while a poorly predictable spectral component is considered noise-like.
The predicted amplitude rpred and phase ϕpred can be represented by the following equations:
The unpredictability measure c[k] is calculated by the following equation:
The tonality t[b] of each sub-band is calculated from the normalized unpredictability spread c̃s[b] as t[b] = -0.299 - 0.43·ln(c̃s[b]), limited to 0 ≤ t[b] ≤ 1. When t[b] = 1 the sub-band signal is a pure tone, and when t[b] = 0 it is white noise.
- Step 3: calculating the signal-to-noise ratio (SNR) required for each sub-band. The noise-masking-tone (NMT) value of all the sub-bands is set to 6 dB, and the tone-masking-noise (TMN) value is set to 18 dB. For the noise to be imperceptible, the required signal-to-noise ratio of each sub-band is SNR[b] = 18t[b] + 6(1 - t[b]) dB.
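Steps 2 and 3 can be sketched as below. Because the prediction and unpredictability equations are not reproduced above, this sketch assumes the linear extrapolation used in MPEG psychoacoustic model 2 (rpred = 2r1 - r2, ϕpred = 2ϕ1 - ϕ2 from the two previous frames) and the usual normalized Euclidean distance; the spreading that produces c̃s[b] is not shown, so `tonality` takes the spread value as given. All function names are hypothetical.

```python
import numpy as np


def unpredictability(r, phi, r1, phi1, r2, phi2):
    """c[k] from the present frame (r, phi) and the two previous frames.

    Assumed (MPEG model 2 style): r_pred = 2 r1 - r2, phi_pred = 2 phi1 - phi2.
    """
    r_pred = 2.0 * r1 - r2
    phi_pred = 2.0 * phi1 - phi2
    dist = np.hypot(r * np.cos(phi) - r_pred * np.cos(phi_pred),
                    r * np.sin(phi) - r_pred * np.sin(phi_pred))
    return dist / (r + np.abs(r_pred) + 1e-12)  # normalized Euclidean distance


def tonality(c_spread):
    """t[b] = -0.299 - 0.43 ln(c~s[b]), limited to [0, 1]."""
    c = np.maximum(np.asarray(c_spread, dtype=float), 1e-12)  # avoid log(0)
    return np.clip(-0.299 - 0.43 * np.log(c), 0.0, 1.0)


def required_snr_db(t, tmn_db=18.0, nmt_db=6.0):
    """SNR[b] = TMN * t[b] + NMT * (1 - t[b]), in dB."""
    return tmn_db * t + nmt_db * (1.0 - t)
```

A steady sinusoid (constant amplitude, linearly advancing phase) is perfectly predicted, so c[k] is near zero, t[b] saturates at 1 and the required SNR is 18 dB.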
- Step 4: calculating the masking threshold of each sub-band and the perceptual entropy of the signal. From the normalized signal energy ẽs[b] of each sub-band and the required SNR[b] obtained in the above steps, the noise energy threshold of each sub-band is n[b] = ẽs[b]·10^(-SNR[b]/10).
To avoid pre-echo, the noise energy threshold n[b] of the present frame is compared with the noise energy threshold nprev[b] of the previous frame, and the masking threshold is taken as n[b] = min(n[b], 2nprev[b]). This prevents the masking threshold from being biased by a high-energy transient near the end of the analysis window.
Further, taking into account the static masking threshold (threshold in quiet) qsthr[b], the final masking threshold is chosen as the larger of the static masking threshold and the calculated masking threshold, i.e. n[b] = max(n[b], qsthr[b]). The perceptual entropy is then calculated by the following equation:
- Step 5: calculating the signal-to-masking ratio (SMR) of each sub-band signal. The signal-to-masking ratio SMR[b] of each sub-band is the ratio of the sub-band signal energy to its masking threshold n[b], expressed in dB.
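Steps 4 and 5 can be sketched as follows. The threshold formulas follow the text; the SMR is assumed to be 10·log10(e[b]/n[b]), since the exact equation is not given, and the energy inputs are taken as already normalized. Function names are hypothetical.

```python
import numpy as np


def masking_threshold(e_norm, snr_db, n_prev, qsthr):
    """Noise threshold, pre-echo limit, then threshold in quiet."""
    n = e_norm * 10.0 ** (-snr_db / 10.0)  # n[b] = e~s[b] * 10^(-SNR[b]/10)
    n = np.minimum(n, 2.0 * n_prev)        # n[b] = min(n[b], 2 nprev[b])
    return np.maximum(n, qsthr)            # n[b] = max(n[b], qsthr[b])


def smr_db(e, n):
    """Assumed SMR definition: band energy over masking threshold, in dB."""
    return 10.0 * np.log10(e / n)
```

With a band energy of 1000 and a required SNR of 10 dB the threshold is 100; if the previous frame's threshold was only 10, the pre-echo rule caps it at 20.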
- Then, a multi-resolution analysis is performed on the frequency-domain coefficients. The multi-resolution analyzing module 53 reorganizes the time-frequency domain of the input frequency-domain data, improving the time resolution of the data at the cost of frequency precision. It thereby adapts automatically to the time-frequency characteristics of fast-varying signals and suppresses pre-echo without changing the form of the filter bank in the time-frequency mapping module 52.
- The multi-resolution analysis includes two steps, frequency-domain coefficient transformation and reorganization: the transformation converts the frequency-domain coefficients into time-frequency plane coefficients, and the reorganization groups the time-frequency plane coefficients according to a certain rule.
- The process of multi-resolution analysis is described below by taking frequency-domain wavelet transformation and frequency-domain MDCT transformation as examples.
- Suppose that the time series is x(i), i = 0, 1, ..., 2M-1, and that the frequency-domain coefficients obtained through time-frequency mapping are X(k), k = 0, 1, ..., M-1. The wavelet basis of the frequency-domain wavelet or wavelet packet transformation may be either fixed or adaptive.
- The multi-resolution analysis of the frequency-domain coefficients is illustrated below using the simplest case, a wavelet transformation with the Haar wavelet basis.
- The scaling coefficients of the Haar wavelet basis are h = [1/√2, 1/√2], and the corresponding wavelet coefficients are g = [1/√2, -1/√2].
- The time-frequency plane coefficients are then reorganized in the reorganization module according to a certain rule. For example, the coefficients can first be organized in the frequency direction, with the coefficients in each frequency band organized in the time direction, and the organized coefficients then arranged in the order of sub-window and scale factor band.
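A one-level Haar analysis along the frequency axis and one possible reorganization can be sketched as below. This is an illustration under stated assumptions: the orthonormal Haar filters are used, the two Haar outputs are treated as two "sub-windows" of half frequency resolution (an interpretive choice, not a statement of the patent's layout), and the band boundaries are hypothetical.

```python
import numpy as np


def haar_level(coeffs: np.ndarray):
    """One Haar level over adjacent spectral lines: h = [1, 1]/sqrt2, g = [1, -1]/sqrt2."""
    pairs = coeffs.reshape(-1, 2)  # adjacent lines X(2j), X(2j+1)
    low = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
    high = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
    return low, high


def time_frequency_plane(X: np.ndarray) -> np.ndarray:
    """Stack the two Haar outputs as two rows (assumed sub-windows)."""
    low, high = haar_level(X)
    return np.vstack([low, high])


def reorganize(plane: np.ndarray, band_edges: list[int]) -> np.ndarray:
    """Arrange coefficients sub-window by sub-window, then by scale factor band."""
    out = []
    for row in plane:  # sub-window order
        for lo, hi in zip(band_edges[:-1], band_edges[1:]):  # band order
            out.append(row[lo:hi])
    return np.concatenate(out)
```

Because the Haar filters are orthonormal, the time-frequency plane has the same energy as the input spectrum, so the step only redistributes resolution.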
- Suppose that the frequency-domain data input to the filter bank of the frequency-domain MDCT transformation is X(k), k = 0, 1, ..., N-1. M-point MDCT transformations are performed sequentially on these N-point frequency-domain data, so that the frequency precision of the time-frequency data is reduced while the time precision is increased. Frequency-domain MDCT transformations of different lengths are used in different frequency ranges, giving different divisions of the time-frequency plane, i.e. different time and frequency precisions. The reorganization module reorganizes the time-frequency data output from the filter bank of the frequency-domain MDCT transformation. One way of doing this is to organize the time-frequency plane coefficients in the frequency direction first, with the coefficients in each frequency band organized in the time direction, and then arrange the organized coefficients in the order of sub-window and scale factor band.
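The frequency-domain MDCT can be sketched as an M-point MDCT sliding along the N spectral lines with a hop of M. Several details are assumptions, not taken from the text: the sine window, the MDCT convention, zero padding of M lines at each end, N being a multiple of M, and reading the M outputs of each block as M finer time slots. The function names are hypothetical.

```python
import numpy as np


def mdct_block(x2m: np.ndarray) -> np.ndarray:
    """M-point MDCT of 2M inputs with a sine window (assumed conventions)."""
    m = len(x2m) // 2
    n = np.arange(2 * m)
    k = np.arange(m)
    w = np.sin(np.pi * (n + 0.5) / (2 * m))
    basis = np.cos(np.pi / m * np.outer(k + 0.5, n + (m + 1) / 2.0))
    return 2.0 * basis @ (w * x2m)


def frequency_domain_mdct(X: np.ndarray, m: int) -> np.ndarray:
    """Slide an M-point MDCT (hop M) along the N spectral lines of X.

    Each row covers one group of neighbouring spectral lines; its M outputs are
    read as M finer time slots (interpretive assumption).
    """
    padded = np.concatenate([np.zeros(m), X, np.zeros(m)])
    starts = range(0, len(X) + 1, m)
    return np.array([mdct_block(padded[s:s + 2 * m]) for s in starts])
```

Choosing a larger M in a given frequency range yields more time slots per group at the cost of coarser frequency grouping, which is the trade-off the text describes.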
- Quantization and entropy encoding further include the two steps of non-linear quantization and entropy encoding, wherein the quantization can be scalar quantization or vector quantization.
- The scalar quantization comprises the steps of: non-linearly compressing the frequency-domain coefficients in all the scale factor bands; quantizing the frequency-domain coefficients of each sub-band with that sub-band's scale factor to obtain a quantized spectrum represented by integers; selecting the first scale factor in each frame as the common scale factor; and encoding each remaining scale factor as its difference from the preceding scale factor.
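The scalar quantization steps can be sketched as follows. The text does not specify the compression law or the step size, so this sketch assumes AAC-style choices: a 3/4 power law, a step size of 2^(sf/4) and rounding to nearest. The function names are hypothetical.

```python
import numpy as np


def quantize_band(x: np.ndarray, scale_factor: int) -> np.ndarray:
    """Non-linear compression plus scale-factor quantization to integers.

    Assumed (AAC-style, not from the text): step 2^(sf/4), 3/4 power law.
    """
    scaled = np.abs(x) / 2.0 ** (scale_factor / 4.0)
    return (np.sign(x) * np.floor(scaled ** 0.75 + 0.5)).astype(int)


def encode_scale_factors(scale_factors: list[int]):
    """First scale factor is the common scale factor; the rest become differences."""
    common = scale_factors[0]
    diffs = [cur - prev for prev, cur in zip(scale_factors[:-1], scale_factors[1:])]
    return common, diffs
```

For example, `quantize_band(np.array([8.0, -8.0, 0.0]), 0)` gives `[5, -5, 0]`, and the scale factors `[60, 62, 59]` are sent as the common value 60 followed by the differences `[2, -3]`.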
- The vector quantization comprises the steps of: forming a plurality of M-dimensional vectors from the frequency-domain coefficients; performing spectrum smoothing on each M-dimensional vector according to the smoothing factor; and searching the code book, according to a subjective perceptual distance measure, for the code word with the shortest distance to the vector to be quantized, to obtain the code word index.
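The code word search can be sketched as an exhaustive nearest-neighbour search. Two parts are hypothetical stand-ins because the text does not define them: smoothing is modelled as dividing each vector by its smoothing factor, and a per-dimension weighted squared error replaces the subjective perceptual distance measure.

```python
import numpy as np


def vq_encode(vectors: np.ndarray, codebook: np.ndarray,
              weights: np.ndarray, smoothing: np.ndarray) -> np.ndarray:
    """Return the index of the nearest code word for each vector.

    Hypothetical: divide by the smoothing factor, then use a weighted squared
    error as the perceptual distance.
    """
    smoothed = vectors / smoothing[:, None]             # (V, M)
    diff = smoothed[:, None, :] - codebook[None, :, :]  # (V, C, M)
    dist = np.sum(weights * diff ** 2, axis=-1)         # (V, C)
    return np.argmin(dist, axis=1)
```

An exhaustive search is used only for clarity; a real coder would typically use a structured or multi-stage code book to reduce the search cost.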
- The entropy encoding step comprises entropy encoding the quantization spectrum and the differentiated scale factors to obtain the sequence numbers of the code book, the encoded value of the scale factors and the quantization spectrum of lossless encoding; and entropy encoding the sequence numbers of the code book to obtain the encoded values thereof.
- Alternatively, one-dimensional or multi-dimensional entropy encoding is performed on the code word indexes to obtain their encoded values.
- Said entropy encoding method can be any one of the existing Huffman encoding, arithmetic encoding or run length encoding method.
- After quantization and entropy encoding, the encoded audio code stream is obtained, which is multiplexed together with the common scale factor and the result of signal type analysis to obtain the compressed audio code stream.
- Fig. 8 is a schematic drawing of the structure of the audio decoding device according to the present invention. The audio decoding device comprises a bit-stream demultiplexing module 60, an entropy decoding module 61, an inverse quantizer bank 62, a multi-resolution integration module 63 and a frequency-time mapping module 64. The compressed audio code stream is demultiplexed by the bit-stream demultiplexing module 60 to obtain the corresponding data signal and control signal, which are output to the entropy decoding module 61 and the multi-resolution integration module 63. The data signal and control signal are decoded in the entropy decoding module 61 to recover the quantized values of the spectrum. These quantized values are reconstructed in the inverse quantizer bank 62 to obtain the inversely quantized spectrum, which is output to the multi-resolution integration module 63 and, after multi-resolution integration, to the frequency-time mapping module 64; the time-domain audio signal is then obtained through frequency-time mapping.
- The bit-stream demultiplexing module 60 decomposes the compressed audio code stream to obtain the corresponding data signal and control signal and to provide the corresponding decoding information for the other modules. It outputs to the entropy decoding module 61 the common scale factor, the encoded scale factor values, the encoded values of the code book sequence numbers and the losslessly encoded quantized spectrum, or else the encoded values of the code word indexes; it outputs the signal type information to the multi-resolution integration module 63.
- If the quantization and entropy encoding module 54 in the encoding device uses the scalar quantizer, then the entropy decoding module 61 in the decoding device receives from the bit-stream demultiplexing module 60 the common scale factor, the encoded scale factor values, the encoded values of the code book sequence numbers and the losslessly encoded quantized spectrum. It performs code book sequence number decoding, spectrum coefficient decoding and scale factor decoding on them to reconstruct the quantized spectrum, and outputs the integer representation of the scale factors and the quantized values of the spectrum to the inverse quantizer bank 62. The decoding method used by the entropy decoding module 61 corresponds to the entropy encoding method used in the encoding device, for example Huffman decoding, arithmetic decoding or run length decoding.
- Upon receipt of the quantized values of the spectrum and the integer representation of the scale factors, the inverse quantizer bank 62 inversely quantizes the quantized values of the spectrum into the reconstructed spectrum without scaling (inverse quantization spectrum) and outputs it to the multi-resolution integration module 63. The inverse quantizer bank 62 can be either a uniform quantizer bank or a non-uniform quantizer bank realized by a companding function. Since the quantizer bank in the encoding device uses a scalar quantizer, the inverse quantizer bank 62 in the decoding device also uses a scalar inverse quantizer. In the scalar inverse quantizer, the quantized values of the spectrum are first non-linearly expanded, and all the spectrum coefficients (inverse quantization spectrum) in each scale factor band are then obtained using the corresponding scale factor.
- If the quantization and entropy encoding module 54 uses the vector quantizer, then the entropy decoding module 61 in the decoding device receives the encoded values of the code word indexes from the bit-stream demultiplexing module 60 and decodes them with the entropy decoding method corresponding to the entropy encoding method used in encoding, thereby obtaining the code word indexes.
- The code word indexes are output to the inverse quantizer bank 62, which obtains the quantized values (inverse quantization spectrum) by looking up the code book and outputs them to the multi-resolution integration module 63. In this case the inverse quantizer bank 62 uses an inverse vector quantizer. After multi-resolution integration, the inverse quantization spectrum is mapped by the frequency-time mapping module 64 to obtain the time-domain audio signal. The frequency-time mapping module 64 can be a filter bank of inverse discrete cosine transformation (IDCT), inverse discrete Fourier transformation (IDFT), inverse modified discrete cosine transformation (IMDCT) or inverse wavelet transformation, a cosine modulation filter bank, etc.
- The decoding method based on the above-mentioned decoder comprises: demultiplexing the compressed audio code stream to obtain the data information and control information; entropy decoding this information to obtain the quantized values of the spectrum; inversely quantizing the quantized values of the spectrum to obtain the inverse quantization spectrum; and multi-resolution integrating the inverse quantization spectrum and then performing a frequency-time mapping on it to obtain the time-domain audio signal.
- If the demultiplexed information includes the encoded values of the code book sequence numbers, the common scale factor, the encoded values of the scale factors, and the quantization spectrum of the lossless encoding, then the spectrum coefficients in the encoding device are quantized by the scalar quantization technique. Accordingly, the entropy decoding steps include: decoding the encoded values of the code book sequence numbers to obtain the code book sequence numbers of all the scale factor bands; decoding the quantization coefficients of all the scale factor bands according to the code book corresponding to the code book sequence numbers; and decoding the scale factors of all the scale factor bands to reconstruct the quantization spectrum. The entropy decoding method used in said process corresponds to the entropy encoding method used in the encoding method, which is, for example, run length decoding method, Huffman decoding method, or arithmetic decoding method, etc.
- The entropy decoding process is described below by using as examples the decoding of the code book sequence number by the run length decoding method, the decoding of the quantization coefficients by the Huffman decoding method, and the decoding of the scale factor by the Huffman decoding method.
- First, the code book sequence numbers of all the scale factor bands are obtained through the run length decoding method. The decoded code book sequence numbers are integers within a certain range. Suppose that this range is [0, 11]; then only the sequence numbers within the valid range 1 to 11 correspond to Huffman code books of the spectrum coefficients. An all-zero sub-band is assigned a dedicated code book sequence number, typically 0.
- Once the code book number of each scale factor band has been decoded, the Huffman code book of spectrum coefficients corresponding to that number is used to decode the quantization coefficients of the band. If the code book number of a scale factor band is within the valid range, for example between 1 and 11 in this embodiment, it corresponds to a spectrum coefficient code book, which is used to decode the quantization spectrum to obtain the code word indexes of the quantization coefficients of that band; the code word indexes are then unpacked to obtain the quantization coefficients. If the code book number of the scale factor band is not between 1 and 11, it does not correspond to any spectrum coefficient code book, and the quantization coefficients of that band are not decoded but are all set directly to zero.
- The scale factors are used to reconstruct the spectrum values from the inverse quantization spectrum coefficients. If the code book number of a scale factor band is within the valid range, that band has a corresponding scale factor. When decoding the scale factors, the code stream occupied by the first scale factor is read first; the remaining scale factors are then Huffman decoded to obtain the difference between each scale factor and its preceding scale factor, and each difference is added to the value of the preceding scale factor to obtain the respective scale factor. If the quantization coefficients of the present sub-band are all zero, the scale factor of that sub-band does not have to be decoded.
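The code book number and scale factor decoding can be sketched as below, after the Huffman stage has already produced the scale factor differences (the Huffman step itself is not shown). The (code book number, run length) pair format and the function names are hypothetical; the valid range 1 to 11 and the zero-band rule follow the text.

```python
def run_length_decode(runs):
    """Expand hypothetical (code book number, run length) pairs, one number per band."""
    books = []
    for book, length in runs:
        books.extend([book] * length)
    return books


def decode_scale_factors(first_sf, diffs, books, valid=range(1, 12)):
    """Rebuild scale factors; bands outside the valid code book range get None.

    The first band with a valid code book takes the directly read scale factor;
    each later such band adds its decoded difference to the previous value.
    """
    diff_iter = iter(diffs)
    scale_factors, prev = [], None
    for book in books:
        if book not in valid:  # all-zero band: no scale factor is decoded
            scale_factors.append(None)
            continue
        prev = first_sf if prev is None else prev + next(diff_iter)
        scale_factors.append(prev)
    return scale_factors
```

For runs `[(3, 2), (0, 1), (5, 1)]` the bands use code books `[3, 3, 0, 5]`; with a first scale factor of 60 and differences `[2, -1]` the decoded scale factors are `[60, 62, None, 61]`.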
- After said entropy decoding, the quantized values of the spectrum and the integer representation of the scale factors are obtained, then the quantized values of the spectrum are inversely quantized to obtain the inverse quantization spectrum. The inverse quantization processing includes non-linear expanding the quantized values of the spectrum, and obtaining all the spectrum coefficients (inverse quantization spectrum) in the corresponding scale factor band according to each scale factor.
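The non-linear expansion and rescaling can be sketched as the inverse of an AAC-style compression law. The exponent 4/3 and the 2^(sf/4) scaling are assumptions, because the text names the operations but not their constants; the function name is hypothetical.

```python
import numpy as np


def inverse_quantize_band(q: np.ndarray, scale_factor: int) -> np.ndarray:
    """Expand |q|^(4/3), then rescale by 2^(sf/4) (assumed AAC-style constants)."""
    return np.sign(q) * np.abs(q) ** (4.0 / 3.0) * 2.0 ** (scale_factor / 4.0)
```

Raising the scale factor by 4 doubles every reconstructed coefficient in the band, which is how one integer per band controls the band's reconstruction level.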
- If the demultiplexed information contains the encoded values of the code word indexes, it means that the encoding device uses the vector quantization technique to quantize the spectrum coefficients, then the entropy decoding steps include: decoding the encoded values of the code word indexes by means of the entropy decoding method corresponding to the entropy encoding method used in the encoding device so as to obtain the code word indexes, then inversely quantizing the code word indexes to obtain the inverse quantization spectrum.
- For the inverse quantization spectrum, if the signal is of the fast-varying type, the spectrum is multi-resolution integrated before frequency-time mapping; if it is not of the fast-varying type, the inverse quantization spectrum is frequency-time mapped directly.
- The multi-resolution integration can use the frequency-domain wavelet transformation method or the frequency-domain MDCT transformation method. The frequency-domain wavelet integration method includes: reorganizing the time-frequency plane coefficients according to a certain rule, and then performing the inverse wavelet transformation on them to obtain the frequency-domain coefficients. The frequency-domain MDCT integration method includes: reorganizing the time-frequency plane coefficients according to a certain rule, and then performing several inverse MDCT transformations on them to obtain the frequency-domain coefficients. The reorganization method includes: organizing the time-frequency plane coefficients in the frequency direction, with the coefficients in each frequency band organized in the time direction, and then arranging the organized coefficients in the order of sub-window and scale factor band.
- The method of performing a frequency-time mapping on the frequency-domain coefficients corresponds to the time-frequency mapping method in the encoding method, which can be inverse discrete cosine transformation (IDCT), inverse discrete Fourier transformation (IDFT), inverse modified discrete cosine transformation (IMDCT), and inverse wavelet transformation, etc.
- The frequency-time mapping process is illustrated below by taking the inverse modified discrete cosine transformation (IMDCT) as an example. The process includes three steps: the IMDCT, time-domain windowing, and time-domain overlap-add.
- First, the IMDCT is performed on the spectrum before prediction or on the inverse quantization spectrum to obtain the transformed time-domain signal x_{i,n}. The IMDCT is expressed as x_{i,n} = (2/N) Σ_{k=0}^{N/2-1} X_i(k)·cos((2π/N)(n + n0)(k + 1/2)),
where n is the sample sequence number, 0 ≤ n < N, N is the number of time-domain samples, which is 2048, n0 = (N/2+1)/2, i is the frame sequence number and k is the spectrum sequence number.
- Second, the time-domain signal obtained from the IMDCT is windowed in the time domain. For perfect reconstruction, the window function w(n) must satisfy the two conditions w(2M-1-n) = w(n) and w²(n) + w²(n+M) = 1.
- Typical window functions include, among others, the sine window and the Kaiser-Bessel window. The present invention uses a fixed window function, w(N+k) = cos(π/2·((k+0.5)/N - 0.94·sin(2π/N·(k+0.5))/(2π))), for k = 0, 1, ..., N-1, where w(k) denotes the kth coefficient of the window function and w(k) = w(2N-1-k), and N is the number of samples of the encoded frame, N = 1024. In addition, the restriction on the window function can be relaxed by using a biorthogonal transformation with a specific analysis filter and synthesis filter.
- Finally, the windowed time-domain signals are overlap-added to obtain the time-domain audio signal. Specifically, the first N/2 samples of the windowed signal of the present frame are added to the last N/2 samples of the windowed signal of the previous frame to obtain N/2 output time-domain audio samples, i.e. timeSam_{i,n} = preSam_{i,n} + preSam_{i-1,n+N/2}, where i denotes the frame sequence number, n denotes the sample sequence number, and 0 ≤ n < N/2.
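The three frequency-time mapping steps can be sketched and checked end to end. This sketch builds the window from the formula above for a small frame size, uses an IMDCT with the 2/N factor and n0 = (N/2+1)/2, and includes a matching forward MDCT (with a factor of 2, an assumed encoder convention) so that perfect reconstruction can be verified. Function names are hypothetical.

```python
import numpy as np


def patent_window(m: int) -> np.ndarray:
    """Window of length 2m from the fixed window formula (frame size m)."""
    k = np.arange(m)
    phi = (k + 0.5) / m - 0.94 * np.sin(2 * np.pi / m * (k + 0.5)) / (2 * np.pi)
    second_half = np.cos(np.pi / 2 * phi)                    # w(m + k)
    return np.concatenate([second_half[::-1], second_half])  # w(k) = w(2m-1-k)


def _basis(m: int) -> np.ndarray:
    n = np.arange(2 * m)
    k = np.arange(m)
    n0 = (m + 1) / 2.0  # (N/2 + 1)/2 with N = 2m time samples
    return np.cos(np.pi / m * np.outer(k + 0.5, n + n0))


def mdct(frame: np.ndarray, window: np.ndarray) -> np.ndarray:
    """Assumed matching encoder transform: X[k] = 2 sum_n w[n] x[n] cos(...)."""
    return 2.0 * _basis(len(frame) // 2) @ (window * frame)


def imdct(spec: np.ndarray) -> np.ndarray:
    """x[n] = (2/N) sum_k spec[k] cos(2pi/N (n + n0)(k + 1/2)), N = 2 len(spec)."""
    m = len(spec)
    return (2.0 / (2 * m)) * (_basis(m).T @ spec)


def synthesize(spectra, window: np.ndarray) -> np.ndarray:
    """IMDCT, time-domain windowing, then overlap-add of half frames."""
    m = len(window) // 2
    tail = np.zeros(m)
    out = []
    for spec in spectra:
        z = window * imdct(spec)
        out.append(z[:m] + tail)  # first half + previous frame's second half
        tail = z[m:]
    return np.concatenate(out)
```

The window satisfies w²(n) + w²(n+N) = 1 because the second half is cos(π/2·φ(k)) and the mirrored first half works out to sin(π/2·φ(k)), so the time-domain aliasing of adjacent frames cancels in the overlap-add.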
- Fig. 9 is a schematic drawing of the first embodiment of the encoding device of the present invention. Building on Fig. 5, this embodiment adds a frequency-domain linear prediction and vector quantization module 56 between the output of the multi-resolution analyzing module 53 and the input of the quantization and entropy encoding module 54. Module 56 outputs the residual sequence to the quantization and entropy encoding module 54 and outputs the quantized code indexes as side information to the bit-stream multiplexing module 55.
- After the multi-resolution analysis of the frequency-domain coefficients, time-frequency coefficients with a specific time-frequency plane division are obtained, so the frequency-domain linear prediction and vector quantization module 56 performs linear prediction and multi-stage vector quantization on the frequency-domain coefficients of each time interval.
- The frequency-domain coefficients output from the multi-resolution analyzing module 53 are transmitted to the frequency-domain linear prediction and vector quantization module 56, where a standard linear prediction analysis is performed on the frequency-domain coefficients of each time interval. If the prediction gain meets the given condition, linear prediction error filtering is performed on the frequency-domain coefficients, and the resulting prediction coefficients are transformed into line spectrum frequency (LSF) coefficients. The optimal distortion measurement criterion is then used to search for the code word indexes in the respective code books, and these indexes are transferred as side information to the bit-stream multiplexing module 55, while the residual sequence obtained through prediction analysis is output to the quantization and entropy encoding module 54.
- The frequency-domain linear prediction and vector quantization module 56 consists of a linear prediction analyzer, a linear prediction filter, a transformer and a vector quantizer. The frequency-domain coefficients are input to the linear prediction analyzer for prediction analysis to obtain the prediction gain and prediction coefficients. Frequency-domain coefficients that meet a certain condition are output to the linear prediction filter and filtered to obtain a residual sequence, which is output directly to the quantization and entropy encoding module 54. The prediction coefficients are transformed into line spectrum frequency (LSF) coefficients by the transformer, and the LSF parameters are sent to the vector quantizer for multi-stage vector quantization; the quantized signals are transmitted to the bit-stream multiplexing module 55.
- Performing frequency-domain linear prediction on the audio signal can effectively suppress pre-echo and obtain a greater encoding gain. Given a real signal x(t), its squared Hilbert envelope is e(t) = F^{-1}{∫C(ξ)·C*(ξ-f)dξ}, where C(f) is the one-sided spectrum corresponding to the positive frequency components of x(t); that is, the Hilbert envelope of the signal is related to the autocorrelation function of its spectrum. The power spectral density of a signal is related to the autocorrelation function of its time-domain waveform by PSD(f) = F{∫x(τ)·x*(τ-t)dτ}, so the squared Hilbert envelope in the time domain and the power spectral density in the frequency domain correspond to each other.
It can be seen that with respect to some of the band-pass signals in each predetermined range, if the Hilbert envelope thereof is constant, the autocorrelation of the adjacent spectrum values is also constant, implying that the sequence of the spectrum coefficients is a steady state sequence with respect to the frequency, thus the prediction encoding technique can be used to process the spectrum values and a group of common prediction coefficients can be used to effectively represent said signal.
- The encoding method based on the encoding device shown in Fig. 9 is substantially the same as that based on the encoding device shown in Fig. 5; the difference is that the former adds the following steps: after the multi-resolution analysis of the frequency-domain coefficients, performing a standard linear prediction analysis on the frequency-domain coefficients of each time interval to obtain the prediction gain and the prediction coefficients; determining whether the prediction gain exceeds a predetermined threshold, and if it does, performing frequency-domain linear prediction error filtering on the frequency-domain coefficients based on the prediction coefficients to obtain the residual sequence, transforming the prediction coefficients into line spectrum pair frequency coefficients, performing multi-stage vector quantization on these coefficients to obtain the side information, and quantizing and entropy encoding the residual sequence; if the prediction gain does not exceed the predetermined threshold, quantizing and entropy encoding the frequency-domain coefficients directly.
- After a multi-resolution analysis of the frequency-domain coefficients, a standard linear prediction analysis is performed on the frequency-domain coefficients at each time interval, including calculating the autocorrelation matrix, obtaining the prediction gain and the prediction coefficients by recursively executing the Levinson-Durbin algorithm. Then it is determined whether the calculated prediction gain exceeds a predetermined threshold, if it does, a linear prediction error filtering is performed on the frequency-domain coefficients based on the prediction coefficients, otherwise, the frequency-domain coefficients are not processed and the next step is executed to quantize and entropy encode the frequency-domain coefficients.
- Linear prediction includes forward prediction and backward prediction. Forward prediction predicts the current value from the values before a certain point, while backward prediction predicts the current value from the values after a certain point. Forward prediction is used below as an example to explain the linear prediction error filtering. The transfer function of the linear prediction error filter is A(z) = 1 - Σ_{i=1}^{p} a_i·z^(-i), where p is the prediction order, so that the filter output is the residual E(k) = X(k) - Σ_{i=1}^{p} a_i·X(k-i).
- Thus, after linear prediction error filtering, the frequency-domain coefficients X(k) output by the time-frequency transformation can be represented by the residual sequence E(k) and a group of prediction coefficients ai. The prediction coefficients ai are then transformed into line spectrum frequency (LSF) coefficients, on which multi-stage vector quantization is performed. The vector quantization uses the optimal distortion measurement criterion (e.g. the nearest neighbor criterion) to search for the code word indexes of the respective code book stages, thereby determining the code words corresponding to the prediction coefficients, and outputs the code word indexes as side information. Meanwhile, the residual sequence E(k) is quantized and entropy encoded. By the principle of linear prediction, the dynamic range of the residual sequence of the spectrum coefficients is smaller than that of the original spectrum coefficients, so fewer bits need to be allocated to it during quantization, or, for the same number of bits, a higher encoding gain is obtained.
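The analysis side can be sketched as follows: autocorrelation of the spectral coefficients, the Levinson-Durbin recursion, the prediction gain r[0]/E_p, and error filtering with A(z) = 1 - Σ a_i z^(-i). This sketch assumes that sign convention, and the prediction order and the gain threshold of 1.4 are hypothetical values; the LSF conversion and multi-stage vector quantization are not shown.

```python
import numpy as np


def autocorrelation(x: np.ndarray, order: int) -> np.ndarray:
    """r[m] = sum_k x[k] x[k-m] for m = 0..order (autocorrelation method)."""
    return np.array([np.dot(x[m:], x[:len(x) - m]) for m in range(order + 1)])


def levinson_durbin(r: np.ndarray, order: int):
    """Return (a_1..a_p, final error) for the prediction x_hat[k] = sum_i a_i x[k-i]."""
    a = np.zeros(order + 1)  # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        k = (r[m] - np.dot(a[1:m], r[m - 1:0:-1])) / err  # reflection coefficient
        prev = a.copy()
        a[m] = k
        a[1:m] = prev[1:m] - k * prev[m - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_error_filter(X: np.ndarray, a: np.ndarray) -> np.ndarray:
    """E(k) = X(k) - sum_i a_i X(k-i), i.e. filtering by A(z)."""
    return np.convolve(X, np.concatenate([[1.0], -a]))[:len(X)]


def analyze(X: np.ndarray, order: int = 2, gain_threshold: float = 1.4):
    """Return (residual, a) if the prediction gain exceeds the threshold, else (X, None)."""
    r = autocorrelation(X, order)
    a, err = levinson_durbin(r, order)
    if r[0] / err > gain_threshold:
        return lpc_error_filter(X, a), a
    return X, None
```

For a strongly correlated sequence such as X(k) = 0.9^k, the first prediction coefficient comes out close to 0.9 and the residual carries only a small fraction of the input energy, which is the reduced dynamic range the text relies on.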
- Fig. 10 is a schematic drawing of embodiment one of the decoding device. This decoding device adds an inverse frequency-domain linear prediction and vector quantization module 65 to the decoding device shown in Fig. 8. Module 65 is located between the output of the inverse quantizer bank 62 and the input of the multi-resolution integration module 63. The bit-stream demultiplexing module 60 outputs the control information of inverse frequency-domain linear prediction vector quantization to module 65, which uses it to perform inverse quantization and inverse linear prediction filtering on the inverse quantization spectrum (residual spectrum), thereby obtaining the spectrum before prediction and outputting it to the multi-resolution integration module 63.
- In the encoder, frequency-domain linear prediction vector quantization is used to suppress pre-echo and obtain a greater encoding gain. Therefore, in the decoder, the inverse quantization spectrum and the control information of inverse frequency-domain linear prediction vector quantization output from the bit-stream demultiplexing module 60 are input to the inverse frequency-domain linear prediction and vector quantization module 65 to recover the spectrum before linear prediction.
- The inverse frequency-domain linear prediction and vector quantization module 65 comprises an inverse vector quantizer, an inverse transformer and an inverse linear prediction filter. The inverse vector quantizer inversely quantizes the code word indexes to obtain the line spectrum pair frequency (LSF) coefficients; the inverse transformer transforms the LSF coefficients back into prediction coefficients; and the inverse linear prediction filter inversely filters the inverse quantization spectrum based on the prediction coefficients to obtain the spectrum before prediction, which is output to the multi-resolution integration module 63.
- The decoding method of the decoding device shown in Fig. 10 is substantially the same as that of the decoding device shown in Fig. 8; the difference is that the former further includes the steps of: after obtaining the inverse quantization spectrum, determining whether the control information indicates that the inverse quantization spectrum needs to undergo inverse frequency-domain linear prediction vector quantization; if it does, performing the inverse vector quantization to obtain the prediction coefficients and performing linear prediction synthesis on the inverse quantization spectrum according to the prediction coefficients to obtain the spectrum before prediction; and multi-resolution integrating the spectrum before prediction.
- After obtaining the inverse quantization spectrum, it is determined from the control information whether the frame has undergone frequency-domain linear prediction vector quantization. If it has, the code word indexes resulting from the vector quantization of the prediction coefficients are obtained from the control information; the quantized line spectrum frequency (LSF) coefficients are then obtained from the code word indexes, and the prediction coefficients are calculated from them; finally, linear prediction synthesis is performed on the inverse quantization spectrum to obtain the spectrum before prediction.
- Thus the residual sequence E(k) and the calculated prediction coefficients a_i are combined by frequency-domain linear prediction synthesis to obtain the spectrum X(k) before prediction, which is then frequency-time mapped.
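A minimal sketch of frequency-domain linear prediction synthesis follows. It assumes that the encoder's prediction-error filter runs along the frequency index as E(k) = X(k) − Σ_{i=1..p} a_i X(k−i), with X(k) = 0 for k < 0; under that assumption the decoder recovers X(k) = E(k) + Σ a_i X(k−i). The sign convention and the function names are assumptions, not taken from the source.

```python
from typing import List, Sequence


def _prediction(history: Sequence[float], a: Sequence[float], k: int) -> float:
    """sum_{i=1..p} a_i * X(k - i), treating X(k) = 0 for k < 0."""
    return sum(a[i - 1] * history[k - i] for i in range(1, len(a) + 1) if k - i >= 0)


def fdlp_analysis(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """Encoder side (assumed): prediction-error filtering across frequency bins."""
    return [x[k] - _prediction(x, a, k) for k in range(len(x))]


def fdlp_synthesis(e: Sequence[float], a: Sequence[float]) -> List[float]:
    """Decoder side: all-pole synthesis X(k) = E(k) + sum_i a_i X(k - i)."""
    x: List[float] = []
    for k in range(len(e)):
        # Only already reconstructed bins x[0..k-1] enter the prediction.
        x.append(e[k] + _prediction(x, a, k))
    return x


spectrum = [1.0, 0.8, 0.5, -0.2, 0.1, 0.05]
coeffs = [0.9, -0.2]
residual = fdlp_analysis(spectrum, coeffs)
restored = fdlp_synthesis(residual, coeffs)
```

The round trip reproduces the input spectrum up to floating-point rounding, which is what lets the decoder work from the transmitted residual and prediction coefficients alone.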
- If the control information indicates that said signal frame has not undergone the frequency-domain linear prediction vector quantization, the inverse frequency-domain linear prediction vector quantization will not be performed, and the inverse quantization spectrum is directly frequency-time mapped.
- Fig. 11 is a schematic drawing of the second embodiment of the encoding device of the present invention. On the basis of Fig. 5, this embodiment adds a sum-difference stereo (M/S) encoding module 57 between the output of the multi-resolution analyzing module 53 and the input of the quantization and entropy encoding module 54. For multi-channel signals, the psychoacoustical analyzing module 51 calculates not only the mono masking threshold of the audio signal but also the masking threshold of the sum-difference sound channels, and outputs them to the quantization and entropy encoding module 54. The sum-difference stereo encoding module 57 can alternatively be located between the quantizer bank and the encoder in the quantization and entropy encoding module 54.
- The sum-difference stereo encoding module 57 exploits the correlation between the two sound channels of a sound channel pair to convert the frequency-domain coefficients/residual sequence of the left and right sound channels into the frequency-domain coefficients/residual sequence of the sum and difference sound channels, thereby reducing the bit rate and improving encoding efficiency. It is therefore applicable only to multi-channel signals whose channels have the same signal type; for mono signals, or for multi-channel signals whose channels have different signal types, sum-difference stereo encoding is not performed.
- The encoding method of the encoding device shown in Fig. 11 is substantially the same as that of the encoding device shown in Fig. 5. The difference is that, before the frequency-domain coefficients are quantized and entropy encoded, the former further determines whether the audio signals are multi-channel signals. If they are, it determines whether the signals of the left and right sound channels are of the same type; if so, it determines whether the corresponding scale factor bands of the two sound channels meet the conditions for sum-difference stereo encoding. Bands that meet the conditions are sum-difference stereo encoded to obtain the frequency-domain coefficients of the sum and difference sound channels; bands that do not are left unchanged. For mono signals or multi-channel signals of different types, the frequency-domain coefficients are not processed.
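The per-band decision flow above can be sketched as follows for one channel pair (a mono signal has no pair and is simply skipped). The signal-type labels, the `band_qualifies` predicate (which stands in for the band test the document describes separately), and the scaling M = (L + R)/2, S = (L − R)/2 are assumptions for illustration; the function name is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Band = List[float]


def ms_encode_pair(
    left_type: str,
    right_type: str,
    left_bands: Sequence[Band],
    right_bands: Sequence[Band],
    band_qualifies: Callable[[Band, Band], bool],
) -> Tuple[List[bool], List[Band], List[Band]]:
    """Sum-difference encode one channel pair band by band.

    Returns the per-band flag bits and the two output channels
    (M/S in flagged bands, unchanged L/R elsewhere).
    """
    n_bands = len(left_bands)
    # Channels of different signal types are never sum-difference encoded.
    if left_type != right_type:
        return [False] * n_bands, [list(b) for b in left_bands], [list(b) for b in right_bands]
    flags: List[bool] = []
    out_a: List[Band] = []
    out_b: List[Band] = []
    for lb, rb in zip(left_bands, right_bands):
        if band_qualifies(list(lb), list(rb)):
            flags.append(True)
            out_a.append([(l + r) / 2 for l, r in zip(lb, rb)])  # sum (M) channel
            out_b.append([(l - r) / 2 for l, r in zip(lb, rb)])  # difference (S) channel
        else:
            flags.append(False)
            out_a.append(list(lb))
            out_b.append(list(rb))
    return flags, out_a, out_b
```

Returning the flag list alongside the bands mirrors the per-band flag bits that the decoder later reads from the control signal.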
- The sum-difference stereo encoding can be applied not only before quantization but also after quantization and before entropy encoding. In that case, after the frequency-domain coefficients are quantized, it is determined whether the audio signals are multi-channel signals; if they are, whether the left and right sound channel signals are of the same type; and if they are, whether the corresponding scale factor bands of the two sound channels meet the conditions for sum-difference stereo encoding. Bands that meet the conditions are sum-difference stereo encoded; bands that do not are left unchanged. For mono signals or multi-channel signals of different types, sum-difference stereo encoding is not performed on the frequency-domain coefficients.
- There are many methods for determining whether sum-difference stereo encoding can be applied to a scale factor band; the present invention uses the Karhunen-Loève (K-L) transformation. The determination proceeds as follows:
- Suppose that the spectrum coefficients of a scale factor band of the left sound channel are l(k) and those of the corresponding scale factor band of the right sound channel are r(k); their correlation matrix is
The K-L transformation is performed on the correlation matrix C to obtain
- The rotation angle a satisfies the equation of
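A minimal sketch of this K-L based test, under stated assumptions: the correlation matrix is built from plain sums of products over the band, the rotation angle is the eigenvector angle of a symmetric 2x2 matrix, tan(2a) = 2C_lr/(C_ll − C_rr), and M/S is chosen when a is close to ±45° (fully correlated or anti-correlated channels). The tolerance, the decision rule, and the function names are hypothetical; the source's own criterion is not reproduced here.

```python
import math
from typing import Sequence, Tuple


def correlation_matrix(l: Sequence[float], r: Sequence[float]) -> Tuple[float, float, float]:
    """Entries (C_ll, C_lr, C_rr) of the symmetric 2x2 correlation matrix of one band."""
    c_ll = sum(x * x for x in l)
    c_lr = sum(x * y for x, y in zip(l, r))
    c_rr = sum(y * y for y in r)
    return c_ll, c_lr, c_rr


def klt_rotation_angle(l: Sequence[float], r: Sequence[float]) -> float:
    """Angle a of the K-L (eigenvector) rotation: tan(2a) = 2 C_lr / (C_ll - C_rr)."""
    c_ll, c_lr, c_rr = correlation_matrix(l, r)
    return 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)


def use_sum_difference(l: Sequence[float], r: Sequence[float],
                       tolerance: float = math.pi / 16) -> bool:
    """Hypothetical rule: choose M/S when the rotation is close to +/- 45 degrees."""
    a = klt_rotation_angle(l, r)
    return abs(abs(a) - math.pi / 4) <= tolerance
```

Identical channels give a = π/4 and opposite-sign channels give a = −π/4, both of which a sum/difference pair decorrelates; uncorrelated or one-sided bands give an angle near 0 and stay in L/R.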
- If the sum-difference stereo encoding is applied before the quantization, the frequency-domain coefficients of the left-right sound channels at the scale factor band are linearly transformed and are replaced with the frequency-domain coefficients of the sum-difference sound channels:
- If the sum-difference stereo encoding is applied after the quantization, the quantized frequency-domain coefficients of the left-right sound channels at the scale factor band are linearly transformed and are replaced with the frequency-domain coefficients of the sum-difference sound channels:
- Performing the sum-difference stereo encoding after quantization still effectively removes the correlation between the left and right sound channels; moreover, because the transform then operates on already quantized values, it can be made lossless.
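The post-quantization transform equations are not reproduced above. As one well-known way to make a sum-difference transform on integer quantized values exactly invertible, the sketch below swaps in an integer lifting pair; it illustrates the losslessness argument and is not necessarily the patent's own formula. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def ms_forward_int(lq: Sequence[int], rq: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Integer lifting: S = L - R, M = R + floor(S / 2) = floor((L + R) / 2)."""
    m: List[int] = []
    s: List[int] = []
    for l, r in zip(lq, rq):
        d = l - r
        s.append(d)
        m.append(r + (d // 2))  # Python's // floors, also for negative d
    return m, s


def ms_inverse_int(m: Sequence[int], s: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Exact inverse: R = M - floor(S / 2), L = S + R."""
    lq: List[int] = []
    rq: List[int] = []
    for mi, si in zip(m, s):
        r = mi - (si // 2)
        rq.append(r)
        lq.append(si + r)
    return lq, rq
```

Because every step maps integers to integers and each lifting step is undone by subtracting the same rounded term, no information is lost, unlike a floating-point (L + R)/2 followed by re-quantization.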
- Fig. 12 is a schematic drawing of the second embodiment of the decoding device. On the basis of the decoding device of Fig. 8, this decoding device adds a sum-difference stereo decoding module 66 between the output of the inverse quantizer bank 62 and the input of the multi-resolution integration module 63. Module 66 receives the result of signal type analysis and the sum-difference stereo control signal output from the bit-stream demultiplexing module 60, and transforms the inverse quantization spectrum of the sum and difference sound channels into the inverse quantization spectrum of the left and right sound channels according to this control information.
- The sum-difference stereo control signal contains a flag bit indicating whether the current sound channel pair needs sum-difference stereo decoding. If it does, there is also a flag bit for each scale factor band indicating whether that band needs sum-difference stereo decoding, and the sum-difference stereo decoding module 66 uses these per-band flag bits to determine in which scale factor bands the inverse quantization spectrum must be sum-difference stereo decoded. If sum-difference stereo encoding was performed in the encoding device, sum-difference stereo decoding must be performed on the inverse quantization spectrum in the decoding device.
- The sum-difference stereo decoding module 66 can alternatively be located between the output of the entropy decoding module 61 and the input of the inverse quantizer bank 62, receiving the sum-difference stereo control signal and the result of signal type analysis output from the bit-stream demultiplexing module 60.
- The decoding method of the decoding device shown in Fig. 12 is substantially the same as that of the decoding device shown in Fig. 8. The difference is that the former further includes the following steps after the inverse quantization spectrum is obtained: if the result of signal type analysis shows that the signal types are the same, the sum-difference stereo control signal is used to determine whether sum-difference stereo decoding needs to be performed on the inverse quantization spectrum; if it does, the flag bit of each scale factor band is used to determine whether that band needs sum-difference stereo decoding, and if so, the inverse quantization spectrum of the sum and difference sound channels in that band is transformed into the inverse quantization spectrum of the left and right sound channels before subsequent processing. If the signal types differ, or sum-difference stereo decoding is unnecessary, the inverse quantization spectrum is passed to subsequent processing unchanged.
- The sum-difference stereo decoding can also be performed after the entropy decoding and before the inverse quantization, that is, after obtaining the quantized values of the spectrum, if the result of signal type analysis shows that the signal types are the same, it is determined whether it is necessary to perform a sum-difference stereo decoding on the quantized values of the spectrum according to the sum-difference stereo control signal, if it is necessary, it is determined, on the basis of the flag bit on each scale factor band, if said scale factor band needs a sum-difference stereo decoding; if it needs, the quantized values of the spectrum of the sum-difference sound channels in said scale factor band are transformed into the quantized values of the spectrum of the left-right sound channels before the subsequent processing; if the signal types are not the same or it is unnecessary to perform the sum-difference stereo decoding, the quantized values of the spectrum are not processed and the subsequent processing is directly performed.
- If the sum-difference stereo decoding is after the entropy decoding and before the inverse quantization, then the frequency-domain coefficients of the left-right sound channels in the scale factor band are obtained from the frequency-domain coefficients of the sum-difference sound channels through the equation of
- If the sum-difference stereo decoding is after the inverse quantization, then the inversely quantized frequency-domain coefficients of the left-right sound channels in the sub-band are obtained from the frequency-domain coefficients of the sum-difference sound channels through the computation of the matrix of
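On the decoder side, assuming the encoder used M = (L + R)/2 and S = (L − R)/2, the inverse is L = M + S and R = M − S in each flagged scale factor band. The sketch below applies the channel-pair flag and the per-band flag bits described earlier; the data layout (a list of bands per channel) and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple

Band = List[float]


def ms_decode_pair(pair_flag: bool, band_flags: Sequence[bool],
                   ch_a: Sequence[Band], ch_b: Sequence[Band]) -> Tuple[List[Band], List[Band]]:
    """Undo M = (L + R) / 2, S = (L - R) / 2 in flagged bands: L = M + S, R = M - S."""
    if not pair_flag:
        # The pair was never sum-difference encoded: pass both channels through.
        return [list(b) for b in ch_a], [list(b) for b in ch_b]
    left: List[Band] = []
    right: List[Band] = []
    for flag, band_a, band_b in zip(band_flags, ch_a, ch_b):
        if flag:
            left.append([m + s for m, s in zip(band_a, band_b)])
            right.append([m - s for m, s in zip(band_a, band_b)])
        else:
            left.append(list(band_a))
            right.append(list(band_b))
    return left, right
```

Unflagged bands were transmitted as L/R and are copied through unchanged.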
- Fig. 13 is a schematic drawing of the structure of the third embodiment of the encoding device of the present invention. On the basis of Fig. 9, this embodiment adds a sum-difference stereo encoding module 57 between the output of the frequency-domain linear prediction and vector quantization module 56 and the input of the quantization and entropy encoding module 54. The psychoacoustical analyzing module 51 outputs the masking threshold of the sum-difference sound channels to the quantization and entropy encoding module 54.
- The sum-difference stereo encoding module 57 can alternatively be located between the quantizer bank and the encoder in the quantization and entropy encoding module 54, where it receives the result of signal type analysis output from the psychoacoustical analyzing module 51.
- In this embodiment, the function and operating principle of the sum-difference stereo encoding module 57 are the same as those shown in Fig. 11 and are not repeated here.
- The encoding method of the encoding device shown in Fig. 13 is substantially the same as that of the encoding device shown in Fig. 9. The difference is that, before the frequency-domain coefficients are quantized and entropy encoded, the former further determines whether the audio signals are multi-channel signals; if they are, whether the left and right sound channel signals are of the same type; and if they are, whether each scale factor band meets the encoding conditions. Scale factor bands that meet the conditions are sum-difference stereo encoded; those that do not are left unchanged. For mono signals or multi-channel signals of different types, sum-difference stereo encoding is not performed.
- The sum-difference stereo encoding can be applied not only before quantization but also after quantization and before entropy encoding. In that case, after the frequency-domain coefficients are quantized, it is determined whether the audio signals are multi-channel signals; if they are, whether the left and right sound channel signals are of the same type; and if they are, whether each scale factor band meets the encoding conditions. Scale factor bands that meet the conditions are sum-difference stereo encoded; those that do not are left unchanged. For mono signals or multi-channel signals of different types, sum-difference stereo encoding is not performed.
- Fig. 14 is a schematic drawing of the structure of the third embodiment of the decoding device of the present invention. On the basis of the decoding device shown in Fig. 10, this decoding device adds a sum-difference stereo decoding module 66 between the output of the inverse quantizer bank 62 and the input of the inverse frequency-domain linear prediction and vector quantization module 65, and the bit-stream demultiplexing module 60 outputs the sum-difference stereo control signal to it.
- The sum-difference stereo decoding module 66 can alternatively be located between the output of the entropy decoding module 61 and the input of the inverse quantizer bank 62, receiving the sum-difference stereo control signal output from the bit-stream demultiplexing module 60.
- In this embodiment, the function and operating principle of the sum-difference stereo decoding module 66 are the same as those shown in Fig. 12 and are not repeated here.
- The decoding method of the decoding device shown in Fig. 14 is substantially the same as that of the decoding device shown in Fig. 10. The difference is that the former further includes the following steps after the inverse quantization spectrum is obtained: if the result of signal type analysis shows that the signal types are the same, the sum-difference stereo control signal is used to determine whether sum-difference stereo decoding needs to be performed on the inverse quantization spectrum; if it does, the flag bit of each scale factor band is used to determine whether that band needs sum-difference stereo decoding, and if so, the inverse quantization spectrum of the sum and difference sound channels in that band is transformed into the inverse quantization spectrum of the left and right sound channels before subsequent processing. If the signal types differ, or sum-difference stereo decoding is unnecessary, the inverse quantization spectrum is passed to subsequent processing unchanged.
- The sum-difference stereo decoding can also be performed before inverse quantization. In that case, after the quantized values of the spectrum are obtained, if the result of signal type analysis shows that the signal types are the same, the sum-difference stereo control signal is used to determine whether sum-difference stereo decoding needs to be performed on the quantized values; if it does, the flag bit of each scale factor band is used to determine whether that band needs sum-difference stereo decoding, and if so, the quantized values of the sum and difference sound channels in that band are transformed into the quantized values of the left and right sound channels before subsequent processing. If the signal types differ, or sum-difference stereo decoding is unnecessary, the quantized values are passed to subsequent processing unchanged.
- Fig. 15 is a schematic drawing of the fourth embodiment of the encoding device of the present invention. On the basis of the encoding device shown in Fig. 5, this embodiment adds a re-sampling module 590 and a frequency band spreading module 591. The re-sampling module 590 re-samples the input audio signals to change their sampling rate and then outputs the re-sampled audio signals to the signal characteristic analyzing module 50; the frequency band spreading module 591 analyzes the input audio signals over the entire frequency band, extracts the spectrum envelope of the high frequency portion and the characteristics of its relationship with the low frequency portion, and outputs them to the bit-stream multiplexing module 55.
- The re-sampling module 590 re-samples the input audio signals. Re-sampling includes up-sampling and down-sampling; it is described below using down-sampling as an example. In this embodiment, the re-sampling module 590 comprises a low-pass filter and a down-sampler: the low-pass filter limits the frequency band of the audio signals to eliminate the aliasing that down-sampling could otherwise cause, and the low-pass filtered signal is then down-sampled. Suppose that the input audio signal is s(n) and that it is output as v(n) after being filtered by the low-pass filter with impulse response h(n); then
$$v(n) = \sum_{k} h(k)\, s(n-k)$$
- After being input to the frequency band spreading module 591, the original audio signals are analyzed over the entire frequency band to extract the spectrum envelope of the high frequency portion and the characteristics of its relationship with the low frequency portion, which are output to the bit-stream multiplexing module 55 as the frequency band spreading control information.
- The basic principle of frequency band spreading is that, for most audio signals, the characteristics of the high frequency portion are strongly correlated with those of the low frequency portion, so the high frequency portion can be effectively reconstructed from the low frequency portion and need not be transmitted. To ensure correct reconstruction of the high frequency portion, only a small amount of frequency band spreading control information needs to be transmitted in the compressed audio code stream.
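The two-step re-sampling (band limiting, then down-sampling) can be sketched as below. The Hamming-windowed sinc design, the 31-tap length, and placing the cutoff at the new Nyquist frequency are assumptions; the source only states that a low-pass filter with impulse response h(n) precedes the down-sampler. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def lowpass_fir(num_taps: int, cutoff: float) -> List[float]:
    """Hamming-windowed sinc low-pass filter h(n).

    cutoff is a fraction of the Nyquist frequency (0 < cutoff < 1);
    taps are normalised to unity gain at DC.
    """
    mid = (num_taps - 1) / 2
    h = []
    for n in range(num_taps):
        t = n - mid
        ideal = cutoff if t == 0 else math.sin(math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        h.append(ideal * window)
    dc_gain = sum(h)
    return [x / dc_gain for x in h]


def fir_filter(s: Sequence[float], h: Sequence[float]) -> List[float]:
    """v(n) = sum_k h(k) s(n - k), with s(n) = 0 for n < 0."""
    return [sum(h[k] * s[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(s))]


def downsample(s: Sequence[float], factor: int, num_taps: int = 31) -> List[float]:
    """Band-limit to the new Nyquist frequency, then keep every factor-th sample."""
    h = lowpass_fir(num_taps, 1.0 / factor)
    return fir_filter(s, h)[::factor]
```

A constant input passes through at unit gain once the filter has filled, while a component at the original Nyquist frequency, which would alias after decimation by 2, is strongly attenuated.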
- The frequency band spreading module 591 comprises a parameter extracting module and a spectrum envelope extracting module. The input signals are fed to the parameter extracting module, which extracts parameters representing the spectrum characteristics of the input signals in different time-frequency regions; the spectrum envelope extracting module then estimates the spectrum envelope of the high frequency portion of the signal at a certain time-frequency resolution. So that the time-frequency resolution best suits the characteristics of the current input signals, it can be selected freely. The spectrum-characteristic parameters of the input signals and the spectrum envelope of the high frequency portion are output, as the frequency band spreading control signal, to the bit-stream multiplexing module 55 for multiplexing.
- The bit-stream multiplexing module 55 receives the code stream output from the quantization and entropy encoding module 54, which includes the common scale factor, the encoded values of the scale factors, the encoded values of the code book sequence numbers, and the losslessly encoded quantization spectrum or the encoded values of the code word indexes, together with the frequency band spreading control signal output from the frequency band spreading module 591, and multiplexes them to obtain the compressed audio data stream.
- The encoding method based on the encoding device shown in Fig. 15 specifically includes: analyzing the input audio signal over the entire frequency band and extracting the high frequency spectrum envelope and the signal spectrum characteristic parameters as the frequency band spreading control signal; re-sampling the input audio signal and analyzing the signal type; calculating the signal-to-masking ratio of the re-sampled signal; time-frequency mapping the re-sampled signal to obtain the frequency-domain coefficients of the audio signal; quantizing and entropy encoding the frequency-domain coefficients; and multiplexing the frequency band spreading control signal and the encoded audio code stream to obtain the compressed audio code stream. The re-sampling consists of two steps: limiting the frequency band of the audio signal, and down-sampling the band-limited audio signal by a given factor.
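As a rough sketch of the spectrum envelope extracting module, the function below averages the energy of the high-frequency coefficients over rectangular time-frequency tiles whose size sets the resolution. The tile layout, the energy measure, and all names are assumptions; the parameter extracting module (the low/high relationship parameters) is not modelled.

```python
from typing import List, Sequence


def high_band_envelope(frames: Sequence[Sequence[float]], split_bin: int,
                       band_width: int, slots_per_group: int = 1) -> List[List[float]]:
    """Mean energy per (time group, high-frequency band).

    frames: one list of spectral coefficients per time slot (hypothetical layout).
    split_bin: first bin of the high-frequency portion.
    band_width, slots_per_group: frequency and time resolution of the envelope.
    """
    envelope: List[List[float]] = []
    for t0 in range(0, len(frames), slots_per_group):
        group = frames[t0:t0 + slots_per_group]
        n_bins = len(group[0])
        row: List[float] = []
        for f0 in range(split_bin, n_bins, band_width):
            f1 = min(f0 + band_width, n_bins)
            energy = sum(c * c for frame in group for c in frame[f0:f1])
            row.append(energy / (len(group) * (f1 - f0)))
        envelope.append(row)
    return envelope
```

Choosing a larger `slots_per_group` trades time resolution for fewer envelope values, which is one way the freely selectable time-frequency resolution mentioned above could be realised.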
- Fig. 16 is a schematic drawing of the structure of the fourth embodiment of the decoding device. On the basis of the decoding device shown in Fig. 8, this embodiment adds a frequency band spreading module 68, which receives the frequency band spreading control information output from the bit-stream demultiplexing module 60 and the low-frequency time-domain audio signal output from the frequency-time mapping module 64, and reconstructs the high frequency portion of the signal through spectrum shifting and high frequency adjustment to output the wide band audio signal.
- The decoding method based on the decoding device shown in Fig. 16 is substantially the same as that based on the decoding device shown in Fig. 8. The difference is that, after the time-domain audio signal is obtained, the former further reconstructs the high frequency portion of the audio signal from the frequency band spreading control information and the time-domain audio signal, thereby obtaining the wide band audio signal.
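The spectrum shift and high frequency adjustment can be illustrated in simplified form on a single frame of spectral coefficients: the top of the low band is copied upward, and each copied band is scaled so that its mean energy matches the transmitted envelope value. Working in the frequency domain, the choice of source bins, and the function name are assumptions; the source describes the module as operating on the low-frequency time-domain signal and does not detail the adjustment.

```python
import math
from typing import List, Sequence


def reconstruct_high_band(low_spec: Sequence[float], envelope: Sequence[float],
                          band_width: int) -> List[float]:
    """Patch the high band from the low band and scale it to the envelope."""
    n_high = len(envelope) * band_width
    if len(low_spec) < n_high:
        raise ValueError("low band too narrow to patch the requested high band")
    # Spectrum shift: reuse the uppermost low-band bins as the source pattern.
    source = list(low_spec[len(low_spec) - n_high:])
    high: List[float] = []
    for b, target in enumerate(envelope):
        seg = source[b * band_width:(b + 1) * band_width]
        energy = sum(x * x for x in seg) / band_width
        # High frequency adjustment: match the band's mean energy to the envelope.
        gain = math.sqrt(target / energy) if energy > 0.0 else 0.0
        high.extend(x * gain for x in seg)
    return list(low_spec) + high
```

Only the envelope values cross the channel; the fine structure of the high band is borrowed from the low band, which is why so little control information is needed.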
- Figs. 17, 19 and 21 show the fifth to seventh embodiments of the encoding device, which respectively add a re-sampling module 590 and a frequency band spreading module 591 to the encoding devices shown in Figs. 11, 9 and 13. The connections of these two modules to the other modules, as well as their functions and principles, are the same as in Fig. 15 and are not repeated here.
- Figs. 18, 20 and 22 show the fifth to seventh embodiments of the decoding device, which respectively add a frequency band spreading module 68 to the decoding devices shown in Figs. 12, 10 and 14. The module receives the frequency band spreading control information output from the bit-stream demultiplexing module 60 and the low-frequency time-domain audio signals output from the frequency-time mapping module 64, and reconstructs the high frequency portion of the signal through spectrum shifting and high frequency adjustment to output wide band audio signals.
- Each of the seven embodiments of the encoding device described above may also include a gain control module, which receives the audio signals output from the signal characteristic analyzing module 50, controls the dynamic range of fast varying type signals, and eliminates pre-echo in audio processing. Its output is connected to the time-frequency mapping module 52 and the psychoacoustical analyzing module 51, and the amount of gain adjustment is output to the bit-stream multiplexing module 55.
- Depending on the signal type of the audio signals, the gain control module processes only fast varying type signals; slowly varying signals are output directly without processing. For fast varying type signals, the gain control module adjusts the time-domain energy envelope of the signal by increasing the gain of the signal before the fast varying point, so that the amplitudes of the time-domain signal before and after the fast varying point are close to each other. The time-domain signals with adjusted energy envelopes are then output to the time-frequency mapping module 52, and the amount of gain adjustment is output to the bit-stream multiplexing module 55.
- The encoding method based on this encoding device is substantially the same as that based on the encoding devices described above; the difference is that it further includes the step of performing gain control on the signal after its signal type has been analyzed.
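A minimal sketch of the encoder-side gain control, assuming a single known fast varying point per frame and a single gain chosen so that the RMS amplitude before the transient matches the RMS amplitude after it. Transient detection is not shown, and the RMS criterion and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def rms(x: Sequence[float]) -> float:
    """Root-mean-square amplitude (0.0 for an empty segment)."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if len(x) > 0 else 0.0


def gain_control(frame: Sequence[float], transient_idx: int,
                 is_fast_varying: bool = True) -> Tuple[List[float], float]:
    """Boost the samples before the fast varying point; return (signal, gain)."""
    if not is_fast_varying:
        return list(frame), 1.0  # slowly varying signals pass through
    pre, post = list(frame[:transient_idx]), list(frame[transient_idx:])
    pre_rms, post_rms = rms(pre), rms(post)
    if pre_rms == 0.0 or post_rms == 0.0:
        return list(frame), 1.0  # nothing to equalise
    g = post_rms / pre_rms
    return [v * g for v in pre] + post, g
```

The returned gain is the side information that would be multiplexed into the bit stream for the decoder's inverse gain control.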
- Each of the seven embodiments of the decoding device described above may also include an inverse gain control module, located after the output of the frequency-time mapping module 64, which receives the result of signal type analysis and the amount of gain adjustment output from the bit-stream demultiplexing module 60, thereby adjusting the gain of the time-domain signal and controlling pre-echo. After receiving the reconstructed time-domain signal output from the frequency-time mapping module 64, the inverse gain control module processes fast varying type signals and leaves slowly varying type signals unprocessed. For fast varying type signals, the inverse gain control module adjusts the energy envelope of the reconstructed time-domain signal according to the amount of gain adjustment, reducing the amplitude of the signal before the fast varying point and restoring the energy envelope to its original shape, low before the fast varying point and high after it. The amplitude of the quantization noise before the fast varying point is thereby reduced along with the amplitude of the signal, which controls the pre-echo.
- The decoding method based on this decoding device is substantially the same as that based on the decoding devices described above; the difference is that it further includes the step of performing inverse gain control on the reconstructed time-domain signals.
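The matching inverse gain control can be sketched as dividing the samples before the fast varying point by the transmitted gain; any quantization noise in that segment is divided by the same factor, which is the pre-echo reduction described above. The single-gain model and the function name are assumptions.

```python
from typing import List, Sequence


def inverse_gain_control(frame: Sequence[float], transient_idx: int, gain: float,
                         is_fast_varying: bool = True) -> List[float]:
    """Undo the encoder's pre-transient boost on the reconstructed signal."""
    if not is_fast_varying:
        return list(frame)  # slowly varying signals are left unprocessed
    if gain <= 0.0:
        raise ValueError("gain must be positive")
    # Signal and the quantization noise riding on it are scaled down together.
    return [v / gain for v in frame[:transient_idx]] + list(frame[transient_idx:])
```

For example, with a gain of 10, a pre-transient sample decoded as 1.05 (signal 1.0 plus noise 0.05) becomes 0.105, so the noise before the attack shrinks to 0.005 while the post-transient part is untouched.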
- Finally, it should be noted that the above embodiments illustrate rather than limit the technical solutions of the invention. Although the invention has been described with reference to preferred embodiments, those skilled in the art will understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from their spirit and scope. Accordingly, it is intended to embrace all such modifications and equivalent substitutions as fall within the scope of the appended claims.
Claims (20)
- An enhanced audio encoding device, comprising a psychoacoustical analyzing module, a time-frequency mapping module, a quantization and entropy encoding module, and a bit-stream multiplexing module, characterized in that said device further comprises a signal characteristic analyzing module and a multi-resolution analyzing module; wherein
the signal characteristic analyzing module is configured to analyze the signal type of the input audio signal and output it to the psychoacoustical analyzing module and the time-frequency mapping module, and to output the result of signal type analysis of the audio signals to the bit-stream multiplexing module at the same time;
the psychoacoustical analyzing module is configured to calculate a masking threshold and a signal-to-masking ratio of the audio signal, and output them to said quantization and entropy encoding module;
the time-frequency mapping module is configured to convert the time-domain audio signal into frequency-domain coefficients and output them to the multi-resolution analyzing module;
the multi-resolution analyzing module is configured to perform a multi-resolution analysis on the frequency-domain coefficients of signal of a fast varying type based on the signal type analysis result output from the signal characteristic analyzing module, and to output it to the quantization and entropy encoding module;
the quantization and entropy encoding module is configured to perform quantization and entropy encoding on the frequency-domain coefficients under the control of the signal-to-masking ratio output from the psychoacoustical analyzing module and output them to the bit-stream multiplexing module; and
the bit-stream multiplexing module is configured to multiplex the received data to form audio encoding code stream.
- The enhanced audio encoding device according to claim 1, characterized in that the multi-resolution analyzing module comprises a frequency-domain coefficient transformation module and a reorganization module, wherein the frequency-domain coefficient transformation module is used for transforming the frequency-domain coefficients into time-frequency plane coefficients; and the reorganization module is used for reorganizing the time-frequency plane coefficients according to a certain rule; wherein the frequency-domain coefficient transformation module is the filter bank of frequency-domain wavelet transformation or the filter bank of frequency-domain MDCT transformation.
- The enhanced audio encoding device according to claim 1, further comprising a frequency-domain linear prediction and vector quantization module located between the output of the multi-resolution analyzing module and the input of the quantization and entropy encoding module; said frequency-domain linear prediction and vector quantization module consists of a linear prediction analyzer, a linear prediction filter, a transformer, and a vector quantizer; the linear prediction analyzer is used for predictive analyzing the frequency-domain coefficients to obtain the prediction gain and prediction coefficients, and for outputting the frequency-domain coefficients that meet a certain condition to the linear prediction filter, while the frequency-domain coefficients that do not meet the condition are directly output to said quantization and entropy encoding module;
the linear prediction filter is used for filtering the frequency-domain coefficients to obtain the residual sequence, and for outputting the residual sequence to the quantization and entropy encoding module and outputting the prediction coefficients to the transformer;
the transformer is used for transforming the prediction coefficients into line spectrum pair frequency coefficients; the vector quantizer is used for performing multi-stage vector quantization on the line spectrum pair frequency coefficients, and the relevant side information obtained from the quantization is transmitted to the bit-stream multiplexing module.
- The enhanced audio encoding device according to any one of claims 1-3, further comprising a sum-difference stereo encoding module located between the output of the frequency-domain linear prediction and vector quantization module and the input of the quantization and entropy encoding module, or between the quantizer bank and the encoder in the quantization and entropy encoding module; the signal characteristic analyzing module outputs the result of signal type analysis thereto; the sum-difference stereo encoding module is used for transforming the residual sequence/frequency-domain coefficients of the left-right sound channels into the residual sequence/frequency-domain coefficients of the sum-difference sound channels.
- The enhanced audio encoding device according to any one of claims 1-4, further comprising a re-sampling module and a frequency band spreading module;
the re-sampling module is used for re-sampling the input audio signal to change the sampling rate thereof, then outputting the audio signal with a changed sampling rate to the psychoacoustical analyzing module and the signal characteristic analyzing module; said re-sampling module comprises a low-pass filter and a down-sampler; wherein the low-pass filter is used for limiting the frequency band of the audio signal, and the down-sampler is used for down-sampling the audio signal whose frequency band is limited to reduce the sampling rate of the signal;
the frequency-band spreading module is used for analyzing the input audio signal on the entire frequency band to extract the spectrum envelope of the high frequency portion and the parameters representing the correlation between the low and high frequency spectrum and to output them to the bit-stream multiplexing module; said frequency-band spreading module comprises a parameter extracting module and a spectrum envelope extracting module; said parameter extracting module is used for extracting the parameters representing the spectrum characteristics of the input signal at different time-frequency regions, and said spectrum envelope extracting module is used for estimating the spectrum envelope of the high frequency portion of the signal at a certain time-frequency resolution, and then outputting the parameters of the spectrum characteristics of the input signals and the spectrum envelope of the high frequency portion to the bit-stream multiplexing module.
- An enhanced audio encoding method, comprising the following steps:
step 1: analyzing the type of the input audio signal and using the result of the signal type analysis as a part of the multiplexed information;
step 2: time-frequency mapping the type-analyzed signal to obtain the frequency-domain coefficients of the audio signal, and calculating the signal-to-masking ratio of the audio signal;
step 3: multi-resolution analyzing the frequency-domain coefficients if the signal is of a fast varying type, or proceeding directly to step 4 if it is not;
step 4: quantizing and entropy encoding the frequency-domain coefficients under the control of the signal-to-masking ratio;
step 5: multiplexing the encoded audio signal to obtain the compressed audio code stream.
- The enhanced audio encoding method according to claim 6, characterized in that the quantization in step 4 is scalar quantization, which comprises: non-linearly compressing the frequency-domain coefficients in all the scale factor bands; quantizing the frequency-domain coefficients of each sub-band using the scale factor of that sub-band to obtain a quantization spectrum represented by integers; selecting the first scale factor in each frame of the signal as the common scale factor; and differentially encoding each of the remaining scale factors with respect to its preceding scale factor;
the entropy encoding comprises entropy encoding the quantization spectrum and the differentially encoded scale factors to obtain the codebook sequence numbers, the encoded values of the scale factors and the losslessly encoded values of the quantization spectrum, and entropy encoding the codebook sequence numbers to obtain their encoded values.
- The enhanced audio encoding method according to claim 6, characterized in that the multi-resolution analysis in step 3 comprises performing an MDCT transformation on the frequency-domain coefficients to obtain the time-frequency plane coefficients, and reorganizing said time-frequency plane coefficients according to a certain rule; wherein the reorganization comprises organizing the time-frequency plane coefficients in the frequency direction, organizing the coefficients within each frequency band in the time direction, and then arranging the organized coefficients in the order of sub-window and scale factor band.
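The scalar quantization and scale factor coding in claim 7 can be sketched as follows. The claim says only that the coefficients are "non-linearly compressed"; this sketch assumes an AAC-style 3/4-power law, a quantizer step of 2^(sf/4) and a rounding offset of 0.4054, none of which is stated in the claims. The function names are hypothetical, and the entropy coding of the resulting integers is omitted.

```python
import math


def quantize_band(coeffs, scale_factor):
    """Assumed AAC-like quantizer: q = sign(x) * floor((|x| / step)^(3/4) + 0.4054)."""
    step = 2.0 ** (scale_factor / 4.0)
    out = []
    for x in coeffs:
        mag = int((abs(x) / step) ** 0.75 + 0.4054)  # non-linear compression, then rounding
        out.append(-mag if x < 0 else mag)
    return out


def dequantize_band(q, scale_factor):
    """Inverse of the assumed law: x = sign(q) * |q|^(4/3) * step."""
    step = 2.0 ** (scale_factor / 4.0)
    return [math.copysign(abs(v) ** (4.0 / 3.0) * step, v) for v in q]


def encode_scale_factors(sfs):
    """First scale factor is the common one; the rest are sent as differences."""
    common = sfs[0]
    deltas = [cur - prev for prev, cur in zip(sfs, sfs[1:])]
    return common, deltas


def decode_scale_factors(common, deltas):
    """Accumulate the differences back onto the common scale factor."""
    sfs = [common]
    for d in deltas:
        sfs.append(sfs[-1] + d)
    return sfs
```

Sending differences rather than absolute values keeps the numbers small when neighbouring bands have similar scale factors, which is what makes them cheap to entropy code.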
- The enhanced audio encoding method according to any one of claims 6-8, characterized in that, between said step 3 and step 4, the method further comprises the steps of: performing a standard linear prediction analysis on the frequency-domain coefficients to obtain the prediction gain and the prediction coefficients; determining whether the prediction gain exceeds a predetermined threshold; if it does, performing frequency-domain linear prediction error filtering on the frequency-domain coefficients based on the prediction coefficients to obtain the linear prediction residual sequence of the frequency-domain coefficients, transforming the prediction coefficients into line spectrum pair frequency coefficients, performing multi-stage vector quantization on said line spectrum pair frequency coefficients to obtain the side information, and quantizing and entropy encoding the residual sequence; if the prediction gain does not exceed the predetermined threshold, quantizing and entropy encoding the frequency-domain coefficients.
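The frequency-domain linear prediction step of claim 9 treats the spectrum as a sequence and predicts each coefficient from its lower-frequency neighbours. As a minimal sketch, this uses the autocorrelation method with the Levinson-Durbin recursion as the "standard linear prediction analysis", defines the prediction gain as the ratio of input energy to prediction error energy, and picks a prediction order of 8 and a threshold of 1.4. All of these are assumptions, not values from the claims. The LSP conversion and multi-stage vector quantization are not shown.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum_n x[n] * x[n - lag] for lag = 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x))) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return a = [1, a1, ..., ap] of A(z) = 1 + a1 z^-1 + ... and the final error energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # perfectly predictable so far; stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def fdlp_encode(coeffs, order=8, gain_threshold=1.4):
    """Return (a, residual) if prediction pays off, else (None, coeffs) unchanged."""
    r = autocorrelation(coeffs, order)
    if r[0] == 0.0:
        return None, coeffs
    a, err = levinson_durbin(r, order)
    gain = r[0] / err if err > 0.0 else float("inf")
    if gain <= gain_threshold:
        return None, coeffs
    # Prediction error filter along frequency: e[n] = sum_j a[j] * X[n - j]
    residual = [
        sum(a[j] * coeffs[n - j] for j in range(order + 1) if n - j >= 0)
        for n in range(len(coeffs))
    ]
    return a, residual
```

A smoothly decaying spectrum such as `[0.9 ** n for n in range(32)]` yields a high prediction gain and a residual with far less energy, whereas a single spectral line has no correlation to exploit and is passed through unchanged.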
- The enhanced audio encoding method according to any one of claims 6-9, characterized in that said step 4 further comprises: quantizing the frequency-domain coefficients; determining whether the audio signals are multi-channel signals and, if they are, whether the signals of the left and right sound channels are of the same type; if the signal types are the same, determining whether the scale factor bands corresponding to the two sound channels meet the conditions for sum-difference stereo encoding, and if they do, performing sum-difference stereo encoding on the spectrum coefficients in that scale factor band to obtain the frequency-domain coefficients of the sum and difference sound channels, otherwise not performing sum-difference stereo encoding on the spectrum coefficients in that scale factor band; if the signals are mono signals or multi-channel signals of different types, leaving the frequency-domain coefficients unprocessed; and entropy encoding the frequency-domain coefficients; wherein whether a scale factor band meets the condition for encoding is determined by K-L transformation: the correlation matrix of the spectrum coefficients of the scale factor bands of the left and right sound channels is calculated, and K-L transformation is performed on the correlation matrix; if the absolute value of the rotation angle a deviates from π/4 by only a small amount, e.g. 3π/16 < |a| < 5π/16, sum-difference stereo encoding can be performed on the corresponding scale factor band; said sum-difference stereo encoding is
wherein M̂ denotes the quantized frequency-domain coefficients of the sum sound channel, Ŝ denotes the quantized frequency-domain coefficients of the difference sound channel, L̂ denotes the quantized frequency-domain coefficients of the left sound channel, and R̂ denotes the quantized frequency-domain coefficients of the right sound channel.
- The enhanced audio encoding method according to any one of claims 6-10, characterized in that a re-sampling step and a frequency spreading step are also performed before said step 1; the re-sampling step re-samples the input audio signal to change its sampling rate;
the frequency spreading step analyzes the input audio signal over the entire frequency band to extract the high-frequency spectrum envelope and the parameters of the signal spectrum characteristics, which form part of the multiplexed information.
- An enhanced audio decoding device, comprising a bit-stream demultiplexing module, an entropy decoding module, an inverse quantizer bank, and a frequency-time mapping module, characterized in that said device further comprises a multi-resolution integration module;
the bit-stream demultiplexing module is configured to demultiplex the compressed audio data stream and output the corresponding data signal and control signal to the entropy decoding module and the multi-resolution integration module;
the entropy decoding module is configured to decode said signal to recover the quantized values of the spectrum and output them to the inverse quantizer bank;
the inverse quantizer bank is configured to reconstruct the inverse quantization spectrum and output it to the multi-resolution integration module;
the multi-resolution integration module is configured to perform multi-resolution integration on the inverse quantization spectrum and to output it to the frequency-time mapping module; and
the frequency-time mapping module is configured to perform a frequency-time mapping on the spectrum coefficients to output the time-domain audio signal.
- The enhanced audio decoding device according to claim 12, characterized in that said multi-resolution integration module comprises a coefficient reorganization module and a coefficient transformation module; said coefficient transformation module is a filter bank of frequency-domain inverse wavelet transformation, or a filter bank of frequency-domain inverse modified discrete cosine transformation.
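Claim 13 allows the coefficient transformation module to be an inverse wavelet filter bank applied in the frequency domain. The claims do not name a wavelet, so the sketch below uses the orthonormal Haar wavelet purely for illustration. It shows a multi-level split of a coefficient sequence and the matching synthesis, which is the kind of operation a multi-resolution integration module would perform. All function names are hypothetical.

```python
import math

SQRT_HALF = math.sqrt(0.5)


def haar_split(x):
    """One analysis level: scaled pairwise sums (approximation) and differences (detail)."""
    approx = [(x[2 * i] + x[2 * i + 1]) * SQRT_HALF for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) * SQRT_HALF for i in range(len(x) // 2)]
    return approx, detail


def haar_merge(approx, detail):
    """One synthesis level: exact inverse of haar_split."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * SQRT_HALF)
        out.append((a - d) * SQRT_HALF)
    return out


def multires_analysis(coeffs, levels):
    """Return the coarsest approximation and the detail lists, coarsest first."""
    details = []
    approx = list(coeffs)
    for _ in range(levels):
        approx, d = haar_split(approx)
        details.insert(0, d)
    return approx, details


def multires_integration(approx, details):
    """Undo multires_analysis, merging from the coarsest level outwards."""
    for d in details:
        approx = haar_merge(approx, d)
    return approx
```

Because the Haar pair is orthonormal, `multires_integration(*multires_analysis(x, 3))` returns `x` up to rounding for any length divisible by 8.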
- The enhanced audio decoding device according to claim 12 or 13, further comprising an inverse frequency-domain linear prediction and vector quantization module located between the output of the inverse quantizer bank and the input of the multi-resolution integration module; said inverse frequency-domain linear prediction and vector quantization module comprises an inverse vector quantizer, an inverse transformer, and an inverse linear prediction filter; wherein the inverse vector quantizer is used for inversely quantizing the code word index to obtain the line spectrum pair frequency coefficients, the inverse transformer is used for inversely transforming the line spectrum pair frequency coefficients into prediction coefficients, and the inverse linear prediction filter is used for inversely filtering the inverse quantization spectrum based on the prediction coefficients to obtain the spectrum before prediction.
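The inverse linear prediction filter in claim 14 can be pictured as the all-pole synthesis filter that undoes the encoder's prediction error filter along the frequency axis. This sketch assumes the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p. The coefficients in the test are made up; in the device they would come from the inverse vector quantizer and the inverse transformer. Both function names are hypothetical.

```python
def lp_error_filter(x, a):
    """Encoder side (FIR): e[n] = sum_j a[j] * x[n - j], with a[0] == 1."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0) for n in range(len(x))]


def lp_synthesis_filter(e, a):
    """Decoder side (all-pole inverse filter): x[n] = e[n] - sum_{j>=1} a[j] * x[n - j]."""
    x = []
    for n, en in enumerate(e):
        acc = en
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * x[n - j]  # feedback from already reconstructed coefficients
        x.append(acc)
    return x
```

Running the residual produced by `lp_error_filter` through `lp_synthesis_filter` with the same coefficients gives back the original spectrum, which is why the decoder needs only the quantized prediction coefficients as side information.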
- The enhanced audio decoding device according to any one of claims 12-14, further comprising a sum-difference stereo decoding module, located either after the inverse quantizer bank or between the output of the entropy decoding module and the input of the inverse quantizer bank, which receives the sum-difference stereo control signal output by the bit-stream demultiplexing module and, based on the sum-difference stereo control information, transforms the inverse quantization spectrum (or the quantized values of the spectrum) of the sum and difference sound channels into the inverse quantization spectrum (or the quantized values of the spectrum) of the left and right sound channels.
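The K-L test that claim 10 uses to decide, per scale factor band, whether sum-difference stereo is worthwhile reduces to finding the rotation angle that diagonalizes the 2x2 correlation matrix of the left and right band coefficients. A minimal sketch follows, with hypothetical function names. It covers only the decision, not the sum-difference transform itself.

```python
import math


def kl_rotation_angle(left, right):
    """Rotation angle of the 2x2 K-L transform for one scale factor band."""
    c_ll = sum(x * x for x in left)
    c_rr = sum(y * y for y in right)
    c_lr = sum(x * y for x, y in zip(left, right))
    # The angle a that diagonalizes [[c_ll, c_lr], [c_lr, c_rr]] satisfies
    # tan(2a) = 2 * c_lr / (c_ll - c_rr).
    return 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)


def use_sum_difference(left, right, lo=3 * math.pi / 16, hi=5 * math.pi / 16):
    """True when |a| is close to pi/4, i.e. L and R are strongly (anti-)correlated with similar energy."""
    return lo < abs(kl_rotation_angle(left, right)) < hi
```

Identical channels give an angle of exactly π/4, so sum-difference coding concentrates the energy in one channel. A band in which only one channel is active gives an angle near 0, and left-right coding is kept.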
- An enhanced audio decoding method, comprising the following steps:
step 1: demultiplexing the compressed audio data stream to obtain the data information and the control information;
step 2: entropy decoding said information to obtain the quantized values of the spectrum;
step 3: inversely quantizing the quantized values of the spectrum to obtain the inverse quantization spectrum;
step 4: multi-resolution integrating the inverse quantization spectrum;
step 5: performing a frequency-time mapping to obtain the time-domain audio signal.
- The enhanced audio decoding method according to claim 16, characterized in that the multi-resolution integration in said step 4 specifically comprises arranging the inverse quantization spectrum coefficients in the order of sub-window and scale factor band, reorganizing them according to the frequency sequence, and then performing inverse modified discrete cosine transformation a plurality of times on the reorganized coefficients to obtain the inverse quantization spectrum before the multi-resolution analysis.
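The claims leave the reorganization rule open ("according to a certain rule"). The sketch below assumes an AAC-like grouping in which, for each scale factor band, the coefficients of every sub-window are stored consecutively, and shows the forward arrangement together with its inverse back to per-sub-window frequency order. The band edges and function names are hypothetical, and the inverse MDCTs that follow are not shown.

```python
def to_band_interleaved(windows, band_edges):
    """Sub-window spectra -> one list ordered by scale factor band, then by sub-window."""
    out = []
    for lo, hi in zip(band_edges, band_edges[1:]):
        for spectrum in windows:
            out.extend(spectrum[lo:hi])
    return out


def from_band_interleaved(coeffs, num_windows, band_edges):
    """Inverse: rebuild each sub-window spectrum in ascending frequency order."""
    windows = [[0.0] * band_edges[-1] for _ in range(num_windows)]
    pos = 0
    for lo, hi in zip(band_edges, band_edges[1:]):
        width = hi - lo
        for w in range(num_windows):
            windows[w][lo:hi] = coeffs[pos:pos + width]
            pos += width
    return windows
```

With two sub-windows `[0, 1, 2, 3]` and `[10, 11, 12, 13]` and band edges `[0, 2, 4]`, the interleaved order is `[0, 1, 10, 11, 2, 3, 12, 13]`, and the inverse restores both sub-windows.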
- The enhanced audio decoding method according to claim 16, characterized in that said step 5 further comprises: performing an inverse modified discrete cosine transformation to obtain the transformed time-domain signals; windowing the transformed time-domain signals in the time domain; and overlap-adding the windowed time-domain signals to obtain the time-domain audio signal; wherein the window function used in said windowing is: w(N+k) = cos(pi/2*((k+0.5)/N - 0.94*sin(2*pi/N*(k+0.5))/(2*pi))), wherein k=1...N-1; w(k) represents the kth coefficient of the window function, and w(k) = w(2*N-1-k); N represents the number of samples of the encoded frame.
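The claimed window is power-complementary. Its second half has phase θ(k) = (π/2)((k+0.5)/N − 0.94·sin(2π(k+0.5)/N)/(2π)), and θ(k) + θ(N−1−k) = π/2. Combined with the symmetry w(k) = w(2N−1−k), this gives w(k)² + w(k+N)² = 1, which is the Princen-Bradley condition for MDCT time-domain aliasing cancellation.

The sketch below checks this numerically with a direct O(N²) MDCT/IMDCT, windowing and overlap-add at a hop of N. Some details are assumptions: it evaluates k from 0 so that w(N) is defined, and it uses the common MDCT phase term n + 1/2 + N/2 with a 2/N IMDCT scale. The function names are hypothetical.

```python
import math


def claim_window(N):
    """2N-point window: second half from the claimed formula, first half by symmetry."""
    w = [0.0] * (2 * N)
    for k in range(N):  # k = 0 is included here so that w(N) is defined
        theta = (math.pi / 2) * ((k + 0.5) / N
                                 - 0.94 * math.sin(2 * math.pi / N * (k + 0.5)) / (2 * math.pi))
        w[N + k] = math.cos(theta)
    for k in range(N):
        w[k] = w[2 * N - 1 - k]  # w(k) = w(2N-1-k)
    return w


def mdct(frame, N):
    """Direct MDCT of a 2N-sample frame into N coefficients."""
    return [sum(frame[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N)) for k in range(N)]


def imdct(X, N):
    """Direct IMDCT; the 2/N scale suits a power-complementary analysis/synthesis window."""
    return [2.0 / N * sum(X[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N)) for n in range(2 * N)]


def analysis_synthesis(x, N):
    """Window -> MDCT -> IMDCT -> window -> overlap-add, with hop size N."""
    w = claim_window(N)
    out = [0.0] * len(x)
    for start in range(0, len(x) - 2 * N + 1, N):
        frame = [w[n] * x[start + n] for n in range(2 * N)]
        y = imdct(mdct(frame, N), N)
        for n in range(2 * N):
            out[start + n] += w[n] * y[n]  # synthesis windowing and superposition
    return out
```

Every sample covered by two overlapping frames is reconstructed exactly, up to rounding. Only the first and last N samples, which lack an overlapping partner, differ from the input.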
- The enhanced audio decoding method according to claim 17 or 18, characterized in that, between said step 3 and step 4, the method further comprises the steps of: determining whether the control information indicates that the inverse quantization spectrum has to undergo inverse frequency-domain linear prediction vector quantization; if it does, performing inverse vector quantization to obtain the prediction coefficients, and performing linear prediction synthesis on the inverse quantization spectrum according to the prediction coefficients to obtain the spectrum before prediction; and frequency-time mapping the spectrum before prediction; wherein said inverse vector quantization further comprises: obtaining from the control information the code word indexes resulting from the vector quantization of the prediction coefficients, then obtaining the quantized line spectrum pair frequency coefficients from the code word indexes and calculating the prediction coefficients from them.
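The inverse vector quantization in claim 19 has two parts: looking up the quantized line spectrum pair frequencies from the code word indexes, and converting them back into prediction coefficients. The sketch below assumes a multi-stage codebook in which the stage outputs are summed, an even prediction order, and a common LSF-to-LPC convention (used, for example, in ITU-T G.729). Under that convention the odd-numbered frequencies are roots of P(z) = (1 + z^-1)∏(1 − 2cos ω z^-1 + z^-2), the even-numbered ones are roots of the matching Q(z) with (1 − z^-1), and A(z) = (P(z) + Q(z))/2. The codebooks in the test are made up, and all names are hypothetical.

```python
import math


def msvq_decode(indices, codebooks):
    """Multi-stage VQ decoding: add the code vector selected at every stage."""
    vec = [0.0] * len(codebooks[0][0])
    for stage, idx in zip(codebooks, indices):
        vec = [v + c for v, c in zip(vec, stage[idx])]
    return vec


def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def lsf_to_lpc(lsf):
    """Ascending LSFs (radians, even count p) -> [1, a1, ..., ap] of A(z)."""
    p_poly = [1.0, 1.0]   # P(z) carries the fixed root at z = -1
    q_poly = [1.0, -1.0]  # Q(z) carries the fixed root at z = +1
    for i, w in enumerate(lsf):
        factor = [1.0, -2.0 * math.cos(w), 1.0]
        if i % 2 == 0:  # 1st, 3rd, ... frequencies are roots of P(z)
            p_poly = poly_mul(p_poly, factor)
        else:           # 2nd, 4th, ... frequencies are roots of Q(z)
            q_poly = poly_mul(q_poly, factor)
    a = [(pc + qc) / 2.0 for pc, qc in zip(p_poly, q_poly)]
    return a[: len(lsf) + 1]  # the z^-(p+1) terms of P and Q cancel
```

For a second-order check, the LSFs `acos(0.65)` and `acos(-0.15)` correspond to A(z) = 1 − 0.5 z^-1 + 0.2 z^-2, and `lsf_to_lpc` recovers `[1.0, -0.5, 0.2]`.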
- The enhanced audio decoding method according to any one of claims 16-19, characterized in that, between said steps 2 and 3, the method further comprises the following steps: if the result of the signal type analysis shows that the signal types are the same, determining, according to the sum-difference stereo control signal, whether sum-difference stereo decoding needs to be performed on the inverse quantization spectrum; if so, determining from the flag bit of each scale factor band whether that scale factor band needs sum-difference stereo decoding, and if it does, transforming the inverse quantization spectrum of the sum and difference sound channels in that scale factor band into the inverse quantization spectrum of the left and right sound channels, then proceeding to step 3; if the signal types are not the same, or sum-difference stereo decoding is unnecessary, leaving the inverse quantization spectrum unprocessed and proceeding to step 3; wherein the sum-difference stereo decoding is
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200410030946 | 2004-04-01 | ||
PCT/CN2005/000440 WO2005096273A1 (en) | 2004-04-01 | 2005-04-01 | Enhanced audio encoding/decoding device and method |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1873753A1 (en) | 2008-01-02 |
Family ID: 35064017
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP05742018A Withdrawn EP1873753A1 (en) | 2004-04-01 | 2005-04-01 | Enhanced audio encoding/decoding device and method |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP1873753A1 (en) |
WO (1) | WO2005096273A1 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090006081A1 (en) * | 2007-06-27 | 2009-01-01 | Samsung Electronics Co., Ltd. | Method, medium and apparatus for encoding and/or decoding signal |
US20100010807A1 (en) * | 2008-07-14 | 2010-01-14 | Eun Mi Oh | Method and apparatus to encode and decode an audio/speech signal |
WO2011110572A1 (en) * | 2010-03-11 | 2011-09-15 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Signal processor, window provider, encoded media signal, method for processing a signal and method for providing a window |
JP2013037111A (en) * | 2011-08-05 | 2013-02-21 | Fujitsu Semiconductor Ltd | Method and device for coding audio signal |
WO2014044812A1 (en) * | 2012-09-21 | 2014-03-27 | Dolby International Ab | Coding of a sound field signal |
RU2591011C2 (en) * | 2009-10-20 | 2016-07-10 | Фраунхофер-Гезелльшафт цур Фёрдерунг дер ангевандтен Форшунг Е.Ф. | Audio signal encoder, audio signal decoder, method for encoding or decoding audio signal using aliasing-cancellation |
RU2592412C2 (en) * | 2012-03-29 | 2016-07-20 | Хуавэй Текнолоджиз Ко., Лтд. | Methods and apparatus for encoding and decoding signals |
US11830507B2 (en) | 2018-08-21 | 2023-11-28 | Dolby International Ab | Coding dense transient events with companding |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109036441B (en) | 2014-03-24 | 2023-06-06 | 杜比国际公司 | Method and apparatus for applying dynamic range compression to high order ambisonics signals |
US10531099B2 (en) * | 2016-09-30 | 2020-01-07 | The Mitre Corporation | Systems and methods for distributed quantization of multimodal images |
CN112530444B (en) * | 2019-09-18 | 2023-10-03 | 华为技术有限公司 | Audio coding method and device |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR960012475B1 (en) * | 1994-01-18 | 1996-09-20 | 대우전자 주식회사 | Digital audio coder of channel bit |
EP0720316B1 (en) * | 1994-12-30 | 1999-12-08 | Daewoo Electronics Co., Ltd | Adaptive digital audio encoding apparatus and a bit allocation method thereof |
CN1154084C (en) * | 2002-06-05 | 2004-06-16 | 北京阜国数字技术有限公司 | Audio coding/decoding technology based on pseudo wavelet filtering |
CN1461112A (en) * | 2003-07-04 | 2003-12-10 | 北京阜国数字技术有限公司 | Quantized voice-frequency coding method based on minimized global noise masking ratio criterion and entropy coding |
2005
- 2005-04-01: WO application PCT/CN2005/000440 (WO2005096273A1), active application filing
- 2005-04-01: EP application EP05742018A (EP1873753A1), not active (withdrawn)
Non-Patent Citations (1)
Title |
---|
See references of WO2005096273A1 * |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090006081A1 (en) * | 2007-06-27 | 2009-01-01 | Samsung Electronics Co., Ltd. | Method, medium and apparatus for encoding and/or decoding signal |
US8532982B2 (en) * | 2008-07-14 | 2013-09-10 | Samsung Electronics Co., Ltd. | Method and apparatus to encode and decode an audio/speech signal |
US20100010807A1 (en) * | 2008-07-14 | 2010-01-14 | Eun Mi Oh | Method and apparatus to encode and decode an audio/speech signal |
US9728196B2 (en) | 2008-07-14 | 2017-08-08 | Samsung Electronics Co., Ltd. | Method and apparatus to encode and decode an audio/speech signal |
US9355646B2 (en) | 2008-07-14 | 2016-05-31 | Samsung Electronics Co., Ltd. | Method and apparatus to encode and decode an audio/speech signal |
US20140012589A1 (en) * | 2008-07-14 | 2014-01-09 | Samsung Electronics Co., Ltd. | Method and apparatus to encode and decode an audio/speech signal |
RU2591011C2 (en) * | 2009-10-20 | 2016-07-10 | Фраунхофер-Гезелльшафт цур Фёрдерунг дер ангевандтен Форшунг Е.Ф. | Audio signal encoder, audio signal decoder, method for encoding or decoding audio signal using aliasing-cancellation |
US9252803B2 (en) | 2010-03-11 | 2016-02-02 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Signal processor, window provider, encoded media signal, method for processing a signal and method for providing a window |
RU2616863C2 (en) * | 2010-03-11 | 2017-04-18 | Фраунхофер-Гезелльшафт цур Фёрдерунг дер ангевандтен Форшунг Е.Ф. | Signal processor, window provider, encoded media signal, method for processing signal and method for providing window |
WO2011110572A1 (en) * | 2010-03-11 | 2011-09-15 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Signal processor, window provider, encoded media signal, method for processing a signal and method for providing a window |
AU2011226121B2 (en) * | 2010-03-11 | 2014-08-28 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Signal processor, window provider, encoded media signal, method for processing a signal and method for providing a window |
KR101445292B1 (en) * | 2010-03-11 | 2014-09-29 | 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. | Signal processor, window provider, encoded media signal, method for processing a signal and method for providing a window |
US8907822B2 (en) | 2010-03-11 | 2014-12-09 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Signal processor, window provider, encoded media signal, method for processing a signal and method for providing a window |
CN102893329B (en) * | 2010-03-11 | 2015-04-08 | 弗兰霍菲尔运输应用研究公司 | Signal processor, window provider, method for processing a signal and method for providing a window |
JP2013531264A (en) * | 2010-03-11 | 2013-08-01 | フラウンホーファーゲゼルシャフト ツール フォルデルング デル アンゲヴァンテン フォルシユング エー.フアー. | Signal processor, window provider, encoded media signal, method for processing signal and method for providing window |
CN102893329A (en) * | 2010-03-11 | 2013-01-23 | 弗兰霍菲尔运输应用研究公司 | Signal processor, window provider, encoded media signal, method for processing a signal and method for providing a window |
EP2372703A1 (en) * | 2010-03-11 | 2011-10-05 | Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. | Signal processor, window provider, encoded media signal, method for processing a signal and method for providing a window |
EP3096317A1 (en) * | 2010-03-11 | 2016-11-23 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Signal processor and method for processing a signal |
JP2013037111A (en) * | 2011-08-05 | 2013-02-21 | Fujitsu Semiconductor Ltd | Method and device for coding audio signal |
RU2592412C2 (en) * | 2012-03-29 | 2016-07-20 | Хуавэй Текнолоджиз Ко., Лтд. | Methods and apparatus for encoding and decoding signals |
US9537694B2 (en) | 2012-03-29 | 2017-01-03 | Huawei Technologies Co., Ltd. | Signal coding and decoding methods and devices |
US9786293B2 (en) | 2012-03-29 | 2017-10-10 | Huawei Technologies Co., Ltd. | Signal coding and decoding methods and devices |
US9899033B2 (en) | 2012-03-29 | 2018-02-20 | Huawei Technologies Co., Ltd. | Signal coding and decoding methods and devices |
US10600430B2 (en) | 2012-03-29 | 2020-03-24 | Huawei Technologies Co., Ltd. | Signal decoding method, audio signal decoder and non-transitory computer-readable medium |
US9460729B2 (en) | 2012-09-21 | 2016-10-04 | Dolby Laboratories Licensing Corporation | Layered approach to spatial audio coding |
US9495970B2 (en) | 2012-09-21 | 2016-11-15 | Dolby Laboratories Licensing Corporation | Audio coding with gain profile extraction and transmission for speech enhancement at the decoder |
US9502046B2 (en) | 2012-09-21 | 2016-11-22 | Dolby Laboratories Licensing Corporation | Coding of a sound field signal |
WO2014044812A1 (en) * | 2012-09-21 | 2014-03-27 | Dolby International Ab | Coding of a sound field signal |
US9858936B2 (en) | 2012-09-21 | 2018-01-02 | Dolby Laboratories Licensing Corporation | Methods and systems for selecting layers of encoded audio signals for teleconferencing |
US11830507B2 (en) | 2018-08-21 | 2023-11-28 | Dolby International Ab | Coding dense transient events with companding |
Also Published As
Publication number | Publication date |
---|---|
WO2005096273A1 (en) | 2005-10-13 |
Similar Documents
Publication | Title |
---|---|
EP1852851A1 (en) | An enhanced audio encoding/decoding device and method |
EP1873753A1 (en) | Enhanced audio encoding/decoding device and method |
JP4950210B2 (en) | Audio compression |
CA2853987C (en) | Scalable compressed audio bit stream and codec using a hierarchical filterbank and multichannel joint coding |
JP5539203B2 (en) | Improved transform coding of speech and audio signals |
JP5788833B2 (en) | Audio signal encoding method, audio signal decoding method, and recording medium |
US8655670B2 (en) | Audio encoder, audio decoder and related methods for processing multi-channel audio signals using complex prediction |
JP5820464B2 (en) | Audio or video encoder, audio or video decoder, and multi-channel audio or video signal processing method using prediction direction variable prediction |
RU2449387C2 (en) | Signal processing method and apparatus |
US7181404B2 (en) | Method and apparatus for audio compression |
US9037454B2 (en) | Efficient coding of overcomplete representations of audio using the modulated complex lapped transform (MCLT) |
CN101276587A (en) | Audio encoding apparatus and method thereof, audio decoding device and method thereof |
CN103366750B (en) | A kind of sound codec devices and methods therefor |
CN101192410B (en) | Method and device for regulating quantization quality in decoding and encoding |
CN104751850B (en) | Vector quantization coding and decoding method and device for audio signal |
CN1677492A (en) | Intensified audio-frequency coding-decoding device and method |
CN103366751A (en) | Sound coding and decoding apparatus and sound coding and decoding method |
WO2005096508A1 (en) | Enhanced audio encoding and decoding equipment, method thereof |
RU2409874C2 (en) | Audio signal compression |
WO2006056100A1 (en) | Coding/decoding method and device utilizing intra-channel signal redundancy |
AU2011205144B2 (en) | Scalable compressed audio bit stream and codec using a hierarchical filterbank and multichannel joint coding |
JPH05114863A (en) | High-efficiency encoding device and decoding device |
Legal Events
Code | Title | Description |
---|---|---|
PUAJ | Public notification under Rule 129 EPC | ORIGINAL CODE: 0009425 |
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20070816 |
AK | Designated contracting states | Kind code of ref document: A1; designated states: AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU MC NL PL PT RO SE SI SK TR |
RIN1 | Information on inventor provided before grant (corrected) | Inventor names: SCHUG, MICHAEL; DENG, HAO (C/F4, TRIUMPH PLAZA); ZHU, XIAOMING (C/F4, TRIUMPH PLAZA); WANG, LEI (C/F4, TRIUMPH PLAZA); REN, WEIMIN (C/F4, TRIUMPH PLAZA); HENN, FREDRIK; HOERICH, HOLGER; MARTIN, DIETZ; PAN, XINGDE (C/F4, TRIUMPH PLAZA); EHRET, ANDREAS |
STAA | Information on the status of an EP patent application or granted EP patent | STATUS: THE APPLICATION HAS BEEN WITHDRAWN |
18W | Application withdrawn | Effective date: 20090323 |