WO2005096508A1 - Enhanced audio encoding and decoding equipment, method thereof - Google Patents


Info

Publication number
WO2005096508A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2004/001034
Other languages
French (fr)
Chinese (zh)
Inventor
Xingde Pan
Dietz Martin
Andreas Ehret
Holger HÖRICH
Xiaoming Zhu
Michael Schug
Weimin Ren
Lei Wang
Hao Deng
Fredrik Henn
Original Assignee
Beijing Media Works Co., Ltd
Beijing E-World Technology Co., Ltd.
Coding Technologies Ab
Priority date
Filing date
Publication date
Application filed by Beijing Media Works Co., Ltd, Beijing E-World Technology Co., Ltd., Coding Technologies Ab filed Critical Beijing Media Works Co., Ltd
Publication of WO2005096508A1 publication Critical patent/WO2005096508A1/en

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 — Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038 — Speech enhancement, e.g. noise reduction or echo cancellation, using band spreading techniques
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 — Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 — Speech or audio signals analysis-synthesis techniques using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204 — Speech or audio signals analysis-synthesis techniques using spectral analysis, using subband decomposition
    • G10L19/0208 — Subband vocoders

Definitions

  • the present invention relates to the field of audio codec technology, and in particular to an enhanced audio codec apparatus and method based on a perceptual model.
  • the digital audio signal is audio encoded or audio compressed for storage and transmission.
  • the purpose of encoding an audio signal is to achieve a transparent representation of the audio signal with as few bits as possible, i.e., so that there is little audible difference between the originally input audio signal and the decoded output audio signal.
  • CDs represented the many advantages of using digital representations of audio speech, such as high fidelity, large dynamic range, and robustness.
  • these advantages are at the expense of a very high data rate.
  • a CD-quality stereo signal requires a sampling rate of 44.1 kHz, with each sample value uniformly quantized to 16 bits, so that the uncompressed data rate reaches 1.41 Mb/s.
  • Such a high data rate brings great inconvenience to the transmission and storage of data, especially in the case of multimedia applications and wireless transmission applications, and is limited by bandwidth and cost.
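  • The uncompressed CD bit-rate figure quoted above follows from simple arithmetic; a minimal sketch:

```python
# CD-quality stereo: 44.1 kHz sampling, 16-bit uniform quantization, 2 channels.
sample_rate_hz = 44_100
bits_per_sample = 16
channels = 2

bitrate_bps = sample_rate_hz * bits_per_sample * channels
print(bitrate_bps / 1e6)  # → 1.4112, i.e. the ~1.41 Mb/s cited above
```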
  • new network and wireless multimedia digital audio systems are required to reduce the rate of data without compromising the quality of the audio.
  • MPEG-1 and MPEG-2 BC are high-quality audio coding techniques mainly used for mono and stereo audio signals, while there is a growing need for multi-channel audio coding that achieves higher coding quality at lower bit rates.
  • because MPEG-2 BC encoding technology emphasizes backward compatibility with MPEG-1, it cannot achieve five-channel high-quality encoding at code rates lower than 540 kbps.
  • MPEG-2 AAC technology was therefore proposed, which can achieve high-quality coding of five-channel signals at a rate of 320 kbps.
  • Figure 1 shows a block diagram of an MPEG-2 AAC encoder comprising a gain controller 101, a filter bank 102, a time domain noise shaping module 103, an intensity/coupling module 104, a psychoacoustic model, a second-order backward adaptive predictor 105, a sum/difference stereo (M/S) module 106, a bit allocation and quantization coding module 107, and a bitstream multiplexing module 108.
  • the filter bank 102 employs a modified discrete cosine transform (MDCT) whose resolution is signal adaptive: a 2048-point MDCT is used for steady-state signals and a 256-point MDCT for transient signals; thus, for a 48 kHz sampled signal, the maximum frequency resolution is 23 Hz and the maximum time resolution is 2.6 ms.
  • either a sine window or a Kaiser-Bessel window can be used in the filter bank 102: the sine window is used when the harmonic spacing of the input signal is less than 140 Hz, and the Kaiser-Bessel window is used when strong components of the input signal are spaced more than 220 Hz apart.
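  • The resolution figures above can be sanity-checked; the sketch below assumes a 48 kHz sampling rate, a 2048-point long transform, and a 128-sample hop for the 256-point short transform (the 50% overlap is an interpretation, not stated in the text):

```python
fs = 48_000.0
freq_res_hz = fs / 2048          # long-block frequency resolution
time_res_ms = 128 / fs * 1000    # short-block hop interval (assumed 50% overlap)
print(round(freq_res_hz, 1))     # → 23.4 (the ~23 Hz cited above)
print(round(time_res_ms, 2))     # → 2.67 (the ~2.6 ms cited above)
```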
  • After the audio signal passes through the gain controller 101, it enters the filter bank 102, which filters according to the signal type; the spectral coefficients output by the filter bank 102 are then processed by the time domain noise shaping module 103.
  • the time domain noise shaping technique performs linear prediction analysis on the spectral coefficients in the frequency domain, and then controls the shape of the quantization noise in the time domain according to the analysis, in order to control pre-echo.
  • the intensity/coupling module 104 performs stereo encoding of signal intensity. For high frequency bands (above about 2 kHz), the perceived direction of hearing depends on changes in signal intensity (the signal envelope) rather than on the waveform itself; a constant-envelope signal thus has no influence on the perceived direction. This property, together with the correlation between multiple channels, allows several channels to be combined into one common channel for encoding, which forms the intensity/coupling technique.
  • the second-order backward adaptive predictor 105 is used to eliminate the redundancy of the steady-state signal and improve the coding efficiency.
  • the sum/difference stereo (M/S) module 106 operates on a channel pair, i.e., two channels such as the left and right channels of a stereo signal or the left and right surround channels of a multi-channel signal.
  • the M/S module 106 utilizes the correlation between the two channels of the channel pair to achieve the effect of reducing the code rate and improving the coding efficiency.
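  • The M/S idea can be illustrated with a toy transform; the 1/2 scaling below is one common convention and an assumption here, since the text does not fix it:

```python
def ms_encode(left, right):
    # Mid = average of the pair, side = half the difference; for strongly
    # correlated channels the side signal is small and cheap to code.
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    # Exact inverse of ms_encode.
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

A round trip reproduces the input exactly, which is why the transform costs nothing in quality while concentrating energy in the mid channel.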
  • the bit allocation and quantization coding module 107 is implemented as a nested loop process in which a non-uniform quantizer performs lossy coding and an entropy coding module performs lossless coding, removing redundancy and reducing correlation.
  • the nested loop includes an inner loop and an outer loop: the inner loop adjusts the step size of the non-uniform quantizer until the available bits are used up, and the outer loop uses the ratio of quantization noise to masking threshold to estimate the coding quality of the signal.
  • the last encoded signal is passed through a bitstream multiplexing module 108 to form an encoded audio stream output.
  • the input signal is first split into four equal-width frequency bands by a four-band polyphase quadrature filter bank (PQF), and each band uses an MDCT to generate 256 spectral coefficients, for a total of 1024.
  • a gain controller 101 is used in each frequency band.
  • the high frequency PQF band can be ignored to obtain a low sampling rate signal.
  • FIG. 2 shows a block diagram of the corresponding MPEG-2 AAC decoder.
  • the decoder includes a bitstream demultiplexing module 201, a lossless decoding module 202, an inverse quantizer 203, a scale factor module 204, a sum/difference stereo (M/S) module 205, a prediction module 206, an intensity/coupling module 207, a time domain noise shaping module 208, a filter bank 209, and a gain control module 210.
  • the encoded audio stream is demultiplexed by the bitstream demultiplexing module 201 to obtain a corresponding data stream and control stream.
  • After the above signals are decoded by the lossless decoding module 202, an integer representation of the scale factors and the quantized values of the signal spectrum are obtained.
  • the inverse quantizer 203 is a set of non-uniform quantizers implemented by a companding function, which converts the integer quantized values into a reconstructed spectrum. Since the scale factor module in the encoder differences each scale factor against the previous one and Huffman-codes the difference, the scale factor module 204 in the decoder performs Huffman decoding to obtain the corresponding difference values and then recovers the true scale factors. The M/S module 205 converts the sum and difference channels into left and right channels under the control of the side information.
  • the intensity/coupling module 207 performs intensity/coupling decoding under the control of the side information and outputs the result to the time domain noise shaping module 208 for time domain noise shaping decoding; finally, synthesis filtering is performed by the filter bank 209, which adopts the inverse modified discrete cosine transform (IMDCT).
  • the high frequency PQF band can be ignored by the gain control module 210 to obtain a low sampling rate signal.
  • MPEG-2 AAC codec technology is suitable for medium and high bit rate audio signals, but its coding quality for low bit rate audio signals is poor. At the same time, the codec involves many modules, and its high complexity is not conducive to real-time implementation.
  • FIG. 3 is a schematic structural diagram of an encoder using Dolby AC-3 technology, including a transient signal detection module 301, a modified discrete cosine transform (MDCT) filter bank 302, a spectral envelope/exponent encoding module 303, a mantissa encoding module 304, a forward-backward adaptive perceptual model 305, a parametric bit allocation module 306, and a bitstream multiplexing module 307.
  • the audio signal is classified as a steady-state or transient signal by the transient signal detection module 301, and the time domain data are mapped to frequency domain data by the signal-adaptive MDCT filter bank 302, where a long window of 512 points is applied to steady-state signals and a pair of short windows to transient signals.
  • the spectral envelope/index encoding module 303 encodes the exponential portions of the signals in three modes according to the requirements of the code rate and frequency resolution, namely the D15, D25, and D45 encoding modes.
  • the AC-3 technique differentially encodes the spectral envelope in frequency: increments of at most ±2 are allowed, each increment representing a 6 dB level change; the first (DC) term is encoded as an absolute value, and the remaining exponents are differentially encoded.
  • In the D15 spectral envelope coding mode, each exponent needs about 2.33 bits, since 3 differentials are packed into a 7-bit word; the D15 mode provides fine frequency resolution at the expense of time resolution.
  • D15 is transmitted only occasionally, usually once every 6 audio blocks (one data frame).
  • the D25 coding mode provides moderate frequency and time resolution; it differentially codes every other frequency coefficient, so that each exponent requires approximately 1.15 bits.
  • the D45 coding mode differentially codes one value per four frequency coefficients, so that each exponent requires approximately 0.58 bits.
  • the D45 encoding mode provides high temporal resolution and low frequency resolution, so it is generally used in the encoding of transient signals.
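  • The per-exponent bit costs quoted for the three modes all follow from the 7-bit grouping mentioned above (three differentials per 7-bit word, each differential covering 1, 2 or 4 exponents; the exact grouping detail is an assumption consistent with those figures):

```python
def bits_per_exponent(exponents_per_differential):
    # A 7-bit word carries 3 differentials; each differential covers
    # 1 (D15), 2 (D25) or 4 (D45) exponents.
    return 7 / (3 * exponents_per_differential)

print(round(bits_per_exponent(1), 2))  # D15 → 2.33
print(round(bits_per_exponent(2), 2))  # D25 → 1.17 (the text quotes ~1.15)
print(round(bits_per_exponent(4), 2))  # D45 → 0.58
```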
  • the forward-backward adaptive perceptual model 305 is used to estimate the masking threshold for each frame of the signal.
  • the forward adaptive part is only applied to the encoder end.
  • an optimal set of perceptual model parameters is estimated by an iterative loop, and these parameters are then passed to the backward adaptive part to estimate the masking threshold of each frame.
  • the backward adaptive part is applied to both the encoder side and the decoder side.
  • the parameter bit allocation module 306 analyzes the spectral envelope of the audio signal based on the masking criteria to determine the number of bits allocated to each mantissa.
  • the module 306 utilizes a bit pool for global bit allocation for all channels.
  • bits are cyclically extracted from the bit pool and allocated to all channels, and the quantization of the mantissa is adjusted according to the number of bits that can be obtained.
  • the AC-3 encoder also uses a high frequency coupling technique: the high frequency parts of the coupled signals are divided into 18 sub-bands according to the critical bandwidth of the human ear, and selected channels are then coupled starting from a certain sub-band.
  • an AC-3 audio stream output is formed by the bit stream multiplexing module 307.
  • Figure 4 shows the flow diagram for decoding with Dolby AC-3.
  • First, the bit stream encoded by the AC-3 encoder is input, and data frame synchronization and error detection are performed on the bit stream; if a data error is detected, error concealment or muting is applied.
  • the bit stream is then unpacked to obtain the main information and the side information, and then exponentially decoded.
  • two pieces of side information are needed: one is the number of exponent packets; the other is the exponent strategy used, such as the D15, D25 or D45 mode.
  • the decoded index and bit allocation side information are then bit-allocated, indicating the number of bits used for each packed mantissa, resulting in a set of bit allocation pointers, each bit allocation pointer corresponding to an encoded mantissa.
  • the bit allocation pointer indicates the quantizer used for the mantissa and the number of bits occupied by each mantissa in the code stream.
  • each coded mantissa value is dequantized and converted to a dequantized value; mantissas occupying zero bits are restored to zero, or replaced by a random dither value under the control of the dither flag.
  • the decoupling operation is then performed; decoupling recovers the high frequency portion of each coupled channel, including the exponents and mantissas, from the common coupling channel and the coupling factors.
  • if matrix processing was applied to a certain sub-band, the decoder needs to convert the sum and difference channel values of that sub-band back into left and right channel values through inverse matrixing.
  • the code stream contains a dynamic range control value for each audio block; this value is used for dynamic range compression, changing the amplitude of the coefficients, including both exponent and mantissa.
  • the frequency domain coefficients are inversely transformed into a time domain sample, and then the time domain samples are windowed, and adjacent blocks are overlapped and added to reconstruct a PCM audio signal.
  • if the number of channels to be output by the decoder is smaller than the number of channels in the encoded bit stream, the audio signal must also be downmixed before the final PCM stream is output.
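  • The inverse transform, windowing and overlap-add step described above can be sketched in miniature. The sketch below is an illustrative pure-Python MDCT/IMDCT with a sine window (AC-3 uses its own transform and window, so this is a stand-in, not the codec's actual filter bank); samples covered by two overlapping blocks reconstruct exactly because the time-domain aliasing cancels:

```python
import math

def sine_window(length):
    # Sine window; satisfies the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1.
    return [math.sin(math.pi / length * (n + 0.5)) for n in range(length)]

def mdct(block, win):
    # MDCT of one windowed 2N-sample block -> N coefficients.
    n2 = len(block)
    n = n2 // 2
    xw = [b * w for b, w in zip(block, win)]
    return [sum(xw[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(n2))
            for k in range(n)]

def imdct(coeffs, win):
    # Windowed inverse MDCT -> 2N aliased samples; aliasing cancels on overlap-add.
    n = len(coeffs)
    n2 = 2 * n
    return [win[i] * (2.0 / n) * sum(c * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                                     for k, c in enumerate(coeffs))
            for i in range(n2)]

def analyze_synthesize(x, n):
    # 50%-overlap analysis/synthesis; samples seen by two blocks reconstruct exactly.
    win = sine_window(2 * n)
    out = [0.0] * (len(x) + n)
    for start in range(0, len(x) - 2 * n + 1, n):
        for i, v in enumerate(imdct(mdct(x[start:start + 2 * n], win), win)):
            out[start + i] += v
    return out
```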
  • Dolby AC-3 encoding technology is aimed mainly at high bit rate multi-channel surround sound signals, but when the 5.1-channel encoding bit rate is below 384 kbps its coding quality is poor, and its coding efficiency for mono and two-channel stereo signals is also low.
  • In summary, existing codec technologies cannot comprehensively deliver good coding quality from very low and low bit rates up to high bit rate audio signals, for both mono and two-channel signals, and their implementations are complicated.
  • the technical problem to be solved by the present invention is to provide an apparatus and method for enhancing audio codec to solve the problem of low coding efficiency and poor quality of the lower rate audio signal in the prior art.
  • the enhanced audio coding apparatus of the present invention includes a frequency band extension module, a resampling module, a psychoacoustic analysis module, a time-frequency mapping module, a quantization and entropy coding module, and a bit stream multiplexing module;
  • the frequency band extension module is configured to analyze the original input audio signal over the entire frequency band, extracting the spectral envelope of the high frequency portion and its characteristics related to the low frequency portion, which are output to the bit stream multiplexing module;
  • the resampling module is configured to resample the input audio signal, changing its sampling rate, and to output the resampled audio signal to the psychoacoustic analysis module and the time-frequency mapping module;
  • the psychoacoustic analysis module is configured to calculate a masking threshold and a signal-to-mask ratio of the input audio signal, which are output to the quantization and entropy coding module;
  • the time-frequency mapping module is configured to convert the time domain audio signal into frequency domain coefficients; and the quantization and entropy coding module is configured to quantize and entropy-encode the frequency domain coefficients under the control of the signal-to-mask ratio and to output the result to the bit stream multiplexing module;
  • the enhanced audio decoding device of the present invention comprises a bit stream demultiplexing module, an entropy decoding module, an inverse quantizer group, a frequency-time mapping module, and a frequency band extension module; the bit stream demultiplexing module is configured to demultiplex the compressed audio data stream and to output the corresponding data signals and control signals to the entropy decoding module and the band extension module;
  • the entropy decoding module is configured to decode the foregoing signals and recover the quantized spectrum values, which are output to the inverse quantizer group;
  • the inverse quantizer group is configured to reconstruct an inverse quantization spectrum and output to the frequency-time mapping module;
  • the frequency-time mapping module is configured to perform frequency-time mapping on the spectral coefficients to obtain a time domain audio signal of the low frequency band;
  • the frequency band extension module is configured to receive the band extension control information output by the bit stream demultiplexing module and the low-band time domain audio signal output by the frequency-time mapping module, and to reconstruct the high frequency components of the audio signal.
  • the invention is applicable to high-fidelity compression coding of audio signals of various sampling rates and channel configurations: it can support audio signals with sampling rates between 8 kHz and 192 kHz, all possible channel configurations, and a wide range of bit rates.
  • FIG. 1 is a block diagram of an MPEG-2 AAC encoder
  • FIG. 2 is a block diagram of an MPEG-2 AAC decoder
  • Figure 3 is a schematic structural view of an encoder using Dolby AC-3 technology
  • Figure 4 is a schematic diagram of a decoding process using Dolby AC-3 technology
  • Figure 5 is a schematic structural view of an audio encoding device of the present invention.
  • FIG. 6 is a schematic structural diagram of an audio decoding device of the present invention.
  • FIG. 7 is a schematic structural view of Embodiment 1 of the coding apparatus of the present invention
  • FIG. 8 is a schematic structural diagram of Embodiment 1 of a decoding apparatus according to the present invention.
  • FIG. 9 is a schematic structural diagram of Embodiment 2 of an encoding apparatus according to the present invention.
  • Figure 10 is a schematic structural diagram of Embodiment 2 of the decoding device of the present invention.
  • Figure 11 is a schematic structural view of Embodiment 3 of the invention encoding device
  • Figure 12 is a schematic structural diagram of Embodiment 3 of the decoding device of the present invention.
  • Figure 13 is a schematic structural view of Embodiment 4 of the invention encoding device
  • FIG. 14 is a schematic diagram of a filtering structure using a Haar-wavelet-based wavelet transform
  • Figure 15 is a schematic diagram of the time-frequency division obtained using a Haar-wavelet-based wavelet transform
  • FIG. 16 is a schematic structural diagram of Embodiment 4 of the invention decoding apparatus
  • Figure 17 is a schematic structural view of Embodiment 5 of the coding apparatus of the present invention.
  • FIG. 18 is a schematic structural diagram of Embodiment 5 of the decoding device of the present invention.
  • Figure 19 is a schematic structural view of Embodiment 6 of the invention encoding device
  • FIG. 20 is a schematic structural diagram of Embodiment 6 of the decoding device of the present invention.
  • Figure 21 is a schematic structural view of Embodiment 7 of the invention encoding device
  • Figure 22 is a schematic structural diagram of Embodiment 7 of the decoding device of the present invention.
  • Figure 23 is a schematic structural view of Embodiment 8 of the inventive encoding device.
  • Figure 24 is a schematic structural view of Embodiment 9 of the coding apparatus of the present invention.
  • Figure 25 is a schematic structural view of Embodiment 10 of the encoding apparatus of the present invention.
  • Figure 26 is a schematic structural view of Embodiment 11 of the encoding apparatus of the present invention.
  • Figure 27 is a schematic structural view of Embodiment 12 of the encoding apparatus of the present invention.
  • Figure 28 is a schematic structural view of Embodiment 13 of the encoding apparatus of the present invention.
  • Figure 29 is a schematic structural view of Embodiment 14 of the encoding apparatus of the present invention.
  • Figure 30 is a schematic structural diagram of Embodiment 8 of the decoding apparatus of the present invention.
  • Figure 31 is a schematic structural view of Embodiment 15 of the inventive encoding device.
  • Figure 32 is a schematic structural view of Embodiment 16 of the encoding apparatus of the present invention.
  • Figure 33 is a schematic structural view of Embodiment 17 of the inventive encoding device.
  • Figure 34 is a schematic structural view of Embodiment 18 of the encoding apparatus of the present invention.
  • Figure 35 is a schematic structural view of Embodiment 19 of the encoding apparatus of the present invention.
  • Figure 36 is a schematic structural diagram of Embodiment 9 of the decoding apparatus of the present invention.
  • Figure 37 is a schematic structural diagram of Embodiment 10 of the decoding apparatus of the present invention.
  • Figure 38 is a schematic structural diagram of Embodiment 11 of the decoding device of the present invention.
  • FIG. 39 is a schematic structural diagram of Embodiment 12 of a decoding apparatus according to the present invention.
  • Figure 40 is a block diagram showing the structure of a thirteenth embodiment of the decoding apparatus of the present invention.
  • FIG. 1 to FIG. 4 are schematic structural diagrams of several encoders of the prior art, which have been introduced in the background art, and are not described herein again.
  • the audio encoding apparatus includes a resampling module 50, a psychoacoustic analysis module 51, a time-frequency mapping module 52, a quantization and entropy encoding module 53, a band extension module 54, and a bit stream multiplexing module 55;
  • the resampling module 50 is configured to resample the input audio signal;
  • the psychoacoustic analysis module 51 calculates the masking threshold and signal-to-mask ratio of the resampled audio signal and analyzes the signal type;
  • the time-frequency mapping module 52 is configured to convert the time domain audio signal into frequency domain coefficients; the quantization and entropy coding module 53 quantizes and entropy-encodes the frequency domain coefficients under the control of the signal-to-mask ratio output by the psychoacoustic analysis module 51 and outputs the result to the bit stream multiplexing module 55; the band extension module 54 is configured to analyze the input audio signal over the entire frequency band, extracting the spectral envelope of the high frequency portion and its characteristics related to the low frequency portion, which are output to the bit stream multiplexing module 55; the bit stream multiplexing module 55 multiplexes the data output by the resampling module 50, the quantization and entropy coding module 53 and the band extension module 54 to form the audio coded stream.
  • the digital audio signal is resampled in the resampling module 50 to change its sampling rate, and the resampled signal is input to the psychoacoustic analysis module 51 and the time-frequency mapping module 52 respectively. The psychoacoustic analysis module 51 calculates the masking threshold and signal-to-mask ratio of each frame of the audio signal and passes them to the quantization and entropy coding module 53 as control signals; meanwhile, the time domain audio signal is converted into frequency domain coefficients by the time-frequency mapping module 52.
  • the above-described frequency domain coefficients are quantized and entropy encoded in the quantization and entropy coding module 53 under the control of the mask ratio output by the psychoacoustic analysis module 51.
  • the original digital audio signal is subjected to analysis by the band expansion module 54, and the spectral envelope and spectral characteristic parameters of the high frequency portion are obtained and output to the bit stream multiplexing module 55.
  • the encoded data and control signals are multiplexed in the bit stream multiplexing module 55 to form the enhanced audio coded stream.
  • the resampling module 50 is used to resample the input audio signal, and the resampling includes upsampling and downsampling.
  • Downsampling is taken as an example to illustrate resampling.
  • the resampling module 50 includes a low pass filter and a downsampler, where the low pass filter limits the frequency band of the audio signal, eliminating the aliasing that downsampling could otherwise cause.
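  • A minimal sketch of the low-pass-then-downsample structure (the windowed-sinc FIR design is an assumed implementation choice; the patent does not specify the filter):

```python
import math

def lowpass_fir(cutoff, num_taps=63):
    # Windowed-sinc (Hamming) low-pass prototype; cutoff normalized to 0..0.5.
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        h = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        h *= 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hamming window
        taps.append(h)
    return taps

def downsample(x, factor):
    # Anti-alias filter at half the new Nyquist rate, then keep every
    # `factor`-th sample.
    taps = lowpass_fir(0.5 / factor)
    filtered = [sum(t * x[i - j] for j, t in enumerate(taps) if 0 <= i - j < len(x))
                for i in range(len(x))]
    return filtered[::factor]
```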
  • the psychoacoustic analysis module 51 is mainly used for calculating the masking value, the mask ratio and the perceptual entropy of the input audio signal, and analyzing the signal type.
  • the perceptual entropy calculated by the psychoacoustic analysis module 51 can dynamically analyze the number of bits required for the current frame signal to be transparently encoded, thereby adjusting the bit allocation between frames.
  • the psychoacoustic analysis module 51 outputs the signal-to-mask ratio of each sub-band to the quantization and entropy coding module 53, where it controls the quantization process.
  • the time-frequency mapping module 52 is configured to implement the transformation of the audio signal from the time domain signal to the frequency domain coefficient, and is composed of a filter bank, and specifically may be a discrete Fourier transform (DFT) filter bank, a discrete cosine transform (DCT) filter bank, Modified discrete cosine transform (MDCT) filter bank, cosine modulated filter bank, wavelet transform filter bank, etc.
  • the frequency domain coefficients obtained by the time-frequency mapping are output to the quantization and entropy coding module 53 for quantization and coding processing.
  • the quantization and entropy encoding module 53 further includes a non-linear quantizer group and an encoder, where the quantizer can be a scalar quantizer or a vector quantizer.
  • vector quantizers are further divided into two categories: memoryless vector quantizers and memory vector quantizers. In a memoryless vector quantizer, each input vector is quantized independently of the previous vectors; a memory vector quantizer takes previous vectors into account when quantizing the current vector, i.e., it exploits the correlation between vectors.
  • the main memoryless vector quantizers include the full search vector quantizer, tree search vector quantizer, multilevel vector quantizer, gain/waveform vector quantizer, and split mean vector quantizer; the main memory vector quantizers include the predictive vector quantizer and the finite state vector quantizer. If a scalar quantizer is employed, the non-linear quantizer group further includes M sub-band quantizers. Each sub-band quantizer mainly uses a scale factor for quantization: nonlinear companding is applied to all frequency domain coefficients in the M scale factor bands, and the scale factor is then used to quantize the frequency domain coefficients of each sub-band. The resulting integer quantized spectrum is output to the encoder; the first scale factor of each frame is output as a common scale factor to the bit stream multiplexing module 55, while the other scale factors are differentially coded against the preceding scale factor and output to the encoder.
  • the scale factor in the above steps is a constantly changing value, which is adjusted according to the bit allocation strategy.
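  • A sketch of scale-factor-driven nonlinear quantization. The |x|^0.75 companding exponent and the 2^(sf/4) step size below are AAC-style conventions assumed purely for illustration; the text itself only specifies that nonlinear companding precedes scale-factor quantization:

```python
import math

def quantize(coefs, scale_factor):
    # Nonlinear companding (|x|**0.75 -- assumed convention) with a
    # scale-factor-dependent step size (2**(sf/4) -- assumed convention).
    step = 2 ** (scale_factor / 4)
    return [int(math.copysign(round((abs(c) / step) ** 0.75), c)) for c in coefs]

def dequantize(quants, scale_factor):
    # Inverse companding: |q|**(4/3) undoes the 0.75-power compression.
    step = 2 ** (scale_factor / 4)
    return [math.copysign(abs(q) ** (4 / 3) * step, q) for q in quants]
```

Raising the scale factor enlarges the step size, lowering precision (and bit cost) for that band; this is the knob the bit allocation strategy turns.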
  • the present invention provides a bit allocation strategy with minimal global perceptual distortion, as follows:
  • First, each sub-band quantizer is initialized: the quantized values of the spectral coefficients in all sub-bands are set to zero, so the quantization noise of each sub-band equals its energy, the noise-to-mask ratio NMR of each sub-band equals its signal-to-mask ratio SMR, and the number of remaining bits equals the target number of bits.
  • Then the sub-band with the largest NMR is found, its scale factor is reduced by one unit, and the additional number of bits ΔBi needed by that sub-band is calculated. If the number of remaining bits B ≥ ΔBi, the modification of the scale factor is confirmed, ΔBi is subtracted from B, the NMR of the sub-band is recalculated, and the search for the sub-band with the largest NMR continues, repeating these steps. If B < ΔBi, the modification is cancelled, the previous scale factor and remaining bit count are retained, the allocation result is output, and the bit allocation process ends.
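  • The allocation loop described above repeatedly improves whichever sub-band currently has the worst noise-to-mask ratio until the bit budget runs out. A simplified sketch (the fixed per-step bit cost and the 6 dB NMR improvement per scale factor step are modelling assumptions, not figures from the text):

```python
def allocate_bits(nmr_db, cost_per_step, budget):
    # Greedy loop: always spend bits on the sub-band with the largest NMR.
    nmr = list(nmr_db)
    steps = [0] * len(nmr)
    while True:
        worst = max(range(len(nmr)), key=lambda j: nmr[j])
        if nmr[worst] <= 0 or cost_per_step[worst] > budget:
            break  # either no audible noise left, or not enough bits remain
        budget -= cost_per_step[worst]     # confirm the scale factor change
        nmr[worst] -= 6.0                  # assumed improvement per step
        steps[worst] += 1
    return steps, budget
```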
• the frequency domain coefficients are composed into a plurality of multi-dimensional vectors and input into the nonlinear quantizer group.
• spectral flattening is performed according to the flattening factor, that is, the dynamic range of the spectrum is reduced; the vector quantizer then
• searches the codebook, under a subjective perceptual distance criterion, for the codeword with the smallest distance to the vector to be quantized, and passes the corresponding codeword index to the encoder.
• the flattening factor is adjusted according to the bit allocation strategy of the vector quantization, and the bit allocation of the vector quantization is controlled according to the perceptual importance of the different sub-bands.
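The codebook search step can be sketched as below; the optional weighting vector stands in, loosely, for the subjective perceptual distance criterion of the text (the actual distance measure is not specified here and the weights are an assumption).

```python
# Minimal sketch of the nearest-codeword search with optional perceptual
# weighting (the weights are an assumption, not the patent's measure).

def nearest_codeword(vec, codebook, weights=None):
    """Return the index of the codeword with the smallest (weighted)
    squared distance to the vector to be quantized."""
    if weights is None:
        weights = [1.0] * len(vec)
    def dist(cw):
        return sum(w * (a - b) ** 2 for w, a, b in zip(weights, vec, cw))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))
```

Only the winning index is passed on to the encoder, mirroring the codeword-index output described above.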
• Entropy coding is a source coding technique. Its basic idea is to assign shorter codewords to symbols with a higher probability of occurrence and longer codewords to symbols with a lower probability, so that the average codeword length is minimized. According to Shannon's noiseless coding theorem, if the symbols of the N transmitted source messages are independent, a code can be constructed whose average codeword length approaches the entropy of the source,
• where H(U) denotes the entropy of the source and U the symbol variable. Since the entropy H(U) is the lower limit of the average codeword length, and the average codeword length of such a code comes very close to this lower bound H(U), this variable-length coding technique is called "entropy coding". The main entropy coding methods are Huffman coding, arithmetic coding, and run-length coding; any of these entropy coding methods may be employed in the present invention.
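The idea can be illustrated with a toy Huffman construction: for a dyadic source (all probabilities are powers of two) the average codeword length exactly reaches the entropy lower bound. This sketch computes codeword lengths only, not an actual bit assignment.

```python
import heapq
import math

# Toy illustration of the entropy-coding idea: shorter codewords for more
# probable symbols, so the average length approaches the entropy H.

def huffman_lengths(probs):
    """Return the codeword lengths of a Huffman code for the given probabilities."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    lengths = [0] * len(probs)
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)      # two least probable groups
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1               # merged symbols gain one bit
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]
lens = huffman_lengths(probs)
avg = sum(p * l for p, l in zip(probs, lens))
H = -sum(p * math.log2(p) for p in probs)
```

For this dyadic distribution the lengths come out as 1, 2, 3, 3 bits and the average length equals the entropy H exactly, the limiting case of Shannon's bound quoted above.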
• after entropy coding, the codebook serial numbers, the scale factor coded values and the lossless coded quantized spectrum are obtained; the codebook serial numbers are then entropy encoded to obtain
• the codebook serial number coded values, after which the scale factor coded values, the codebook serial number coded values, and the lossless coded quantized spectrum are output to the bit stream multiplexing module 55.
• if vector quantization is used, the codeword index obtained by the vector quantizer is subjected to one-dimensional or multi-dimensional entropy coding in the encoder to obtain the coded value of the codeword index, which is then output to the bit stream multiplexing module 55.
• the analysis is performed on the entire frequency band; the spectral envelope of the high-frequency portion and its characteristics related to the low-frequency portion are extracted and output as the band extension control information to the bit stream multiplexing module 55.
• band extension: for most audio signals, the characteristics of the high-frequency part are strongly correlated with those of the low-frequency part, so the high-frequency part of the audio signal can be effectively reconstructed from its low-frequency part. Thus, the high-frequency component of the audio signal need not be transmitted; to ensure proper reconstruction of the high-frequency portion, only a small amount of band extension control information is transmitted in the compressed audio stream.
• the band extension module 54 includes a parameter extraction module and a spectral envelope extraction module; the input signal enters the parameter extraction module, which extracts parameters representing the spectral characteristics of the input signal in different time-frequency regions; then, in the spectral envelope extraction module,
• the spectral envelope of the high-frequency portion of the signal is estimated at a suitable time-frequency resolution. To ensure that the time-frequency resolution best suits the characteristics of the current input signal, the time-frequency resolution of the spectral envelope can be chosen freely.
  • the parameters of the input signal spectrum characteristics and the spectral envelope of the high frequency portion are sent to the bit stream multiplexing module 55 for multiplexing as the output of the band extension.
  • the bitstream multiplexing module 55 receives the code stream including the common scale factor, the scale factor coded value, the codebook sequence number coded value, and the lossless coded quantized spectrum output by the quantization and entropy coding module 53 or the coded value of the codeword index and the band extension. After the information output by the module 54 is multiplexed, a compressed audio data stream is obtained.
• the encoding method based on the above encoder specifically includes: analyzing the input audio signal over the entire frequency band and extracting the high-frequency spectral envelope and signal spectral characteristic parameters as the band extension control signal; resampling the input audio signal; calculating the signal-to-mask ratio of the resampled signal; performing time-frequency mapping on the resampled signal to obtain the frequency domain coefficients of the audio signal; quantizing and entropy coding the frequency domain coefficients; and multiplexing the band extension control signal with the encoded audio code stream to obtain the compressed audio stream.
• the resampling process consists of two steps: limiting the frequency band of the audio signal, and decimating the band-limited audio signal.
• the time-frequency mapping may use any time-frequency transform of the time-domain audio signal, such as the discrete Fourier transform (DFT), discrete cosine transform (DCT), modified discrete cosine transform (MDCT), cosine-modulated filter bank, or wavelet transform.
• for a time-frequency transform using the modified discrete cosine transform (MDCT), the MDCT transform is performed on the windowed signal to obtain the frequency domain coefficients.
  • the impulse response of the MDCT analysis filter is:
• the output of the MDCT transform is the frequency domain signal.
• the sine window can be used as the window function.
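A plain-Python sketch of the windowed MDCT step follows (one normalization convention among several; the windowing is assumed to be applied to the block beforehand). The sine window satisfies the Princen-Bradley condition w(n)² + w(n+M)² = 1 needed for perfect reconstruction.

```python
import math

# Sketch of the sine window and the MDCT of one 2M-sample (pre-windowed) block.

def sine_window(N):
    """Sine window of length N = 2M."""
    return [math.sin(math.pi / N * (n + 0.5)) for n in range(N)]

def mdct(block):
    """MDCT of a block of N = 2M samples -> M frequency domain coefficients."""
    N = len(block)
    M = N // 2
    n0 = 0.5 + M / 2.0                       # standard MDCT phase offset
    return [sum(block[n] * math.cos(math.pi / M * (n + n0) * (k + 0.5))
                for n in range(N))
            for k in range(M)]
```

Consecutive blocks overlap by M samples; together with the matching inverse transform on the decoder side this forms the critically sampled analysis filter bank described above.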
• the above restriction on the window function can also be relaxed by using a bi-orthogonal transform with a specific analysis filter and synthesis filter.
• for a time-frequency transform using cosine-modulated filtering, the time domain samples of the previous frame and the current frame are first selected, a windowing operation is performed on the time domain samples of the two frames, and the windowed signal is then subjected to the cosine modulation transform to obtain the frequency domain coefficients.
• where n = 0, 1, ..., N − 1,
• 0 ≤ k ≤ M − 1, 0 ≤ n ≤ 2KM − 1, and K is an integer greater than zero.
• the analysis window (analysis prototype filter) of the M sub-band cosine-modulated filter bank has an impulse response of length N.
• the synthesis window (synthesis prototype filter) has an impulse response of length N_s.
  • Calculating the masking threshold and the mask ratio of the resampled signal includes the following steps:
• the first step is to map the signal from the time domain to the frequency domain.
• the second step is to determine the tonal and noise-like components in the signal.
• the tonality of the signal is estimated by inter-frame prediction of each spectral line.
• the Euclidean distance between the predicted and actual values of each spectral line is mapped to an unpredictability measure: highly predictable spectral components are considered strongly
• tonal, and poorly predictable spectral components are considered noise-like.
• the predicted value is r_pred[k] = r_{t-1}[k] + (r_{t-1}[k] - r_{t-2}[k]), where r_t denotes the coefficient of the current frame, r_{t-1} the coefficient of the previous frame, and r_{t-2} the coefficient of the frame before that.
• the unpredictability of each sub-band is the energy-weighted average of the unpredictability of all the spectral lines in the sub-band.
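A simplified, real-valued sketch of this measure (psychoacoustic models usually predict magnitude and phase separately; plain real coefficients are used here for illustration only):

```python
# Each spectral line is linearly extrapolated from the two previous frames;
# the prediction error, normalized to [0, 1], is the unpredictability
# (0 ~ perfectly predictable/tonal, 1 ~ unpredictable/noise-like).

def unpredictability(r, r_prev, r_prev2):
    c = []
    for k in range(len(r)):
        pred = r_prev[k] + (r_prev[k] - r_prev2[k])   # formula from the text
        denom = abs(r[k]) + abs(pred)
        c.append(abs(r[k] - pred) / denom if denom else 0.0)
    return c

def subband_unpredictability(c, energy):
    """Energy-weighted average of the line unpredictabilities in one sub-band."""
    e = sum(energy)
    return sum(ci * ei for ci, ei in zip(c, energy)) / e if e else 0.0
```

A line evolving linearly across frames yields 0 (tonal); a line that flips sign against its prediction yields 1 (noise-like).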
  • the third step is to calculate the signal-to-noise ratio (SNR) of each sub-band.
• the final masking threshold of the signal is taken as the greater of the static threshold in quiet and the masking threshold calculated above.
• the perceptual entropy is calculated using the following formula: pe = -Σ_b cbwidth_b × log10(nb[b] / (e[b] + 1)), where cbwidth_b represents the number of spectral lines included in each sub-band b.
• the fifth step is to calculate the signal-to-mask ratio (Signal-to-Mask Ratio, SMR) of each sub-band signal.
  • the frequency domain coefficients are quantized and entropy encoded according to the mask ratio, wherein the quantization may be scalar quantization or vector quantization.
• the scalar quantization includes the following steps: nonlinearly compressing the frequency domain coefficients in all scale factor bands; quantizing the frequency domain coefficients of each sub-band using its scale factor to obtain a quantized spectrum represented by integers; selecting the first scale factor in each frame signal as the common scale factor; and differentially coding every other scale factor against its predecessor.
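One concrete shape such a scalar quantizer can take is an AAC-style 0.75-power companding quantizer with a per-band scale factor; the 0.75 exponent and the quarter-step scale-factor scaling are illustrative assumptions, not values taken from the patent.

```python
import math

# Sketch of nonlinear compression plus scale-factor quantization, with the
# matching expansion used on the decoder side (constants are assumptions).

def quantize(coeffs, scale_factor):
    """Nonlinear compression then integer quantization of one band."""
    step = 2.0 ** (scale_factor / 4.0)
    return [int(math.copysign(round((abs(x) / step) ** 0.75), x)) for x in coeffs]

def dequantize(q, scale_factor):
    """Inverse: nonlinear expansion then rescaling by the scale factor."""
    step = 2.0 ** (scale_factor / 4.0)
    return [math.copysign(abs(v) ** (4.0 / 3.0), v) * step for v in q]
```

Raising the scale factor coarsens the quantization of the band, which is exactly the knob the bit allocation strategy above turns.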
• the vector quantization comprises the following steps: composing the frequency domain coefficients into a plurality of multi-dimensional vector signals; performing spectral flattening on each vector according to a flattening factor; and finding, according to the subjective perceptual distance measure criterion, the codeword in the codebook with the smallest distance to the vector to be quantized, obtaining its codeword index.
  • the entropy coding step comprises: entropy coding the quantized spectrum and the differentially processed scale factor to obtain a codebook serial number, a scale factor coded value, and a lossless coded quantized spectrum; entropy coding the codebook sequence number to obtain a codebook serial number coded value.
• one-dimensional or multi-dimensional entropy encoding is performed on the codeword index to obtain the coded value of the codeword index.
• the above entropy coding may use any of the existing Huffman coding, arithmetic coding, or run-length coding methods.
  • the encoded audio code stream is obtained, and the code stream is multiplexed together with the common scale factor and the band extension control signal to obtain a compressed audio code stream.
  • FIG. 6 is a block diagram showing the structure of an audio decoding device of the present invention.
• the audio decoding apparatus includes a bit stream demultiplexing module 601, an entropy decoding module 602, an inverse quantizer group 603, a frequency-time mapping module 604, and a band extension module 605.
• after the compressed audio code stream is demultiplexed by the bit stream demultiplexing module 601, the corresponding data signals and control signals are obtained and output to the entropy decoding module 602 and the band extension module 605.
• the data signals and control signals are decoded in the entropy decoding module 602 to recover the quantized values of the spectrum.
• the quantized values are reconstructed in the inverse quantizer group 603 to obtain the inverse quantized spectrum.
• the inverse quantized spectrum is output to the frequency-time mapping module 604, where the time domain audio signal is obtained through frequency-time mapping; in the band extension module 605, the high-frequency signal portion is reconstructed to obtain a wide-band time domain audio signal.
  • the bit stream demultiplexing module 601 decomposes the compressed audio code stream to obtain corresponding data signals and control signals, and provides corresponding decoding information for other modules.
• the signals output to the entropy decoding module 602 include the common scale factor, the scale factor coded value, the codebook serial number coded value, and the lossless coded quantized spectrum, or the coded value of the codeword index;
• the band extension control information is output to the band extension module 605.
• the entropy decoding module 602 receives the common scale factor, the scale factor coded value, the codebook serial number coded value, and the lossless coded quantized spectrum output by the bit stream demultiplexing module 601; the codebook serial numbers are then decoded, the spectral coefficients are decoded, the scale factors are decoded, and the quantized spectrum is reconstructed; the integer representation of the scale factors and the quantized values of the spectrum are output to the inverse quantizer group 603.
• the decoding method adopted by the entropy decoding module 602 corresponds to the entropy encoding method used in the encoding device, such as Huffman decoding, arithmetic decoding, or run-length decoding.
• after receiving the quantized values of the spectrum and the integer representation of the scale factors, the inverse quantizer group 603 inversely quantizes the transmitted quantized values into an unscaled reconstructed spectrum (the inverse quantized spectrum), and outputs the inverse quantized spectrum to the frequency-time mapping module 604.
• the inverse quantizer group 603 may be a uniform quantizer group or a non-uniform quantizer group realized by a companding function.
• if the quantizer group in the encoding apparatus employs a scalar quantizer, the inverse quantizer group 603 in the decoding apparatus also employs a scalar inverse quantizer.
• the quantized values of the spectrum are first nonlinearly expanded, and each scale factor is then used to obtain all the spectral coefficients (the inverse quantized spectrum) in the corresponding scale factor band.
• if the entropy decoding module 602 receives the coded value of the codeword index output by the bit stream demultiplexing module 601, the coded value of the codeword index is
• decoded with the entropy decoding method corresponding to the entropy encoding method used at encoding, obtaining the corresponding codeword index.
• the codeword index is output to the inverse quantizer group 603, where the quantized values (the inverse quantized spectrum) are obtained by querying the codebook and output to the frequency-time mapping module 604.
• in this case the inverse quantizer group 603 employs an inverse vector quantizer.
  • the inverse quantization spectrum is processed by the mapping of the frequency-time mapping module 604 to obtain a time domain audio signal of a low frequency band.
• the frequency-time mapping module 604 may be an inverse discrete cosine transform (IDCT) filter bank, an inverse discrete Fourier transform (IDFT) filter bank, an inverse modified discrete cosine transform (IMDCT) filter bank, an inverse wavelet transform filter bank, a cosine-modulated filter bank, etc.
• the band extension module 605 receives the band extension information output by the bit stream demultiplexing module 601 and the low-frequency time domain audio signal from the frequency-time mapping module 604, reconstructs the high-frequency signal part by spectrum shifting and high-frequency adjustment, and outputs a wideband audio signal.
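The spectrum shifting and high-frequency adjustment can be illustrated with a loose sketch: translate low-band spectral values upward, then scale each high-band region so its energy matches a transmitted envelope value. The envelope format and region size here are invented for illustration; the patent does not fix them.

```python
import math

# Loose sketch of band extension on the decoder side (format assumed):
# low_spectrum - decoded low-band magnitudes
# envelope     - transmitted target energy per high-band region
# region       - number of bins per high-band region

def extend_band(low_spectrum, envelope, region):
    high = []
    for i, target in enumerate(envelope):
        src = low_spectrum[i * region:(i + 1) * region]   # spectrum shifting
        e = sum(v * v for v in src)
        g = math.sqrt(target / e) if e else 0.0           # high-frequency adjustment
        high.extend(g * v for v in src)
    return low_spectrum + high
```

Each reconstructed region inherits the fine structure of the low band but carries the energy dictated by the band extension control information.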
• the decoding method based on the above solution includes: demultiplexing the compressed audio code stream to obtain data information and control information; performing entropy decoding on the data information to obtain the quantized values of the spectrum; performing inverse quantization on the quantized values to obtain the inverse quantized spectrum; performing frequency-time mapping on the inverse quantized spectrum to obtain a low-frequency time domain audio signal; and, according to the band extension control signal, reconstructing the high-frequency portion of the time domain audio signal to obtain a wide-band audio signal.
• if the demultiplexed information includes the codebook serial number coded value, the common scale factor, the scale factor coded value, and the lossless coded quantized spectrum, this indicates that the spectral coefficients were quantized with a scalar quantization technique in the encoding device, and the entropy decoding step
• comprises: decoding the codebook serial numbers to obtain the codebook numbers of all the scale factor bands; decoding the quantized coefficients of all the scale factor bands according to the codebooks corresponding to the codebook serial numbers; decoding the scale factors of all the scale factor bands; and reconstructing the quantized spectrum.
  • the entropy decoding method adopted in the above process corresponds to an entropy coding method in the coding method, such as a run length decoding method, a Huffman decoding method, an arithmetic decoding method, and the like.
  • the process of entropy decoding is illustrated by using the run length decoding method to decode the code book number, using the Huffman decoding method to decode the quantized coefficients, and using the Huffman decoding method to decode the scale factor.
• the codebook numbers of all scale factor bands are obtained by the run-length decoding method.
• the decoded codebook serial number is an integer within a certain interval; in this embodiment the interval is [0, 11]. Only codebook numbers in the valid range, between 1
• and 11, correspond to spectral coefficient Huffman codebooks; for an all-zero sub-band, the codebook number 0 is used.
• the quantized coefficients of all scale factor bands are decoded using the spectral coefficient Huffman codebook corresponding to the codebook number. If the codebook number of a scale factor band is within the valid range, in this embodiment between 1 and 11, the codebook number corresponds to a spectral coefficient codebook; that codebook is used to decode the lossless coded quantized spectrum, the codeword indices of the quantized coefficients of the scale factor band are obtained, and the quantized coefficients are then unpacked from the codeword indices.
• if the codebook number of the scale factor band is not between 1 and 11, the codebook number does not correspond to any spectral coefficient codebook; the quantized coefficients of that scale factor band are not decoded and are all set to zero.
• the scale factor is used to reconstruct the spectral values from the inverse quantized spectral coefficients. If the codebook number of a scale factor band is in the valid range, each such band has a scale factor. When decoding the scale factors, the code stream occupied by the first scale factor is read first; Huffman decoding is then performed on the other scale factors, obtaining in sequence the difference between each scale factor and its predecessor, and adding each difference to the previous scale factor value yields each scale factor. If the quantized coefficients of the current sub-band are all zero, the scale factor of that sub-band need not be decoded.
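The differential reconstruction of the scale factors reduces to a running sum; a minimal sketch, with the Huffman-decoded differences assumed already available:

```python
# Minimal sketch of scale-factor reconstruction: the first scale factor is
# sent directly (the common scale factor), each later one as a difference
# from its predecessor (differences assumed already Huffman-decoded).

def decode_scale_factors(common, diffs):
    sfs = [common]
    for d in diffs:
        sfs.append(sfs[-1] + d)   # add difference to the previous scale factor
    return sfs
```

Sub-bands whose quantized coefficients are all zero would simply contribute no entry to `diffs`, matching the text above.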
• the inverse quantization process includes: nonlinearly expanding the quantized values of the spectrum; and obtaining all the spectral coefficients (the inverse quantized spectrum) in the corresponding scale factor bands according to each scale factor.
• if the demultiplexed information includes the coded value of the codeword index, this indicates that the encoding device quantized the spectral coefficients with the vector quantization technique, and the entropy decoding step comprises: using the entropy
• decoding method corresponding to the entropy encoding method of the encoding device to decode the coded value of the codeword index and obtain the codeword index.
• the codeword index is then inversely quantized to obtain the inverse quantized spectrum.
  • the method of performing frequency-time mapping processing on the inverse quantization spectrum corresponds to the time-frequency mapping processing method in the encoding method, and may be an inverse discrete cosine transform (IDCT), an inverse discrete Fourier transform (IDFT), or an inverse modified discrete cosine transform ( IMDCT), inverse wavelet transform and other methods are completed.
  • the inverse-corrected discrete cosine transform IMDCT is taken as an example to illustrate the frequency-time mapping process.
  • the frequency-time mapping process consists of three steps: IMDCT transformation, time domain windowing, and time domain superposition.
• the IMDCT transform is performed on the prediction residual spectrum or the inverse quantized spectrum to obtain the transformed time domain signal x_{i,n}.
• the transform expression is x_{i,n} = (2/N) Σ_{k=0}^{N/2-1} X_{i,k} cos((2π/N)(n + n₀)(k + 1/2)), 0 ≤ n < N, where n represents the sample number and n₀ = (N/2 + 1)/2.
  • the time domain signal obtained by the IMDCT transform is windowed in the time domain.
  • Typical window functions are Sine windows, Kaiser-Bessel windows, and the like.
• the above restriction on the window function can also be relaxed by using a bi-orthogonal transform with a specific analysis filter and synthesis filter.
  • the windowed time domain signal is superimposed to obtain a time domain audio signal.
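The three steps above can be sketched together with a matching forward MDCT (included only so the round trip can be checked; the normalization here puts the 2/M factor entirely in the synthesis transform, one of several equivalent conventions). With sine analysis and synthesis windows, overlap-adding consecutive half-overlapped blocks cancels the time-domain aliasing and reconstructs the overlapped region exactly.

```python
import math

# Sketch of IMDCT, time-domain windowing, and time-domain superposition (TDAC).

def sine_window(N):
    return [math.sin(math.pi / N * (n + 0.5)) for n in range(N)]

def mdct(block, w):
    """Forward transform of a 2M-sample block (analysis windowing included)."""
    N = len(block)
    M = N // 2
    return [sum(block[n] * w[n] * math.cos(math.pi / M * (n + 0.5 + M / 2.0) * (k + 0.5))
                for n in range(N)) for k in range(M)]

def imdct(X, w):
    """IMDCT of M coefficients to 2M samples, followed by synthesis windowing."""
    M = len(X)
    y = [(2.0 / M) * sum(X[k] * math.cos(math.pi / M * (n + 0.5 + M / 2.0) * (k + 0.5))
                         for k in range(M)) for n in range(2 * M)]
    return [y[n] * w[n] for n in range(2 * M)]

def overlap_add(prev_second_half, cur):
    """Add the second half of the previous block's output to the first half
    of the current one; return the finished samples and the new tail."""
    M = len(prev_second_half)
    return [prev_second_half[n] + cur[n] for n in range(M)], cur[M:]
```

The aliasing terms introduced by each block are equal and opposite in the overlap, so their sum restores the original samples, which is why no extra information beyond the M coefficients per block is needed.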
• the high-frequency portion of the audio signal is reconstructed based on the band extension control information and the time domain audio signal to obtain a wideband audio signal.
  • FIG. 7 is a schematic illustration of a first embodiment of an encoding device of the present invention.
• This embodiment adds a sum and difference stereo (M/S) encoding module 56 on the basis of FIG. 5, located between the output of the time-frequency mapping module 52 and the input of the quantization and entropy encoding module 53; the psychoacoustic analysis module 51 outputs the signal type analysis results to it.
• the psychoacoustic analysis module 51 calculates the masking thresholds of the sum and difference channels in addition to the masking threshold of each mono audio signal, and outputs them to the quantization and entropy coding module 53.
• the sum and difference stereo encoding module 56 can also be located between the quantizer group and the encoder in the quantization and entropy encoding module 53.
• the sum and difference stereo encoding module 56 converts the frequency domain coefficients/residual sequences of the left and right channels into the sum and difference channel frequency domain coefficients/residual sequences by using the correlation between the two channels of a channel pair, in order to
• improve coding efficiency and stereo imaging; it is therefore only applicable to channel-pair signals with consistent signal types. For a mono signal, or a channel-pair signal with inconsistent signal types, sum and difference stereo encoding is not performed.
• the encoding method based on the encoding apparatus shown in FIG. 7 is basically the same as the encoding method based on the encoding apparatus shown in FIG. 5, except that the following steps are added: before the quantization and entropy encoding of the frequency domain coefficients, it is judged whether the audio signal is a multi-channel signal; if so, it is judged whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is judged whether the scale factor bands corresponding to the two channels satisfy the sum and difference stereo coding conditions; if they are satisfied, sum and difference stereo coding is performed and
• the frequency domain coefficients of the sum and difference channels are obtained; if they are not satisfied, no sum and difference stereo coding is performed; if the signal is a mono signal or a multi-channel signal with inconsistent signal types, the frequency domain coefficients are not processed.
• the sum and difference stereo coding can also be applied after the quantization process and before the entropy coding, that is: after the frequency domain coefficients are quantized, it is judged whether the audio signal is a multi-channel signal; if so, it is judged whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is judged whether the scale factor bands corresponding to the two channels satisfy the sum and difference stereo coding conditions, and if they are satisfied, sum and difference stereo encoding is performed; if not, no sum and difference stereo encoding is performed; if the signal is a mono signal or a multi-channel signal with inconsistent signal types, no sum and difference stereo encoding is performed on the quantized frequency domain coefficients.
• the spectral coefficient of the corresponding scale factor band of the right channel is r(k), and its correlation value is
• C_rr = Σ_k r(k)·r*(k), where the sum runs over the spectral lines of the scale factor band.
• the frequency domain coefficients in the scale factor band corresponding to the left and right channels are replaced by the linearly transformed sum and difference channel frequency domain coefficients, where
• M represents the sum channel frequency domain coefficient,
• S represents the difference channel frequency domain coefficient,
• L represents the left channel frequency domain coefficient, and
• R represents the right channel frequency domain coefficient.
• if applied after quantization, the quantized frequency domain coefficients of the left and right channels in the scale factor band are likewise replaced by the linearly transformed sum and difference channel frequency domain coefficients,
• where M̂ represents the quantized sum channel frequency domain coefficient, Ŝ the quantized difference channel frequency domain coefficient, L̂ the quantized left channel frequency domain coefficient, and R̂ the quantized right channel frequency domain coefficient.
• in this way the correlation between the left and right channels can be effectively removed, and since the transform is applied to already quantized values, lossless coding can be achieved.
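The sum/difference transform pair can be sketched as follows, using the common M = (L+R)/2, S = (L−R)/2 normalization (an assumption for illustration; the patent's exact linear transform is not reproduced here). Applying the decoder transform to the encoder output returns the original channels.

```python
# Sketch of M/S stereo coding with the common half-sum/half-difference
# normalization (assumed); the pair of transforms is exactly invertible.

def ms_encode(left, right):
    mid  = [(l + r) / 2.0 for l, r in zip(left, right)]   # sum channel M
    side = [(l - r) / 2.0 for l, r in zip(left, right)]   # difference channel S
    return mid, side

def ms_decode(mid, side):
    left  = [m + s for m, s in zip(mid, side)]            # L = M + S
    right = [m - s for m, s in zip(mid, side)]            # R = M - S
    return left, right
```

For strongly correlated channels the side signal is near zero and cheap to code, which is the coding-efficiency gain described above.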
  • Fig. 8 is a schematic diagram of the first embodiment of the decoding apparatus.
• the decoding apparatus adds a sum and difference stereo decoding module 606 to the decoding apparatus shown in FIG. 6, located between the output of the inverse quantizer group 603 and the input of the frequency-time mapping module 604; it receives the
• signal type analysis result and the sum and difference stereo control signal output by the bit stream demultiplexing module 601, and converts the inverse quantized spectra of the sum and difference channels into the inverse quantized spectra of the left and right channels according to this control information.
• the sum and difference stereo decoding module 606 determines, based on the flag bit of each scale factor band, whether the inverse quantized spectrum/quantized spectrum values in that scale factor band need sum and difference stereo decoding. If sum and difference stereo coding was performed in the encoding device, the inverse quantized spectrum must undergo sum and difference stereo decoding in the decoding device.
• the sum and difference stereo decoding module 606 can also be located between the output of the entropy decoding module 602 and the input of the inverse quantizer group 603, receiving the sum and difference stereo control signal and the signal type analysis result output by the bit stream demultiplexing module 601.
• the decoding method based on the decoding apparatus shown in FIG. 8 is basically the same as the decoding method based on the decoding apparatus shown in FIG. 6, except that the following steps are added: after the inverse quantized spectrum is obtained, if the signal type analysis result indicates that the signal types are consistent, it is determined from the sum and difference stereo control signal whether the inverse quantized spectrum requires sum and difference stereo decoding; if so, it is determined from the flag bit on each scale factor band whether that scale factor band requires sum and difference stereo decoding, and if it does, the inverse quantized spectra of the sum and difference channels in that scale factor band are converted into the inverse quantized spectra of the left and right channels before subsequent processing; if the signal types are inconsistent or no sum and difference stereo decoding is needed, the inverse quantized spectrum is not converted and passes directly to subsequent processing.
• the sum and difference stereo decoding can also be performed after the entropy decoding process and before the inverse quantization process, that is: when the quantized values of the spectrum are obtained,
• if the signal type analysis result indicates that the signal types are consistent,
• it is judged according to the sum and difference stereo control signal whether the quantized values of the spectrum need sum and difference stereo decoding; if so, according to the flag bit of each scale
• factor band it is determined whether that scale factor band needs sum and difference stereo decoding, and if it does, the quantized values of the spectra of the sum and difference channels in that scale factor band are converted into the quantized values of the spectra of the left and right channels before subsequent processing;
• if the signal types are inconsistent or no sum and difference stereo decoding is needed, the quantized values are not converted and pass directly to subsequent processing.
• the quantized values of the left and right channel spectra in the scale factor band are obtained from the quantized values of the sum and difference channel spectra by the corresponding inverse operations, where
• M̂ represents the quantized sum channel frequency domain coefficient, Ŝ the quantized difference channel frequency domain coefficient, L̂ the quantized left channel frequency domain coefficient, and R̂ the quantized right channel frequency domain coefficient;
• similarly, the inverse quantized frequency domain coefficients of the left and right channels of the sub-band are obtained from those of the sum and difference channels,
• where M represents the sum channel frequency domain coefficient, S the difference channel frequency domain coefficient, L the left channel frequency domain coefficient, and R the right channel frequency domain coefficient.
  • Fig. 9 is a view showing the construction of a second embodiment of the encoding apparatus of the present invention.
• This embodiment adds a frequency domain linear prediction and vector quantization module 57 to the encoders of FIGS. 5 and 7, where the frequency domain linear prediction and vector quantization module 57 is located between the output of the time-frequency mapping module 52 and the input of the quantization and entropy coding module 53.
• the residual sequence is output to the quantization and entropy encoding module 53, and the quantized codeword index is output to the bit stream multiplexing module 55 as side information.
• the frequency domain coefficients output by the time-frequency mapping module 52 are transmitted to the frequency domain linear prediction and vector quantization module 57. If the prediction gain of the frequency domain coefficients satisfies a given condition, linear prediction filtering is performed on the frequency domain coefficients, the prediction coefficients are converted into line spectrum pair frequency coefficients LSF (Line Spectrum Frequency), the codeword index in each codebook is then found using the best distortion measure, and the codeword indices are transmitted as side information to the bit stream multiplexing module 55.
  • the frequency domain linear prediction and vector quantization module 57 is composed of a linear prediction analyzer, a linear prediction filter, a converter, and a vector quantizer.
• the frequency domain coefficients are input into the linear prediction analyzer for prediction analysis to obtain the prediction gain and prediction coefficients. If the value of the prediction gain satisfies a given condition, the frequency domain coefficients are output to the linear prediction filter for linear prediction error filtering,
• yielding the prediction residual sequence of the frequency domain coefficients, which is output directly to the quantization and entropy coding module 53; the prediction coefficients are converted into line spectrum pair frequency coefficients LSF by the converter, and the LSF parameters are then sent to the vector quantizer for multi-stage vector quantization.
• the resulting codeword indices are transferred to the bit stream multiplexing module 55.
  • The envelope corresponding to the positive frequency component of the signal, that is, the Hilbert envelope of the signal, is related to the autocorrelation function of its spectral coefficients.
  • For a bandpass signal in a certain frequency range, if its Hilbert envelope remains constant,
  • the sequence of spectral coefficients is a stationary sequence with respect to frequency, so that the spectral values can be processed by predictive coding techniques and the signal represented by a common set of prediction coefficients.
  • The encoding method based on the encoding device shown in FIG. 9 is basically the same as the encoding method based on the encoding device shown in FIG. 5, except that the following steps are added: standard linear prediction analysis is performed on the frequency domain coefficients to obtain the prediction gain and prediction coefficients; it is determined whether the prediction gain exceeds a set threshold, and if so, frequency domain linear prediction error filtering is performed on the frequency domain coefficients according to the prediction coefficients to obtain the prediction residual sequence of the frequency domain coefficients; the prediction coefficients are converted into line spectrum pair frequency coefficients, and multi-level vector quantization is performed on the line spectrum pair frequency coefficients to obtain side information; the residual sequence is quantized and entropy encoded; if the prediction gain does not exceed the set threshold, the frequency domain coefficients are quantized and entropy encoded.
  • The standard linear prediction analysis of the frequency domain coefficients is performed first, including calculating the autocorrelation matrix and running the recursive Levinson-Durbin algorithm to obtain the prediction gain and the prediction coefficients. It is then judged whether the calculated prediction gain exceeds a preset threshold; if it does, linear prediction error filtering is performed on the frequency domain coefficients according to the prediction coefficients; otherwise, the frequency domain coefficients are not processed and the next step, quantization and entropy encoding of the frequency domain coefficients, is performed.
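The Levinson-Durbin recursion mentioned above can be sketched in Python as follows. This is a minimal illustrative sketch, not the patent's implementation: the function name is ours, and the prediction gain is taken here as the ratio of signal energy to residual energy, which is one common definition.

```python
def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values
    r[0..order] via the Levinson-Durbin recursion.
    Returns (a, gain): filter coefficients a (a[0] == 1) and the
    prediction gain r[0] / residual_energy."""
    a = [0.0] * (order + 1)
    a[0] = 1.0
    err = r[0]                      # prediction error energy
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err              # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)        # residual energy shrinks each order
    gain = r[0] / err if err > 0 else float('inf')
    return a, gain
```

For an AR(1)-like autocorrelation `[1.0, 0.9, 0.81]`, a first-order fit gives `a[1] = -0.9` and a gain of about 5.26; raising the order adds nothing because the process is first-order.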
  • Linear prediction can be divided into forward prediction and backward prediction.
  • Forward prediction refers to the prediction of the current value by the value before a certain moment
  • backward prediction refers to the prediction of the current value by the value after a certain moment.
  • a_i represents the prediction coefficients and p the prediction order. The frequency domain coefficients output by the time-frequency transform are passed through the prediction error filter to obtain the prediction error E(k), also called the residual sequence; the two satisfy E(k) = X(k) − a_1·X(k−1) − … − a_p·X(k−p).
  • Thus the frequency domain coefficient X(k) output by the time-frequency transform module can be represented by the residual sequence E(k) and a set of prediction coefficients a_i.
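The prediction error filtering over the spectral coefficients can be sketched as below. This is an illustrative sketch under the convention E(k) = X(k) − Σ b_i·X(k−i), with X(k) taken as 0 for k < 0 at the start of the filter; the names are ours, not the patent's.

```python
def prediction_error(X, b):
    """Forward prediction error filtering of spectral coefficients X
    with prediction coefficients b[0..p-1]:
    E(k) = X(k) - sum_{i=1..p} b[i-1] * X(k-i), X(k) = 0 for k < 0."""
    p = len(b)
    E = []
    for k in range(len(X)):
        pred = sum(b[i] * X[k - 1 - i] for i in range(p) if k - 1 - i >= 0)
        E.append(X[k] - pred)
    return E
```

For example, with `X = [1, 2, 3, 4]` and a single coefficient `b = [0.5]`, the residual is `[1.0, 1.5, 2.0, 2.5]`, which together with b fully represents X.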
  • The set of prediction coefficients a_i is converted into line spectrum pair frequency coefficients LSF, and multi-level vector quantization is performed on them.
  • Vector quantization selects the best distortion metric (such as the nearest-neighbor criterion) and searches each codebook for the codeword that best matches the LSF parameter vector (residual vector) to be quantized; the corresponding codeword index is transmitted as side information.
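The multi-level (multi-stage) codebook search described above can be sketched as follows. This is a generic sketch of multi-stage vector quantization with a Euclidean distortion metric; the codebook contents, sizes, and stage count are illustrative placeholders, not the patent's trained codebooks.

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codeword minimizing squared-error distortion."""
    best_i, best_d = 0, float('inf')
    for i, cw in enumerate(codebook):
        d = sum((v - c) ** 2 for v, c in zip(vector, cw))
        if d < best_d:
            best_i, best_d = i, d
    return best_i

def multistage_vq(vector, codebooks):
    """Multi-level VQ: each stage quantizes the residual left by the
    previous stage; the list of codeword indices is the side information."""
    residual = list(vector)
    indices = []
    for cb in codebooks:
        idx = nearest_codeword(residual, cb)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return indices
```

Each stage only has to cover the (smaller) residual of the previous one, which is why cascaded small codebooks can match the accuracy of one much larger codebook at far lower search cost.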
  • the residual sequence is quantized and encoded.
  • The decoding apparatus adds an inverse frequency domain linear prediction and vector quantization module 607, on the basis of the decoding apparatus shown in FIG. 6, between the output of the inverse quantizer group 603 and the input of the frequency-time mapping module 604;
  • the bitstream demultiplexing module 601 outputs inverse frequency domain linear prediction vector quantization control information to it, for inverse quantization processing and inverse linear prediction filtering of the inverse quantization spectrum (residual spectrum);
  • the spectrum before prediction is obtained and output to the frequency-time mapping module 604.
  • In the encoder, frequency domain linear prediction vector quantization techniques are employed to suppress pre-echo and obtain a larger coding gain. Therefore, in the decoder, the inverse quantization spectrum and the inverse frequency domain linear prediction vector quantization control information output by the bitstream demultiplexing module 601 are input to the inverse frequency domain linear prediction and vector quantization module 607 to recover the spectrum from the linear prediction error.
  • The inverse frequency domain linear prediction and vector quantization module 607 includes an inverse vector quantizer, an inverse converter, and an inverse linear prediction filter. The inverse vector quantizer is used to inversely quantize the codeword index to obtain the line spectrum pair frequency coefficients LSF; the inverse converter then converts the line spectrum pair frequency coefficients LSF back into prediction coefficients; the inverse linear prediction filter is used to inversely filter the inverse quantization spectrum according to the prediction coefficients, obtaining the spectrum before prediction, which is output to the frequency-time mapping module 604.
  • The decoding method based on the decoding device shown in FIG. 10 is basically the same as the decoding method based on the decoding device shown in FIG. 6, except that the following steps are added: after the inverse quantization spectrum is obtained, it is judged whether the control information includes inverse frequency domain linear prediction vector quantization information; if it does, inverse vector quantization processing is performed to obtain the prediction coefficients, and linear prediction synthesis is performed on the inverse quantization spectrum according to the prediction coefficients to obtain the spectrum before prediction; the spectrum before prediction is then subjected to frequency-time mapping.
  • After the inverse quantization spectrum is obtained, it is determined according to the control information whether the frame signal has undergone frequency domain linear prediction vector quantization; if so, the quantized codeword index is obtained from the control information, the quantized line spectrum pair frequency coefficients LSF are then obtained according to the codeword index, and the prediction coefficients are calculated; linear prediction synthesis is then performed on the inverse quantization spectrum to obtain the spectrum before prediction.
  • The residual sequence and the calculated prediction coefficients are synthesized by frequency domain linear prediction to obtain the spectrum before prediction X(k), and the spectrum before prediction X(k) is subjected to frequency-time mapping processing.
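The decoder-side linear prediction synthesis can be sketched as below. This is an illustrative sketch under the same convention as the encoder-side error filter, i.e. X(k) = E(k) + Σ b_i·X(k−i); the names are ours.

```python
def prediction_synthesis(E, b):
    """Linear prediction synthesis (inverse of the error filter):
    X(k) = E(k) + sum_{i=1..p} b[i-1] * X(k-i), X(k) = 0 for k < 0."""
    p = len(b)
    X = []
    for k in range(len(E)):
        pred = sum(b[i] * X[k - 1 - i] for i in range(p) if k - 1 - i >= 0)
        X.append(E[k] + pred)
    return X
```

Feeding a residual `[1.0, 1.5, 2.0, 2.5]` through this filter with `b = [0.5]` reconstructs the original coefficients `[1, 2, 3, 4]`, i.e. synthesis exactly undoes the encoder's error filtering when the same coefficients are used.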
  • If the control information indicates that the frame signal has not undergone frequency domain linear predictive vector quantization,
  • the inverse frequency domain linear predictive vector quantization process is not performed, and the inverse quantized spectrum is directly subjected to frequency-time mapping processing.
  • FIG. 11 is a block diagram showing the structure of a third embodiment of the encoding apparatus of the present invention.
  • This embodiment, based on FIG. 9, adds a sum and difference stereo coding module 56 between the output of the frequency domain linear prediction and vector quantization module 57 and the input of the quantization and entropy coding module 53; the psychoacoustic analysis module 51 outputs the signal type analysis result to it, and outputs the masking thresholds of the sum and difference channels to the quantization and entropy coding module 53.
  • The sum and difference stereo coding module 56 may also be located between the quantizer group and the encoder in the quantization and entropy coding module 53, receiving the signal type analysis result output by the psychoacoustic analysis module 51.
  • The function and working principle of the sum and difference stereo coding module 56 are the same as those in FIG. 7 and will not be described here.
  • The encoding method based on the encoding apparatus shown in FIG. 11 is basically the same as the encoding method based on the encoding apparatus shown in FIG. 9, except that the following steps are added: before the quantization and entropy encoding of the frequency domain coefficients, it is determined whether the audio signal is a multi-channel signal; if it is a multi-channel signal, it is judged whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is judged whether the scale factor band satisfies the coding condition, and if it is satisfied, sum and difference stereo coding is performed on the scale factor band; if it is not satisfied, sum and difference stereo coding is not performed; if it is a mono signal or a multi-channel signal whose signal types are inconsistent, sum and difference stereo coding is not performed.
  • Sum and difference stereo coding can also be applied after quantization and before entropy coding; that is, after quantization of the frequency domain coefficients, it is judged whether the audio signal is a multi-channel signal; if it is a multi-channel signal, it is determined whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is determined whether the scale factor band satisfies the coding condition, and if so, sum and difference stereo coding is performed on the scale factor band; if not satisfied, sum and difference stereo coding is not performed; if it is a mono signal or a multi-channel signal whose signal types are inconsistent, sum and difference stereo coding is not performed.
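Sum and difference stereo coding per scale factor band can be sketched as follows. This assumes the conventional sum/difference definition m = (l + r)/2, s = (l − r)/2 (and its inverse l = m + s, r = m − s); the exact scaling used by the patent is not restated here, so treat the factors as an assumption.

```python
def ms_encode(l, r):
    """Sum/difference stereo: m = (l + r)/2 (sum channel),
    s = (l - r)/2 (difference channel), applied per scale factor band."""
    m = [(a + b) / 2.0 for a, b in zip(l, r)]
    s = [(a - b) / 2.0 for a, b in zip(l, r)]
    return m, s

def ms_decode(m, s):
    """Inverse sum/difference stereo: l = m + s, r = m - s."""
    l = [a + b for a, b in zip(m, s)]
    r = [a - b for a, b in zip(m, s)]
    return l, r
```

When the left and right channels are highly correlated, the difference channel s is nearly zero and quantizes very cheaply, which is the coding condition the scale-factor-band test above is checking for.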
  • Figure 12 is a block diagram showing a third embodiment of the decoding apparatus.
  • The decoding apparatus adds a sum and difference stereo decoding module 606, on the basis of the decoding apparatus shown in FIG. 10, between the output of the inverse quantizer group 603 and the input of the inverse frequency domain linear prediction and vector quantization module 607; the bit stream demultiplexing module 601 outputs the sum and difference stereo control signal to it.
  • The sum and difference stereo decoding module 606 may also be located between the output of the entropy decoding module 602 and the input of the inverse quantizer group 603, receiving the sum and difference stereo control signal output by the bit stream demultiplexing module 601.
  • The function and working principle of the sum and difference stereo decoding module 606 are the same as those in FIG. 8 and will not be described again here.
  • The decoding method based on the decoding apparatus shown in FIG. 12 is basically the same as the decoding method based on the decoding apparatus shown in FIG. 10, except that the following steps are added: after the inverse quantization spectrum is obtained, if the signal type analysis result indicates that the signal types are consistent, it is determined according to the sum and difference stereo control signal whether the inverse quantization spectrum requires sum and difference stereo decoding; if necessary, it is judged according to the flag bit on each scale factor band whether that scale factor band requires sum and difference stereo decoding, and if so, the inverse quantized spectra of the sum and difference channels in the scale factor band are converted into the inverse quantized spectra of the left and right channels and then subjected to subsequent processing; if the signal types are inconsistent or sum and difference stereo decoding is not needed, the inverse quantization spectrum is not processed and subsequent processing is performed directly.
  • Sum and difference stereo decoding can also be performed before the inverse quantization process, that is: after the quantized values of the spectrum are obtained, if the signal type analysis result indicates that the signal types are consistent, it is determined according to the sum and difference stereo control signal whether the quantized values of the spectrum require sum and difference stereo decoding; if necessary, it is judged according to the flag bit on each scale factor band whether that scale factor band requires sum and difference stereo decoding, and if so, the quantized spectral values of the sum and difference channels in the scale factor band are converted into the quantized values of the left and right channels and then subjected to subsequent processing; if the signal types are inconsistent or sum and difference stereo decoding is not needed, the quantized values of the spectrum are not processed and subsequent processing is performed directly.
  • Fig. 13 is a view showing the construction of a fourth embodiment of the encoding apparatus of the present invention.
  • This embodiment adds a multi-resolution analysis module 59 on the basis of FIG. 5, wherein the multi-resolution analysis module 59 is located between the output of the time-frequency mapping module 52 and the input of the quantization and entropy coding module 53; the psychoacoustic
  • analysis module 51 outputs the signal type analysis result to it.
  • The encoding apparatus of the present invention increases the time resolution of the frequency domain coefficients of the fast-changing signal by means of the multi-resolution analysis module 59.
  • The frequency domain coefficients output by the time-frequency mapping module 52 are input to the multi-resolution analysis module 59. If the signal is of the fast-changing type, a frequency domain wavelet transform or a frequency domain modified discrete cosine transform (MDCT) is performed, obtaining
  • a multi-resolution representation of the frequency domain coefficients that is output to the quantization and entropy coding module 53. If the signal is of the slowly-changing type, the frequency domain coefficients are not processed and are directly output to the quantization and entropy coding module 53.
  • The multi-resolution analysis module 59 reorganizes the input frequency domain data in the time-frequency domain, improving the time resolution of the frequency domain data at the expense of frequency precision, thereby automatically adapting to the time-frequency characteristics of fast-changing type signals and achieving the effect of suppressing pre-echo,
  • without the form of the filter bank in the time-frequency mapping module 52 having to be adjusted at any time.
  • The multi-resolution analysis module 59 includes a frequency domain coefficient transform module and a recombination module, wherein the frequency domain coefficient transform module is configured to transform the frequency domain coefficients into time-frequency plane coefficients, and the recombination module is configured to reorganize the time-frequency plane coefficients according to certain rules.
  • the frequency domain coefficient transform module may use a frequency domain wavelet transform filter bank, a frequency domain MDCT transform filter bank, and the like.
  • The following takes the frequency domain wavelet transform and the frequency domain MDCT transform as examples to illustrate the working process of the multi-resolution analysis module 59.
  • the wavelet base of the frequency domain wavelet or wavelet packet transform may be fixed or adaptive.
  • Let us take the simplest Haar wavelet-based wavelet transform as an example to illustrate the process of multi-resolution analysis of the frequency domain coefficients.
  • The wavelet transform is performed on the frequency domain coefficients, with the high-frequency part subjected to further Haar wavelet transforms, obtaining coefficients X_1(k), X_2(k), …, X_n(k) occupying different time-frequency intervals; the corresponding time-frequency plane division is shown in Figure 15.
  • Different wavelet bases can be selected, and different wavelet transform structures can be selected for processing, and other similar time-frequency plane partitions are obtained. Therefore, the time-frequency plane division of the signal analysis can be arbitrarily adjusted as needed to meet the analysis requirements of different time and frequency resolutions.
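The Haar analysis just described can be sketched as follows. This is a minimal illustrative sketch: one normalized Haar step, followed by a further split of the high-frequency band; the particular tree shape (which bands are split again) is our assumption for illustration, and the patent's adaptive structure may differ.

```python
import math

def haar_step(x):
    """One Haar analysis step: pairwise normalized sums (low band)
    and differences (high band). len(x) must be even."""
    s = 1.0 / math.sqrt(2.0)
    low = [(x[2 * i] + x[2 * i + 1]) * s for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) * s for i in range(len(x) // 2)]
    return low, high

def analyze(coeffs):
    """Split the frequency domain coefficients once, then split the
    high-frequency part again, giving three bands with different
    time-frequency resolution (a wavelet-packet-like structure)."""
    low, high = haar_step(coeffs)
    high_low, high_high = haar_step(high)
    return low, high_low, high_high
```

Choosing a different split tree (or a different wavelet base) yields the other time-frequency plane partitions mentioned above; only the `haar_step` calls change.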
  • The time-frequency plane coefficients are reorganized according to certain rules in the recombination module.
  • The time-frequency plane coefficients may be organized in the frequency direction, with the coefficients in each frequency band organized in the time direction, and the organized coefficients are then arranged in the order of the sub-window and scale factor bands.
  • MDCT transforms of different lengths can be used in different frequency ranges to obtain different time-frequency plane divisions, that is, different time and frequency precisions.
  • The recombination module reorganizes the time-frequency domain data output by the frequency domain MDCT transform filter bank.
  • One recombination method is to first organize the time-frequency plane coefficients in the frequency direction, with the coefficients in each frequency band organized in the time direction, and then arrange the organized coefficients in the order of the sub-window and scale factor bands.
  • The basic flow is the same as the encoding method based on the encoding apparatus shown in FIG. 5, except that the following steps are added: if the signal is of the fast-changing type, multi-resolution analysis is performed on the frequency domain coefficients, and the multi-resolution representation of the frequency domain coefficients is then quantized and entropy encoded; if it is not a fast-changing type signal, the frequency domain coefficients are directly quantized and entropy encoded.
  • the multi-resolution analysis may use a frequency domain wavelet transform method or a frequency domain MDCT transform method.
  • the frequency domain wavelet analysis method comprises: performing wavelet transform on the frequency domain coefficients to obtain time-frequency plane coefficients; and recombining the above-mentioned time-frequency plane coefficients according to a certain rule.
  • The MDCT transform method includes: performing MDCT transforms on the frequency domain coefficients to obtain time-frequency plane coefficients, and recombining the time-frequency plane coefficients according to a certain rule.
  • The method of recombination may be: the time-frequency plane coefficients are first organized in the frequency direction, with the coefficients in each frequency band organized in the time direction, and the organized coefficients are then arranged in the order of the sub-window and scale factor bands.
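The recombination rule just described, frequency direction first and time direction within each band, amounts to a transpose of the time-frequency coefficient matrix. A minimal sketch, with names of our choosing:

```python
def regroup(tf, n_sub, n_freq):
    """tf[s][f] holds the coefficient of sub-window s at frequency f.
    Emit coefficients band by band (frequency direction), with the
    coefficients of each band in time order, per the recombination rule."""
    out = []
    for f in range(n_freq):          # frequency direction
        for s in range(n_sub):       # time direction within the band
            out.append(tf[s][f])
    return out
```

Arranging the output further by sub-window and scale factor band, as the text specifies, is then just a second reordering pass over this flat list.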
  • Figure 16 is a block diagram showing the structure of a fourth embodiment of the decoding apparatus.
  • The decoding device adds a multi-resolution synthesis module 609 on the basis of the decoding apparatus shown in FIG. 6.
  • The multi-resolution synthesis module 609 is located between the output of the inverse quantizer group 603 and the input of the frequency-time mapping module 604, for performing multi-resolution synthesis on the inverse quantized spectrum.
  • a multi-resolution filtering technique is employed for the fast-changing type signal to improve the temporal resolution of the encoded fast-changing type signal. Accordingly, in the decoder, the multi-resolution synthesis module 609 is required to recover the frequency domain coefficients before the multi-resolution analysis for the fast-changing signal.
  • The multi-resolution synthesis module 609 includes a coefficient recombination module and a coefficient transform module, wherein the coefficient transform module can adopt a frequency domain inverse wavelet transform filter bank or a frequency domain IMDCT transform filter bank.
  • The basic flow is the same as the decoding method based on the decoding device shown in FIG. 6, except that the following step is added: after the inverse quantization spectrum is obtained, multi-resolution synthesis is performed on the inverse quantization spectrum, and the resulting frequency domain coefficients are subjected to frequency-time mapping.
  • The process is described in detail below with 128 16-point IMDCT transforms (8 inputs and 16 outputs each).
  • First, the inverse quantization spectral coefficients are arranged in the order of sub-window and scale factor bands, then recombined according to frequency order, so that the 128 coefficients of each sub-window are organized in frequency order.
  • The coefficients arranged by sub-window are then grouped in the frequency direction, 8 coefficients per group, with the 8 coefficients in each group arranged in time order, so that there are 128 groups of coefficients in the frequency direction.
  • Each group of coefficients is transformed by a 16-point IMDCT, and the 16 coefficients output by each IMDCT are overlapped and added to obtain 8 frequency domain data. 128 such operations are performed from the low frequency to the high frequency direction, yielding 1024 frequency domain coefficients.
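The 16-point IMDCT with overlap-add can be sketched as below. This is an illustrative sketch using a textbook IMDCT definition with 2/N scaling and no windowing; the patent's exact scaling, windowing, and overlap arrangement are not restated here, so treat those details as assumptions.

```python
import math

def imdct(X):
    """IMDCT of N coefficients to 2N samples (textbook definition)."""
    N = len(X)
    return [(2.0 / N) * sum(X[k] * math.cos(math.pi / N * (n + 0.5 + N / 2.0) * (k + 0.5))
                            for k in range(N))
            for n in range(2 * N)]

def overlap_add(blocks, hop):
    """Overlap-add consecutive 2*hop-sample blocks advanced by hop samples,
    as done with the 16 IMDCT outputs to recover 8 data per operation."""
    out = [0.0] * (hop * (len(blocks) + 1))
    for i, b in enumerate(blocks):
        for n, v in enumerate(b):
            out[i * hop + n] += v
    return out
```

A useful sanity check on this IMDCT definition: its 2N outputs are odd-symmetric in the first half (y[n] = −y[N−1−n]), which is the time-domain aliasing that the overlap-add of adjacent blocks cancels.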
  • Fig. 17 is a view showing the configuration of a fifth embodiment of the encoding apparatus of the present invention.
  • This embodiment is based on FIG. 13; a frequency domain linear prediction and vector quantization module 57 is added, located between the output of the multi-resolution analysis module 59 and the input of the quantization and entropy coding module 53.
  • The psychoacoustic analysis module 51 outputs the signal type analysis result to it;
  • the frequency domain linear prediction and vector quantization module 57 is configured to perform linear prediction and multi-level vector quantization on the frequency domain coefficients subjected to multi-resolution analysis, outputting the residual sequence to the quantization and entropy coding module 53 while outputting the quantized codeword index to the bitstream multiplexing module 55.
  • The frequency domain linear prediction and vector quantization module 57 needs to perform linear prediction and multi-level vector quantization on the frequency domain coefficients in each time period.
  • The frequency domain linear prediction and vector quantization module 57 may also be located between the output of the time-frequency mapping module 52 and the input of the multi-resolution analysis module 59, performing linear prediction and multi-level vector quantization on the frequency domain coefficients, outputting the residual sequence to the multi-resolution analysis module 59 while outputting the quantized codeword index to the bitstream multiplexing module 55.
  • The encoding method based on the encoding apparatus shown in Fig. 17 is basically the same as the encoding method based on the encoding apparatus shown in Fig. 13; the difference is that the following steps are added: after performing multi-resolution analysis on the frequency domain coefficients, standard linear prediction analysis is performed on the frequency domain coefficients in each time period; it is determined whether the prediction gain exceeds the set threshold; if it does, frequency domain linear prediction error filtering is performed on the frequency domain coefficients to obtain the prediction coefficients and the residual sequence of the frequency domain coefficients;
  • the prediction coefficients are converted into line spectrum pair frequency coefficients, and multi-level vector quantization is performed on the line spectrum pair frequency coefficients to obtain side information; the residual sequence is quantized and entropy encoded; if the prediction gain does not exceed the set threshold, the frequency domain coefficients are quantized and entropy encoded. Alternatively, before the multi-resolution analysis, the frequency domain coefficients are linearly predicted and multi-level vector quantized, and the residual sequence is then subjected to multi-resolution analysis.
  • Figure 18 is a block diagram showing the structure of a fifth embodiment of the decoding apparatus.
  • The decoding apparatus adds an inverse frequency domain linear prediction and vector quantization module 607 on the basis of the decoding apparatus shown in FIG. 16, located between the output of the inverse quantizer group 603 and the input of the multi-resolution synthesis module 609; the bitstream demultiplexing module 601 outputs inverse frequency domain linear prediction vector quantization control information to it, for inverse quantization processing and linear prediction synthesis of the inverse quantization spectrum, obtaining the spectrum before prediction, which is output to the multi-resolution synthesis module 609.
  • The inverse frequency domain linear prediction and vector quantization module 607 can also be located between the output of the multi-resolution synthesis module 609 and the input of the frequency-time mapping module 604, for performing linear prediction synthesis on the multi-resolution synthesized inverse quantization spectrum.
  • The decoding method based on the decoding device shown in FIG. 18 is basically the same as the decoding method based on the decoding device shown in FIG. 16, except that the following steps are added: after the inverse quantization spectrum is obtained, it is judged whether the control information includes inverse frequency domain linear prediction vector quantization information; if it does, inverse
  • vector quantization processing is performed to obtain the prediction coefficients, and linear prediction synthesis is performed on the inverse quantization spectrum to obtain the spectrum before prediction; multi-resolution synthesis and frequency-time mapping are then performed on the spectrum before prediction.
  • Fig. 19 is a view showing the construction of a sixth embodiment of the encoding apparatus of the present invention.
  • This embodiment is based on the encoding apparatus shown in Fig. 17, with a sum and difference stereo coding module 56 added between the output of the frequency domain linear prediction and vector quantization module 57 and the input of the quantization and entropy coding module 53.
  • the signal type analysis result from the psychoacoustic analysis module 51 is received.
  • The sum and difference stereo coding module 56 may also be located between the quantizer group and the encoder in the quantization and entropy coding module 53.
  • The function and working principle of the sum and difference stereo coding module 56 are the same as those in FIG. 11 and will not be described again here.
  • The encoding method based on the encoding device shown in FIG. 19 is basically the same as the encoding method based on the encoding device shown in FIG. 17, except that the following steps are added: after the residual sequence is obtained, according to whether the audio signal is a multi-channel signal of consistent signal type that satisfies the encoding conditions, it is determined whether to perform sum and difference stereo coding; subsequent processing is then performed.
  • the specific process has been introduced above, and will not be described here.
  • Fig. 20 is a block diagram showing the structure of the sixth embodiment of the decoding apparatus.
  • The decoding apparatus is based on the decoding apparatus shown in Fig. 18, with the sum and difference stereo decoding module 606 added between the output of the inverse quantizer group 603 and the input of the inverse frequency domain linear prediction and vector quantization module 607.
  • the sum and difference stereo decoding module 606 can also be located between the output of the entropy decoding module 602 and the input of the inverse quantizer group 603.
  • The function and working principle of the sum and difference stereo decoding module 606 in this embodiment are the same as those in FIG. 12 and will not be described again here.
  • The decoding method based on the decoding apparatus shown in FIG. 20 is basically the same as the decoding method based on the decoding apparatus shown in FIG. 18, except that the following steps are added: after the inverse quantization spectrum is obtained, it is judged according to the signal type analysis result and the sum and difference stereo control information whether the inverse quantization spectrum requires sum and difference stereo decoding; the subsequent processing has been described above and will not be repeated here.
  • Figure 21 is a diagram showing a seventh embodiment of the encoding apparatus of the present invention, which is based on Figure 13, with a sum and difference stereo coding module 56 added between the output of the multi-resolution analysis module 59 and the input of the quantization and entropy coding module 53. The sum and difference stereo coding module 56 can also be located between the quantizer group and the encoder in the quantization and entropy coding module 53. The sum and difference stereo coding module 56 has been detailed previously and is not described again here.
  • The encoding method based on the encoding apparatus shown in FIG. 21 is basically the same as the encoding method based on the encoding apparatus shown in FIG. 13; the difference is that the following steps are added: after the multi-resolution analysis of the frequency domain coefficients, according to whether the audio signal is a multi-
  • channel signal of consistent signal type that satisfies the encoding conditions, it is determined whether to perform sum and difference stereo coding; subsequent processing is then performed.
  • the specific process has been introduced above, and will not be described here.
  • Figure 22 is a diagram showing the seventh embodiment of the decoding apparatus.
  • The decoding apparatus adds a sum and difference stereo decoding module 606 between the output of the inverse quantizer group 603 and the input of the multi-resolution synthesis module 609, on the basis of the decoding apparatus shown in FIG. 16.
  • The sum and difference stereo decoding module 606 can also be located between the output of the entropy decoding module 602 and the input of the inverse quantizer group 603.
  • The sum and difference stereo decoding module 606 has been detailed previously and will not be described again here.
  • The decoding method based on the decoding device shown in FIG. 22 is basically the same as the decoding method based on the decoding device shown in FIG. 16, except that the following steps are added: after the inverse quantization spectrum is obtained, it is judged according to the signal type analysis result and the sum and difference stereo control information whether the inverse quantization spectrum requires sum and difference stereo decoding; subsequent processing is then performed. The specific process has been introduced above and will not be described here.
  • Figure 23 is a diagram showing an eighth embodiment of the encoding apparatus of the present invention.
  • This embodiment is based on the encoding apparatus shown in Figure 13, with a signal property analysis module 510 added for performing signal type analysis on the output of the resampling module 50,
  • outputting the resampled signal to the time-frequency mapping module 52 and the psychoacoustic analysis module 51, and outputting the signal type analysis result to the bit stream multiplexing module 55.
  • the signal property analysis module 510 performs front and back masking effect analysis based on the adaptive threshold and the waveform prediction to determine whether the signal type is a slow-changing signal or a fast-changing signal, and if it is a fast-changing type signal, continues to calculate related parameter information of the abrupt component. Such as the location of the mutation signal and the strength of the mutation signal.
  • the encoding method based on the encoding apparatus shown in Fig. 23 is basically the same as that based on the encoding apparatus shown in Fig. 13, except that the following step is added:
  • the type of the resampled signal is analyzed, and the analysis result becomes part of the multiplexed bit stream.
  • the signal type is determined by pre- and post-masking analysis based on an adaptive threshold and waveform prediction.
  • the specific steps are: decompose the input audio data into frames; decompose each frame into multiple subframes, and search each subframe for the local maximum points of the absolute value of the PCM data; select a subframe peak from the local maxima of each subframe; for a given subframe peak, predict a typical sample value from the peaks of several (typically 3) preceding subframes over several (typically 4) subframes of forward delay relative to that subframe; then compute the difference and the ratio between the subframe peak and the predicted typical sample value. If both the difference and the ratio exceed their set thresholds, a sudden (transient) component is judged to exist in that subframe, the subframe peak is confirmed as a local maximum peak whose pre-echo must be controlled by backward masking, and the frame is classified as a fast-varying signal. If the difference and the ratio do not both exceed the set thresholds, the above steps are repeated until either the frame is judged to be fast-varying or the last subframe is reached; if the last subframe is reached without a fast-varying judgment, the frame is classified as a slowly-varying signal.
  • Figures 24 to 28 are schematic views showing the ninth to thirteenth embodiments of the encoding apparatus of the present invention.
  • these embodiments are based on the encoding apparatuses shown in FIG. 17, FIG. 19, FIG. 21, FIG. 9 and FIG. 11, respectively, with a signal property analysis module 510 added to perform signal type analysis on the signal output by the resampling module 50, output the resampled signal to the time-frequency mapping module 52 and the psychoacoustic analysis module 51, and output the signal type analysis result to the bit stream multiplexing module 55.
  • the encoding methods based on the encoding apparatuses shown in FIGS. 24 to 28 are basically the same as those based on the encoding apparatuses shown in FIGS. 17, 19, 21, 9, and 11, except that the following step is added: the type of the resampled signal is analyzed, and the analysis result becomes part of the multiplexed bit stream.
  • Figure 29 is a diagram showing a fourteenth embodiment of the encoding apparatus of the present invention.
  • a gain control module 511 is added; it receives the audio signal output by the signal property analysis module 510, controls the dynamic range of fast-varying signals, and eliminates pre-echo in audio processing. Its output is connected to the time-frequency mapping module 52 and the psychoacoustic analysis module 51.
  • the gain control module 511 processes only fast-varying signals; slowly-varying signals are passed through and output directly.
  • the gain control module 511 adjusts the time-domain energy envelope of the signal, increasing the gain of the signal before the fast-change point so that the time-domain signal amplitudes before and after the fast-change point become comparable; it then outputs the signal with the adjusted time-domain energy envelope to the time-frequency mapping module 52 and the gain adjustment amount to the bit stream multiplexing module 55.
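A minimal sketch of this encoder-side envelope flattening. Using one scalar gain per frame, derived from the peak ratio across the fast-change point, is an illustrative simplification; the text only requires that the amplitudes on both sides become comparable and that the adjustment amount be transmitted.

```python
def gain_control(frame, change_point, eps=1e-12):
    """Flatten the time-domain envelope before a fast-change point.

    The pre-transient region is boosted by the ratio of the peak after
    the change point to the peak before it, so the amplitudes on both
    sides become comparable.  Returns the adjusted frame and the gain
    value that would be sent to the decoder in the bit stream.
    (One scalar gain per frame is an illustrative simplification.)
    """
    peak_pre = max((abs(x) for x in frame[:change_point]), default=0.0)
    peak_post = max((abs(x) for x in frame[change_point:]), default=0.0)
    gain = max(peak_post, eps) / max(peak_pre, eps)
    # Raise only the samples before the fast-change point.
    adjusted = [x * gain for x in frame[:change_point]] + list(frame[change_point:])
    return adjusted, gain
```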
  • the encoding method based on the encoding apparatus shown in Fig. 29 is basically the same as that based on the encoding apparatus shown in Fig. 23, except that the following step is added: gain control is performed on the signal after signal type analysis.
  • FIG. 30 is a schematic structural diagram of Embodiment 8 of the decoding apparatus.
  • in this embodiment, an inverse gain control module 610 is added between the output of the frequency-time mapping module 604 and the input of the band extension module 605.
  • it receives the signal type analysis result and the gain adjustment amount information output by the bit stream demultiplexing module 601, and adjusts the gain of the time-domain signal to control pre-echo.
  • the inverse gain control module 610 processes only fast-varying signals; slowly-varying signals are output directly to the band extension module 605 without processing.
  • the inverse gain control module 610 adjusts the energy envelope of the reconstructed time-domain signal according to the gain adjustment amount information, reducing the amplitude of the signal before the fast-change point and restoring the envelope to its original low-before, high-after shape, so that the quantization noise before the fast-change point is attenuated together with the signal amplitude, thereby controlling the pre-echo.
  • the decoding method based on the decoding apparatus shown in FIG. 30 is basically the same as that based on the decoding apparatus shown in FIG. 16, except that the following step is added: before band extension of the reconstructed time-domain signal, inverse gain control is performed on the reconstructed time-domain signal.
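A minimal sketch of the decoder-side inverse adjustment; that the transmitted gain adjustment amount arrives as a single scalar factor per frame is an illustrative assumption.

```python
def inverse_gain_control(frame, change_point, gain):
    """Restore the original low-before/high-after envelope.

    Dividing the samples before the fast-change point by the gain the
    encoder applied attenuates the quantization noise in that region
    by the same factor as the signal, which is what suppresses the
    pre-echo.  (One scalar gain per frame is a simplification.)
    """
    return [x / gain for x in frame[:change_point]] + list(frame[change_point:])
```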
  • Figures 31 to 35 show the fifteenth to nineteenth embodiments of the encoding apparatus of the present invention.
  • these five embodiments add, to the encoding apparatuses shown in FIG. 24 to FIG. 28 respectively, a gain control module 511 for controlling the dynamic range of the signal-type-analyzed audio signal and eliminating pre-echo in audio processing. Its output is connected to the time-frequency mapping module 52 and the psychoacoustic analysis module 51.
  • the encoding methods based on these five encoding apparatuses are basically the same as those based on the encoding apparatuses shown in Figs. 24 to 28, except that the following step is added: gain control is performed on the signal after signal type analysis.
  • Figures 36 to 40 show the structures of Embodiments 9 to 13 of the decoding apparatus; these five decoding apparatuses are based on the decoding apparatuses shown in Figs. 18, 20, 22, 10 and 12, respectively.
  • in each, an inverse gain control module 610 is added between the output of the frequency-time mapping module 604 and the input of the band extension module 605; it receives the signal type analysis result output by the bit stream demultiplexing module 601, adjusts the gain of the time-domain signal, and controls pre-echo.
  • the decoding methods based on these five decoding apparatuses are likewise basically the same as those based on the decoding apparatuses shown in Figs. 18, 20, 22, 10, and 12, except that the following step is added:
  • inverse gain control is performed on the reconstructed time-domain signal before band extension.


Abstract

The present invention relates to enhanced audio encoding equipment comprising a band extension module, a resampling module, a psychoacoustic analysis module, a time-frequency mapping module, a quantization module, an entropy coding module, and a bitstream multiplexing module. The band extension module analyzes the original input audio signal over the whole bandwidth, extracts the spectral envelope of the high-frequency part and the parameters characterizing the dependency between the low- and high-frequency parts of the spectrum, and outputs them to the bitstream multiplexing module. The resampling module resamples the input audio signal, changes the sampling rate, and outputs the result to the psychoacoustic analysis module and the time-frequency mapping module. The present invention is applicable to high-fidelity compression coding of audio signals of all sampling rates and channel configurations: it supports sampling rates from 8 kHz to 192 kHz, all available channel configurations, and audio encoding/decoding over a wide range of target bit rates.

Description

Enhanced audio encoding and decoding apparatus and method

Technical Field
The present invention relates to the field of audio encoding and decoding, and in particular to an enhanced audio encoding and decoding apparatus and method based on a perceptual model.
Background Art
To obtain high-fidelity digital audio, the digital audio signal must be encoded (compressed) for storage and transmission. The purpose of audio coding is to achieve a transparent representation of the audio signal with as few bits as possible, i.e., the originally input audio signal and the decoded output should be nearly indistinguishable.
In the early 1980s, the CD demonstrated the many advantages of representing audio digitally, such as high fidelity, large dynamic range, and strong robustness. These advantages, however, come at the cost of a very high data rate. For example, digitizing a CD-quality stereo signal requires a sampling rate of 44.1 kHz with 16-bit uniform quantization of each sample, so the uncompressed data rate reaches 1.41 Mb/s. Such a high rate makes transmission and storage very inconvenient, especially in multimedia and wireless-transmission applications, where bandwidth and cost are limiting. New network and wireless multimedia digital audio systems therefore must reduce the data rate without compromising audio quality. To this end, many audio compression techniques that combine high compression ratios with high fidelity have been proposed, notably the MPEG-1/-2/-4 technologies of ISO/IEC, Dolby's AC-2/AC-3, Sony's ATRAC/MiniDisc/SDDS, and Lucent's PAC/EPAC/MPAC. MPEG-2 AAC and Dolby AC-3 are described in detail below.
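The 1.41 Mb/s figure quoted above follows directly from the CD parameters:

```python
# Uncompressed data rate of CD-quality stereo audio.
fs = 44_100     # sampling rate, Hz
bits = 16       # bits per uniformly quantized sample
channels = 2    # stereo

rate_bps = fs * bits * channels
print(rate_bps)  # 1411200 bits/s, i.e. about 1.41 Mb/s
```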
MPEG-1 and MPEG-2 BC are high-quality coding technologies mainly for mono and stereo audio signals. As demand grew for multi-channel audio coding with higher quality at lower bit rates, MPEG-2 BC, which emphasizes backward compatibility with MPEG-1, proved unable to deliver high-quality five-channel coding below 540 kbps. MPEG-2 AAC was proposed to address this shortcoming; it can encode five-channel signals with high quality at 320 kbps.
Figure 1 shows a block diagram of the MPEG-2 AAC encoder, which comprises a gain controller 101, a filter bank 102, a temporal noise shaping (TNS) module 103, an intensity/coupling module 104, a psychoacoustic model, a second-order backward adaptive predictor 105, a sum/difference (M/S) stereo module 106, a bit allocation and quantization coding module 107, and a bitstream multiplexing module 108. The bit allocation and quantization coding module 107 further comprises a companding/distortion control loop, a scale factor module, a non-uniform quantizer, and an entropy coding module.
The filter bank 102 uses the modified discrete cosine transform (MDCT) with signal-adaptive resolution: a 2048-point MDCT for steady-state signals and a 256-point MDCT for transient signals. For a signal sampled at 48 kHz this gives a maximum frequency resolution of 23 Hz and a maximum time resolution of 2.6 ms. The filter bank 102 can use either a sine window or a Kaiser-Bessel window: the sine window is used when the harmonic spacing of the input signal is less than 140 Hz, and the Kaiser-Bessel window when strong components in the input are spaced more than 220 Hz apart.
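The resolution figures quoted for 48 kHz sampling can be checked with two lines of arithmetic; reading the 2.6 ms value as the 128-sample hop of the 256-point MDCT (which overlaps consecutive blocks by 50%) is an assumption about how that figure was derived.

```python
fs = 48_000  # sampling rate, Hz

# 2048-point MDCT: 1024 spectral lines span fs/2, so each line covers
# fs/2048 -- the "23 Hz" maximum frequency resolution quoted above.
freq_res_hz = fs / 2048          # 23.4375 Hz

# 256-point MDCT: with the MDCT's 50% overlap, successive blocks
# advance by 128 samples -- the "2.6 ms" maximum time resolution
# (assumption: the figure refers to the hop size, not the window length).
time_res_ms = 128 / fs * 1000    # about 2.67 ms
```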
The audio signal passes through the gain controller 101 into the filter bank 102, where it is filtered according to the signal type; the spectral coefficients output by the filter bank 102 are then processed by the temporal noise shaping module 103. Temporal noise shaping applies linear prediction to the spectral coefficients in the frequency domain and uses the result to control the temporal shape of the quantization noise, thereby controlling pre-echo.
The intensity/coupling module 104 performs stereo coding of signal intensity. For high-frequency signals (above 2 kHz), the perceived direction depends on the variation of signal intensity (the signal envelope) rather than on the waveform itself; a constant-envelope signal has no effect on directional perception. This property, together with the correlation among channels, allows several channels to be combined into one common channel for coding, which is the basis of the intensity/coupling technique.
The second-order backward adaptive predictor 105 removes redundancy from steady-state signals and improves coding efficiency. The sum/difference (M/S) stereo module 106 operates on channel pairs, i.e., the left/right or left-surround/right-surround channels of a two-channel or multi-channel signal. The M/S module 106 exploits the correlation between the two channels of a pair to reduce the bit rate and improve coding efficiency. The bit allocation and quantization coding module 107 is implemented as a nested loop in which the non-uniform quantizer performs lossy coding and the entropy coding module performs lossless coding, removing redundancy and reducing correlation. The nested loop consists of an inner loop and an outer loop: the inner loop adjusts the step size of the non-uniform quantizer until the available bits are used up, and the outer loop estimates the coding quality from the ratio of quantization noise to masking threshold. Finally, the encoded signal is passed through the bitstream multiplexing module 108 to form the coded audio stream.
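The M/S saving comes from the transform being trivial while the side channel is nearly empty for a well-correlated pair. A minimal sketch follows; the 1/2 normalization is a common convention, not necessarily the exact scaling used in the standard.

```python
def ms_encode(left, right):
    """Sum/difference (M/S) transform for one channel pair.

    For highly correlated channels the mid channel carries almost all
    of the energy and the side channel is close to zero, so the side
    channel quantizes very cheaply -- the source of the bit saving.
    """
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Inverse transform: reconstructs the left/right pair exactly."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```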
In the sampling-rate-scalable case, the input signal is fed through a four-band polyphase quadrature filter bank (PQF) to produce four bands of equal width; an MDCT generates 256 spectral coefficients per band, 1024 in total. The gain controller 101 is applied in each band. A decoder can ignore the high-frequency PQF bands to obtain a low-sampling-rate signal.
Figure 2 shows a block diagram of the corresponding MPEG-2 AAC decoder, which comprises a bitstream demultiplexing module 201, a lossless decoding module 202, an inverse quantizer 203, a scale factor module 204, a sum/difference (M/S) stereo module 205, a prediction module 206, an intensity/coupling module 207, a temporal noise shaping module 208, a filter bank 209, and a gain control module 210. The coded audio stream is demultiplexed by the bitstream demultiplexing module 201 into the corresponding data and control streams. After decoding by the lossless decoding module 202, the integer representation of the scale factors and the quantized values of the signal spectrum are obtained. The inverse quantizer 203 is a bank of non-uniform quantizers implemented by a companding function, which converts the integer quantized values into a reconstructed spectrum. Since the scale factor module of the encoder differences each scale factor against the previous one and Huffman-codes the difference, the scale factor module 204 of the decoder performs Huffman decoding to obtain the differences and then restores the true scale factors. The M/S module 205 converts the sum and difference channels back into left and right channels under the control of the side information. Since the encoder uses the second-order backward adaptive predictor 105 to remove steady-state redundancy and improve coding efficiency, the decoder performs predictive decoding in the prediction module 206. The intensity/coupling module 207 performs intensity/coupling decoding under the control of the side information and outputs to the temporal noise shaping module 208 for TNS decoding; finally, synthesis filtering is performed by the filter bank 209, which uses the inverse modified discrete cosine transform (IMDCT).
For the sampling-rate-scalable case, the gain control module 210 can ignore the high-frequency PQF bands to obtain a low-sampling-rate signal.
The MPEG-2 AAC codec is well suited to medium- and high-bit-rate audio, but its coding quality at low or very low bit rates is poor; moreover, it involves many coding modules, so its implementation complexity is high, which is unfavorable for real-time operation.
Figure 3 shows the structure of an encoder using Dolby AC-3, comprising a transient signal detection module 301, an MDCT filter bank 302, a spectral envelope/exponent coding module 303, a mantissa coding module 304, a forward-backward adaptive perceptual model 305, a parametric bit allocation module 306, and a bitstream multiplexing module 307. The transient signal detection module 301 decides whether the audio signal is steady-state or transient, while the signal-adaptive MDCT filter bank 302 maps time-domain data to the frequency domain; a 512-point long window is applied to steady-state signals and a pair of short windows to transient signals.
The spectral envelope/exponent coding module 303 encodes the exponent part of the signal in one of three modes, D15, D25, and D45, according to the bit rate and frequency resolution required. AC-3 differentially codes the spectral envelope across frequency: at most ±2 increments are needed, each increment representing a 6 dB level change; the first (DC) term is coded absolutely and the remaining exponents differentially. In D15 exponent coding, each exponent needs about 2.33 bits, three differentials being coded in one 7-bit word; the D15 mode provides fine frequency resolution at the price of coarse time resolution. Since fine frequency resolution is only needed for relatively stationary signals, whose spectrum stays nearly constant over many blocks, the D15 envelope is sent only occasionally for steady-state signals, typically once every six audio blocks (one data frame). When the signal spectrum is unstable, the spectral estimate must be updated frequently and is coded with coarser frequency resolution, normally in the D25 or D45 mode. The D25 mode offers a compromise between frequency and time resolution, differentially coding every other frequency coefficient so that each exponent needs about 1.15 bits; it can be used when the spectrum is stable over two to three blocks and then changes abruptly. The D45 mode differentially codes one exponent per four frequency coefficients, so each exponent needs about 0.58 bits; it provides high time resolution and low frequency resolution and is therefore generally applied to transient signals.
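The exponent strategies described above can be sketched as follows. The grouping factor (1, 2, or 4 coefficients per transmitted exponent, mirroring D15/D25/D45) and the clamp to ±2 increments of 6 dB each follow the description; the actual AC-3 bitstream packing (three 5-level differentials per 7-bit word, since 5**3 = 125 fits in 128, hence about 2.33 bits per exponent in D15) is noted only in comments, not implemented.

```python
def encode_exponents(exps, group=1):
    """Differentially encode a track of spectral exponents (sketch).

    The first exponent is sent absolutely; the rest are sent as
    differences clamped to [-2, +2] (each increment = 6 dB).  For the
    D25/D45-like modes, group=2 or group=4 transmits one exponent per
    2 or 4 frequency coefficients, cutting the cost from ~2.33 toward
    ~1.15 or ~0.58 bits per coefficient.  Packing three 5-level
    differentials into one 7-bit word is not modeled here.
    """
    kept = exps[::group]               # subsampled exponent track
    diffs, prev = [], kept[0]
    for e in kept[1:]:
        d = max(-2, min(2, e - prev))  # clamp to +/-2 increments
        diffs.append(d)
        prev = prev + d                # track the decoder's reconstruction
    return kept[0], diffs

def decode_exponents(first, diffs, group=1, n=None):
    """Rebuild the exponent track; grouped modes repeat each exponent."""
    kept, cur = [first], first
    for d in diffs:
        cur += d
        kept.append(cur)
    out = [e for e in kept for _ in range(group)]
    return out if n is None else out[:n]
```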
The forward-backward adaptive perceptual model 305 estimates the masking threshold for each frame. The forward-adaptive part runs only in the encoder: within the bit-rate constraint, it estimates an optimal set of perceptual model parameters through an iterative loop, and these parameters are passed to the backward-adaptive part to estimate the masking threshold of each frame. The backward-adaptive part runs in both the encoder and the decoder.
The parametric bit allocation module 306 analyzes the spectral envelope of the audio signal against the masking criterion to determine the number of bits assigned to each mantissa. The module 306 uses a common bit pool for global bit allocation across all channels. During coding in the mantissa coding module 304, bits are drawn from the pool in a loop and allocated to all channels, and the mantissa quantization is adjusted according to the number of bits available. For further compression, the AC-3 encoder also employs high-frequency coupling: the high-frequency part of the coupled signals is divided into 18 sub-bands according to the critical bands of the human ear, and selected channels are coupled from a chosen sub-band upward. Finally, the AC-3 audio stream is formed by the bitstream multiplexing module 307.
Figure 4 shows the Dolby AC-3 decoding flow. The bit stream produced by the AC-3 encoder is first input, and frame synchronization and error detection are performed on it; if a data error is detected, error concealment or muting is applied. The bit stream is then unpacked into main information and side information, followed by exponent decoding. Exponent decoding requires two pieces of side information: the number of packed exponents, and the exponent strategy used (D15, D25, or D45). The decoded exponents and the bit allocation side information then drive bit allocation, which indicates the number of bits used for each packed mantissa and yields a set of bit allocation pointers, one per coded mantissa. A bit allocation pointer indicates the quantizer used for the mantissa and the number of bits it occupies in the stream. Each coded mantissa value is dequantized; mantissas occupying zero bits are restored to zero or, under control of the dither flag, replaced by a random dither value. Decoupling is then performed, recovering the high-frequency part (exponents and mantissas) of each coupled channel from the common coupling channel and the coupling factors. If matrixing was applied to a sub-band during 2/0-mode encoding, the decoder must convert the sum and difference channel values of that sub-band back into left and right channel values through matrix recovery. The stream carries a dynamic range control value for each audio block, which is applied as dynamic range compression to change the magnitude of the coefficients, including exponents and mantissas. The frequency-domain coefficients are inverse-transformed into time-domain samples, which are windowed; adjacent blocks are overlap-added to reconstruct the PCM audio signal. When the number of decoded output channels is smaller than the number of channels in the coded bit stream, the audio signal is additionally downmixed, and finally the PCM stream is output.
Dolby AC-3 coding is aimed mainly at high-bit-rate multi-channel surround signals; when the 5.1-channel coding bit rate falls below 384 kbps, its coding quality degrades, and its coding efficiency for mono and two-channel stereo is also low.
In summary, existing codec technologies cannot simultaneously deliver good coding quality across very low, low, and high bit rates and for mono and two-channel signals, and their implementations are complex.
Summary of the Invention
The technical problem addressed by the present invention is to provide an enhanced audio encoding and decoding apparatus and method that overcome the low coding efficiency and poor quality of the prior art at lower bit rates.
The enhanced audio encoding apparatus of the present invention comprises a band extension module, a resampling module, a psychoacoustic analysis module, a time-frequency mapping module, a quantization and entropy coding module, and a bitstream multiplexing module. The band extension module analyzes the original input audio signal over the entire frequency band, extracts the spectral envelope of the high-frequency part and the parameters characterizing its relation to the low-frequency part, and outputs them to the bitstream multiplexing module. The resampling module resamples the input audio signal, changing its sampling rate, and outputs the resampled signal to the psychoacoustic analysis module and the time-frequency mapping module. The psychoacoustic analysis module computes the masking threshold and the signal-to-mask ratio of the input audio signal and outputs them to the quantization and entropy coding module. The time-frequency mapping module converts the time-domain audio signal into frequency-domain coefficients. The quantization and entropy coding module quantizes and entropy-codes the frequency-domain coefficients under the control of the signal-to-mask ratio output by the psychoacoustic analysis module and outputs the result to the bitstream multiplexing module. The bitstream multiplexing module multiplexes the received data to form the coded audio stream.
The enhanced audio decoding apparatus of the present invention comprises a bitstream demultiplexing module, an entropy decoding module, an inverse quantizer bank, a frequency-time mapping module, and a band extension module. The bitstream demultiplexing module demultiplexes the compressed audio data stream and outputs the corresponding data and control signals to the entropy decoding module and the band extension module. The entropy decoding module decodes these signals to recover the quantized spectral values and outputs them to the inverse quantizer bank. The inverse quantizer bank reconstructs the inverse-quantized spectrum and outputs it to the frequency-time mapping module. The frequency-time mapping module applies a frequency-to-time mapping to the spectral coefficients to obtain the low-band time-domain audio signal. The band extension module receives the band extension control information output by the bitstream demultiplexing module and the low-band time-domain audio signal output by the frequency-time mapping module, reconstructs the high-frequency signal part, and outputs the wideband audio signal.
The invention is applicable to high-fidelity compression coding of audio signals with a variety of sampling rates and channel configurations: it supports audio signals with sampling rates from 8 kHz to 192 kHz, supports all possible channel configurations, and supports audio encoding/decoding over a wide range of target bit rates.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram of an MPEG-2 AAC encoder;
Figure 2 is a block diagram of an MPEG-2 AAC decoder;
Figure 3 is a schematic structural diagram of an encoder using Dolby AC-3 technology;
Figure 4 is a schematic diagram of the decoding flow using Dolby AC-3 technology;
Figure 5 is a schematic structural diagram of the audio encoding apparatus of the present invention;
Figure 6 is a schematic structural diagram of the audio decoding apparatus of the present invention;
Figure 7 is a schematic structural diagram of Embodiment 1 of the encoding apparatus of the present invention;
Figure 8 is a schematic structural diagram of Embodiment 1 of the decoding apparatus of the present invention;
Figure 9 is a schematic structural diagram of Embodiment 2 of the encoding apparatus of the present invention;
Figure 10 is a schematic structural diagram of Embodiment 2 of the decoding apparatus of the present invention;
Figure 11 is a schematic structural diagram of Embodiment 3 of the encoding apparatus of the present invention;
Figure 12 is a schematic structural diagram of Embodiment 3 of the decoding apparatus of the present invention;
Figure 13 is a schematic structural diagram of Embodiment 4 of the encoding apparatus of the present invention;
Figure 14 is a schematic diagram of a filtering structure using the Haar-wavelet-based wavelet transform;
Figure 15 is a schematic diagram of the time-frequency partition obtained with the Haar-wavelet-based wavelet transform;
Figure 16 is a schematic structural diagram of Embodiment 4 of the decoding apparatus of the present invention;
Figure 17 is a schematic structural diagram of Embodiment 5 of the encoding apparatus of the present invention;
Figure 18 is a schematic structural diagram of Embodiment 5 of the decoding apparatus of the present invention;
Figure 19 is a schematic structural diagram of Embodiment 6 of the encoding apparatus of the present invention;
Figure 20 is a schematic structural diagram of Embodiment 6 of the decoding apparatus of the present invention;
Figure 21 is a schematic structural diagram of Embodiment 7 of the encoding apparatus of the present invention;
Figure 22 is a schematic structural diagram of Embodiment 7 of the decoding apparatus of the present invention;
Figure 23 is a schematic structural diagram of Embodiment 8 of the encoding apparatus of the present invention;
Figure 24 is a schematic structural diagram of Embodiment 9 of the encoding apparatus of the present invention;
Figure 25 is a schematic structural diagram of Embodiment 10 of the encoding apparatus of the present invention;
Figure 26 is a schematic structural diagram of Embodiment 11 of the encoding apparatus of the present invention;
Figure 27 is a schematic structural diagram of Embodiment 12 of the encoding apparatus of the present invention;
Figure 28 is a schematic structural diagram of Embodiment 13 of the encoding apparatus of the present invention;
Figure 29 is a schematic structural diagram of Embodiment 14 of the encoding apparatus of the present invention;
Figure 30 is a schematic structural diagram of Embodiment 8 of the decoding apparatus of the present invention;
Figure 31 is a schematic structural diagram of Embodiment 15 of the encoding apparatus of the present invention;
Figure 32 is a schematic structural diagram of Embodiment 16 of the encoding apparatus of the present invention;
Figure 33 is a schematic structural diagram of Embodiment 17 of the encoding apparatus of the present invention;
Figure 34 is a schematic structural diagram of Embodiment 18 of the encoding apparatus of the present invention;
Figure 35 is a schematic structural diagram of Embodiment 19 of the encoding apparatus of the present invention;
Figure 36 is a schematic structural diagram of Embodiment 9 of the decoding apparatus of the present invention;
Figure 37 is a schematic structural diagram of Embodiment 10 of the decoding apparatus of the present invention;
Figure 38 is a schematic structural diagram of Embodiment 11 of the decoding apparatus of the present invention;
Figure 39 is a schematic structural diagram of Embodiment 12 of the decoding apparatus of the present invention;
Figure 40 is a schematic structural diagram of Embodiment 13 of the decoding apparatus of the present invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS
Figures 1 to 4 are schematic structural diagrams of several prior-art encoders; they have been introduced in the Background section and are not described again here.
It should be noted that, for convenience and clarity, the following specific embodiments of the encoding and decoding apparatus are described in corresponding pairs, but this does not imply that an encoding apparatus and a decoding apparatus must be in one-to-one correspondence.
As shown in Figure 5, the audio encoding apparatus provided by the present invention includes a resampling module 50, a psychoacoustic analysis module 51, a time-frequency mapping module 52, a quantization and entropy coding module 53, a band extension module 54, and a bitstream multiplexing module 55. The resampling module 50 resamples the input audio signal. The psychoacoustic analysis module 51 computes the masking threshold and signal-to-mask ratio of the resampled audio signal and analyzes the signal type. The time-frequency mapping module 52 converts the time-domain audio signal into frequency-domain coefficients. The quantization and entropy coding module 53 quantizes and entropy-codes the frequency-domain coefficients under the control of the SMR output by the psychoacoustic analysis module 51 and outputs the result to the bitstream multiplexing module 55. The band extension module 54 analyzes the input audio signal over the entire frequency band, extracts the spectral envelope of the high-frequency part together with the characteristics relating it to the low-frequency part, and outputs them to the bitstream multiplexing module 55. The bitstream multiplexing module 55 multiplexes the data output by the resampling module 50, the quantization and entropy coding module 53, and the band extension module 54 to form the encoded audio bitstream.
The digital audio signal is resampled in the resampling module 50, which changes its sampling rate, and the resampled signal is fed to the psychoacoustic analysis module 51 and the time-frequency mapping module 52. On one hand, the psychoacoustic analysis module 51 computes the masking threshold and signal-to-mask ratio of the frame of audio signal and passes the SMR as a control signal to the quantization and entropy coding module 53; on the other hand, the time-domain audio signal is converted into frequency-domain coefficients by the time-frequency mapping module 52. These frequency-domain coefficients are quantized and entropy-coded in the quantization and entropy coding module 53 under the control of the SMR output by the psychoacoustic analysis module 51. Meanwhile, the original digital audio signal is analyzed by the band extension module 54 to obtain the spectral envelope and spectral-characteristic parameters of the high-frequency part, which are output to the bitstream multiplexing module 55. The encoded data and control signals are multiplexed in the bitstream multiplexing module 55 to form the enhanced audio bitstream.
The constituent modules of the above audio encoding apparatus are described in detail below.
The resampling module 50 resamples the input audio signal; resampling covers both upsampling and downsampling, and downsampling is taken as the example below. In this embodiment, the resampling module 50 comprises a low-pass filter and a downsampler, where the low-pass filter limits the band of the audio signal and eliminates the aliasing that downsampling could cause. The input audio signal is low-pass filtered and then downsampled. Let the input audio signal be s(n), and let the output after filtering by a low-pass filter with impulse response h(n) be v(n); then

v(n) = Σ_{k=−∞}^{∞} h(k)·s(n−k).

Downsampling v(n) by a factor of M gives the sequence x(m):

x(m) = v(Mm) = Σ_{k=−∞}^{∞} h(k)·s(Mm−k).

In this way, the sampling rate of the resampled audio signal x(m) is M times lower than that of the original input audio signal s(n).
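As an illustration of this two-stage structure, the following sketch band-limits the signal with a windowed-sinc low-pass filter and then keeps every M-th sample. The filter length and window choice are illustrative assumptions, not fixed by the text:

```python
import numpy as np

def resample_down(s, M, num_taps=63):
    """Band-limit s with a windowed-sinc low-pass at pi/M, then keep every
    M-th sample: x(m) = v(M*m) with v(n) = sum_k h(k) * s(n - k)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(n / M) / M * np.hamming(num_taps)  # cutoff at fs / (2M)
    v = np.convolve(s, h, mode="same")             # low-pass filtering
    return v[::M]                                  # M-fold downsampling

fs = 8000
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 440 * t)   # one second of a 440 Hz tone at 8 kHz
x = resample_down(s, M=2)         # effective sampling rate is now 4 kHz
print(len(s), len(x))
```

Because 440 Hz lies well below the new Nyquist frequency of 2 kHz, the tone passes the filter essentially unchanged; only components above fs/(2M) are removed before decimation.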
The psychoacoustic analysis module 51 mainly computes the masking threshold, the signal-to-mask ratio, and the perceptual entropy of the input audio signal, and analyzes the signal type. From the perceptual entropy computed by the psychoacoustic analysis module 51, the number of bits required for transparent coding of the current frame can be determined dynamically, so that the bit allocation between frames can be adjusted. The psychoacoustic analysis module 51 outputs the SMR of each subband to the quantization and entropy coding module 53 to control it.
The time-frequency mapping module 52 transforms the audio signal from a time-domain signal into frequency-domain coefficients. It consists of a filter bank, which may specifically be a discrete Fourier transform (DFT) filter bank, a discrete cosine transform (DCT) filter bank, a modified discrete cosine transform (MDCT) filter bank, a cosine-modulated filter bank, a wavelet-transform filter bank, and so on.
The frequency-domain coefficients obtained by the time-frequency mapping are output to the quantization and entropy coding module 53 for quantization and coding.
The quantization and entropy coding module 53 further comprises a bank of non-linear quantizers and an encoder, where the quantizer may be a scalar quantizer or a vector quantizer. Vector quantizers fall into two broad classes: memoryless vector quantizers and vector quantizers with memory. In a memoryless vector quantizer, each input vector is quantized independently of all previous vectors; a vector quantizer with memory takes previous vectors into account when quantizing a vector, i.e. it exploits the correlation between vectors. The main memoryless vector quantizers include the full-search vector quantizer, the tree-search vector quantizer, the multi-stage vector quantizer, the gain/waveform vector quantizer, and the mean-removed vector quantizer; the main vector quantizers with memory include the predictive vector quantizer and the finite-state vector quantizer.

If a scalar quantizer is used, the bank of non-linear quantizers further comprises M subband quantizers. Each subband quantizer performs quantization mainly by means of a scale factor. Specifically, all frequency-domain coefficients in the M scale-factor bands are non-linearly compressed, and the frequency-domain coefficients of each subband are then quantized with that subband's scale factor, yielding an integer-valued quantized spectrum that is output to the encoder. The first scale factor of each frame is output to the bitstream multiplexing module 55 as the common scale factor; each remaining scale factor is differentially coded against its predecessor and output to the encoder.
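A minimal sketch of such a scale-factor-driven non-linear quantizer follows. The power-law exponent 3/4, the step size 2^(sf/4), and the rounding offset are AAC-style constants assumed here for illustration; the text does not fix them:

```python
import numpy as np

def quantize_band(coeffs, scalefactor):
    """Non-linearly compress the coefficients (power 3/4), then quantize
    with a step size controlled by the band's scale factor."""
    step = 2.0 ** (scalefactor / 4.0)
    return (np.sign(coeffs)
            * np.floor((np.abs(coeffs) / step) ** 0.75 + 0.4054)).astype(int)

def dequantize_band(q, scalefactor):
    """Invert the compression: expand with power 4/3 and rescale."""
    step = 2.0 ** (scalefactor / 4.0)
    return np.sign(q) * (np.abs(q) ** (4.0 / 3.0)) * step

X = np.array([0.0, 10.0, -100.0, 1000.0])   # frequency-domain coefficients
q = quantize_band(X, scalefactor=0)
print(q)
```

Raising the scale factor enlarges the step size and coarsens the quantization of the band, which is exactly the knob the bit allocation strategy below turns.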
The scale factor in the above step is a continually changing value, adjusted according to the bit allocation strategy. The present invention provides a bit allocation strategy that minimizes the global perceptual distortion, as follows.
First, every subband quantizer is initialized, and the quantized values of the spectral coefficients in all subbands are set to zero. At this point the quantization noise of each subband equals that subband's energy, the noise-to-mask ratio NMR of each subband equals its signal-to-mask ratio SMR, the number of bits consumed by quantization is 0, and the number of remaining bits R equals the target bit count.

Next, the subband with the largest NMR is located. If this largest NMR is less than or equal to 1, the scale factors are left unchanged, the allocation result is output, and the bit allocation process ends. Otherwise, the scale factor of the corresponding subband quantizer is decreased by one unit, and the number of additional bits ΔB_i this subband then requires is computed. If the remaining bit count satisfies R ≥ ΔB_i, the scale-factor modification is confirmed, ΔB_i is subtracted from the remaining bit count R, the NMR of that subband is recomputed, and the search for the subband with the largest NMR continues, repeating the subsequent steps. If R < ΔB_i, the modification is cancelled, the previous scale factor and remaining bit count are kept, the allocation result is output, and the bit allocation process ends.
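The allocation loop just described can be sketched as follows, with two simplifying stand-ins (assumptions of this sketch, not of the text): a fixed bit cost per scale-factor step, and a noise model in which each step halves the band's quantization noise. In the real coder both quantities come from the quantizer and entropy coder:

```python
def allocate_bits(smr, target_bits, bits_per_step=8):
    """Greedy allocation: repeatedly refine the band whose noise-to-mask
    ratio (NMR) is worst, until NMR <= 1 everywhere or bits run out."""
    n = len(smr)
    scalefactor = [0] * n      # all spectral values start quantized to zero,
    nmr = list(smr)            # so each band's NMR initially equals its SMR
    remaining = target_bits
    while True:
        worst = max(range(n), key=lambda b: nmr[b])
        if nmr[worst] <= 1.0:
            break              # every band's noise is already masked
        extra = bits_per_step  # delta-bits for one scale-factor step
        if remaining < extra:
            break              # not enough bits: keep the previous state
        scalefactor[worst] -= 1
        remaining -= extra
        nmr[worst] /= 2.0      # stand-in: each step halves the noise
    return scalefactor, nmr, remaining

sf, nmr, left = allocate_bits([8.0, 2.0, 0.5], target_bits=100)
print(sf, nmr, left)
```

The loudest-noise-first order is what makes the strategy "globally perceptual": bits always go where the audible distortion is currently greatest.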
If a vector quantizer is used, the frequency-domain coefficients are grouped into vectors and input to the bank of non-linear quantizers. Each vector is spectrally flattened according to a flattening factor, which reduces the dynamic range of the spectrum; the vector quantizer then finds, according to a subjective perceptual distance measure, the codeword in the codebook with the smallest distance to the vector to be quantized, and passes the corresponding codeword index to the encoder. The flattening factor is adjusted according to the bit allocation strategy of the vector quantization, and the vector-quantization bit allocation is in turn controlled by the perceptual importance of the different subbands.
After the above quantization, entropy coding is used to further remove the statistical redundancy of the quantized coefficients and the side information. Entropy coding is a source-coding technique whose basic idea is to assign shorter codewords to symbols that occur with high probability and longer codewords to symbols that occur with low probability, so that the average codeword length is minimal. According to Shannon's noiseless coding theorem, if the symbols of the N transmitted source messages are independent, then with a suitable variable-length code the average codeword length n̄ satisfies

H(x)/log₂(D) ≤ n̄ < H(x)/log₂(D) + 1/N,

where H(x) = −Σᵢ p(xᵢ)·log p(xᵢ) denotes the entropy of the source, x denotes the symbol variable, and D is the size of the code alphabet. Since the entropy H(x) is the lower limit of the average codeword length, the formula above shows that the average codeword length comes very close to its lower bound H(x); this variable-length coding technique is therefore called "entropy coding". The main entropy-coding methods are Huffman coding, arithmetic coding, and run-length coding; any of these methods may be used for the entropy coding in the present invention.
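A small Huffman-coding sketch illustrates the principle: frequent symbols end up with short codewords, and with dyadic probabilities (a deliberately chosen example) the average codeword length meets the entropy bound exactly:

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Return codeword lengths from a Huffman tree over symbol
    probabilities; each merge adds one bit to every merged symbol."""
    heap = [(p, [s]) for s, p in probs.items()]
    lengths = {s: 0 for s in probs}
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, syms1 = heapq.heappop(heap)   # two least-probable subtrees
        p2, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, syms1 + syms2))
    return lengths

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_lengths(probs)
avg = sum(probs[s] * lengths[s] for s in probs)   # average codeword length
H = -sum(p * log2(p) for p in probs.values())     # source entropy
print(lengths, avg, H)
```

For non-dyadic probabilities the average length exceeds the entropy by less than one bit per symbol, as the bound above states.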
The quantized spectrum output by the scalar quantizer and the differentially processed scale factors are entropy-coded in the encoder, yielding codebook numbers, coded scale-factor values, and the losslessly coded quantized spectrum; the codebook numbers are then themselves entropy-coded to obtain coded codebook-number values. The coded scale-factor values, the coded codebook-number values, and the losslessly coded quantized spectrum are output to the bitstream multiplexing module 55.
The codeword indices obtained from the vector quantizer are entropy-coded in the encoder, one-dimensionally or multi-dimensionally, to obtain the coded values of the codeword indices, which are then output to the bitstream multiplexing module 55.
After the original input audio signal enters the band extension module 54, it is analyzed over the entire frequency band; the spectral envelope of the high-frequency part and the characteristics relating it to the low-frequency part are extracted and output to the bitstream multiplexing module 55 as band extension control information.
The basic principle of band extension is as follows: for most audio signals, there is a strong correlation between the characteristics of the high-frequency part and those of the low-frequency part, so the high-frequency part of an audio signal can be reconstructed effectively from its low-frequency part. The high-frequency part of the audio signal therefore need not be transmitted; to ensure that it can be reconstructed correctly, it suffices to transmit only a small amount of band extension control information in the compressed audio bitstream.
The band extension module 54 comprises a parameter extraction module and a spectral-envelope extraction module. The input signal enters the parameter extraction module, which extracts parameters representing the spectral characteristics of the input signal in different time-frequency regions; then, in the spectral-envelope extraction module, the spectral envelope of the high-frequency part of the signal is estimated at a certain time-frequency resolution. To ensure that the time-frequency resolution best matches the characteristics of the current input signal, the time-frequency resolution of the spectral envelope can be chosen freely. The parameters of the input signal's spectral characteristics and the spectral envelope of the high-frequency part are sent, as the output of the band extension, to the bitstream multiplexing module 55 for multiplexing.
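One way to picture the spectral-envelope extraction is as averaged high-band energies over a coarse time-frequency grid. The following sketch is an illustrative stand-in for the module; the frame length, band count, and split frequency are assumptions, not values fixed by the text:

```python
import numpy as np

def highband_envelope(x, fs, split_hz, n_bands=4, frame=256):
    """Estimate the high-band spectral envelope as mean spectral energies
    on a coarse grid: one row per time slot, one column per band."""
    env = []
    for start in range(0, len(x) - frame + 1, frame):
        windowed = x[start:start + frame] * np.hanning(frame)
        spec = np.abs(np.fft.rfft(windowed)) ** 2
        freqs = np.fft.rfftfreq(frame, 1 / fs)
        hi = spec[freqs >= split_hz]           # keep only the high band
        env.append([band.mean() for band in np.array_split(hi, n_bands)])
    return np.array(env)

fs = 8000
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 3000 * t)               # energy sits in the high band
env = highband_envelope(x, fs, split_hz=2000)
print(env.shape)                               # (time slots, bands)
```

The decoder's band extension uses such an envelope to shape a high band regenerated from the transmitted low band.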
The bitstream multiplexing module 55 receives the bitstream output by the quantization and entropy coding module 53, comprising the common scale factor, the coded scale-factor values, the coded codebook-number values, and the losslessly coded quantized spectrum, or alternatively the coded values of the codeword indices, together with the relevant information output by the band extension module 54, and multiplexes them to obtain the compressed audio data stream.
The encoding method based on the above encoder specifically comprises: analyzing the input audio signal over the entire frequency band and extracting its high-frequency spectral envelope and signal spectral-characteristic parameters as the band extension control information; resampling the input audio signal; computing the signal-to-mask ratio of the resampled signal; applying a time-frequency mapping to the resampled signal to obtain the frequency-domain coefficients of the audio signal; quantizing and entropy-coding the frequency-domain coefficients; and multiplexing the band extension control information with the coded audio stream to obtain the compressed audio bitstream.
The resampling process comprises two steps: limiting the band of the audio signal, and downsampling the band-limited audio signal by an integer factor.
There are many methods for the time-frequency transform of the time-domain audio signal, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the modified discrete cosine transform (MDCT), cosine-modulated filter banks, and the wavelet transform. The time-frequency mapping process is illustrated below using the MDCT and cosine-modulated filtering as examples.
When the modified discrete cosine transform (MDCT) is used for the time-frequency transform, the time-domain signals of the M samples of the previous frame and the M samples of the current frame are first taken; a window is applied to these 2M samples of the two frames; and the MDCT is then applied to the windowed signal, yielding M frequency-domain coefficients.
The impulse response of the MDCT analysis filter is

h_k(n) = w(n)·√(2/M)·cos[(π/M)·(n + 1/2 + M/2)·(k + 1/2)], 0 ≤ n ≤ 2M−1,

where w(n) is the window function. The MDCT is then

X(k) = Σ_{n=0}^{2M−1} x(n)·h_k(n), k = 0, 1, …, M−1,

where x(n) is the input time-domain signal of the MDCT and X(k) is the output frequency-domain signal of the MDCT.
To satisfy the conditions for perfect reconstruction of the signal, the window function w(n) of the MDCT must satisfy the following two conditions: w(2M−1−n) = w(n) and w²(n) + w²(n+M) = 1.
In practice, the sine window can be chosen as the window function. Of course, the above restrictions on the window function can also be relaxed by using a biorthogonal transform with specific analysis and synthesis filters.

When cosine-modulated filtering is used for the time-frequency transform, the time-domain signals of the previous frame's samples and the current frame's samples are likewise taken first; a window is applied to the samples of the two frames; and a cosine-modulated transform is then applied to the windowed signal, yielding the frequency-domain coefficients.
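The windowed MDCT analysis, its inverse, and the 50%-overlap-add that achieves perfect reconstruction can be sketched as follows. This is a minimal sketch: the sine window satisfies both window conditions above, and the 2/M synthesis scaling is one common normalization choice assumed here, not one fixed by the text:

```python
import numpy as np

def mdct(frame, M):
    """MDCT of 2M windowed time samples -> M frequency coefficients."""
    n = np.arange(2 * M)
    w = np.sin(np.pi / (2 * M) * (n + 0.5))           # sine window
    k = np.arange(M)[:, None]
    basis = np.cos(np.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
    return basis @ (w * frame)

def imdct(X, M):
    """Inverse MDCT with synthesis windowing; output still has time aliasing
    that the overlap-add of adjacent frames cancels."""
    n = np.arange(2 * M)
    w = np.sin(np.pi / (2 * M) * (n + 0.5))
    k = np.arange(M)[:, None]
    basis = np.cos(np.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
    return (2.0 / M) * w * (basis.T @ X)

M = 64
rng = np.random.default_rng(0)
x = rng.standard_normal(3 * M)
y0 = imdct(mdct(x[0:2 * M], M), M)     # frame 1: samples 0 .. 2M-1
y1 = imdct(mdct(x[M:3 * M], M), M)     # frame 2: samples M .. 3M-1
mid = y0[M:] + y1[:M]                  # overlap-add reconstructs x[M:2M]
print(np.allclose(mid, x[M:2 * M]))
```

The aliasing terms of consecutive frames cancel exactly because the sine window satisfies both w(2M−1−n) = w(n) and w²(n) + w²(n+M) = 1.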
The impulse responses of a conventional cosine-modulated filter bank are

h_k(n) = 2·p_a(n)·cos[(π/M)·(k + 1/2)·(n − (N−1)/2) + (−1)^k·π/4],
f_k(n) = 2·p_s(n)·cos[(π/M)·(k + 1/2)·(n − (N−1)/2) − (−1)^k·π/4],

where 0 ≤ k ≤ M−1, 0 ≤ n ≤ 2KM−1, K is an integer greater than zero, and N = 2KM. Assume that the analysis window (analysis prototype filter) p_a(n) of the M-subband cosine-modulated filter bank has impulse-response length N_a, and that the synthesis window (synthesis prototype filter) p_s(n) has impulse-response length N_s. When the analysis window and the synthesis window are equal, i.e. p_a(n) = p_s(n) and N_a = N_s, the cosine-modulated filter bank represented by the two equations above is an orthogonal filter bank, and the matrices H and F (with [H]_{nk} = h_k(n) and [F]_{nk} = f_k(n)) are orthogonal transform matrices. To obtain a linear-phase filter bank, a symmetric window is further required: p_a(2KM−1−n) = p_a(n). To guarantee perfect reconstruction for orthogonal and biorthogonal systems, the window function must satisfy additional conditions; for details see "Multirate Systems and Filter Banks", P. P. Vaidyanathan, Prentice Hall, Englewood Cliffs, NJ, 1993.
Computing the masking threshold and the signal-to-mask ratio of the resampled signal comprises the following steps.
Step 1: map the signal from the time domain to the frequency domain. The fast Fourier transform with a Hanning window can be used to convert the time-domain data into frequency-domain coefficients X[k], written in terms of amplitude r[k] and phase φ[k] as X[k] = r[k]·e^{jφ[k]}. The energy of each subband is then the sum of the energies of all spectral lines within that subband:

e[b] = Σ_{k=k_l}^{k_h} r²[k],

where k_l and k_h denote the lower and upper boundaries of subband b, respectively.
Step 2: determine the tonal and non-tonal components of the signal. The tonality of the signal is estimated by inter-frame prediction of each spectral line: the Euclidean distance between the predicted and actual values of each spectral line is mapped to an unpredictability measure. Highly predictable spectral components are considered strongly tonal, while poorly predictable spectral components are considered noise-like.
The predicted amplitude r_pred and phase φ_pred are given by

r_pred[k] = r_{t−1}[k] + (r_{t−1}[k] − r_{t−2}[k]),
φ_pred[k] = φ_{t−1}[k] + (φ_{t−1}[k] − φ_{t−2}[k]),

where t denotes the current frame, t−1 the previous frame, and t−2 the frame before that.
那么, 不可预测度 的计算公式为: C —
Figure imgf000012_0001
其中, 欧氏距离 ifet(jq ],jrprerf[i:])采用下式计算: dist(X[k],Xpred[k]) = - Xpred[k]
Then, the formula for calculating the unpredictability is: C —
Figure imgf000012_0001
Among them, the Euclidean distance ifet(jq), jr prerf [i:]) is calculated by the following formula: dist(X[k], X pred [k]) = - X pred [k]
Figure imgf000012_0002
Figure imgf000012_0002
= irik] cos(( ]) - rpred [k] cos [A:]))2 +- (r[k] sin(^[^])― rpred [k] sin( [ ]))22 因此,每个子带的不可预测度 是该子带内所有谱线的能量对其不可预测度的 k-kh = i r ik] cos(( ]) - r pred [k] cos [A:])) 2 +- (r[k] sin(^[^])- r pred [k] sin( [ ])) 22 Therefore, the unpredictability of each sub-band is the kk h of the unpredictability of the energy of all the lines in the sub-band
加权和, 即 c[b]= 2 :]Γ2[:]。 子带能量 eO]和不可预测度 分别与扩展函数进 行卷积运算, 得到子带能量扩展 和予带不可预测度扩展 ^[6] , 掩模 对子带 6 的扩展函数表示为 为了消除扩展函数对能量变换的影响, 需要对子带不可预 测度扩展 6]做归一化处理, 其归一化的结果用 S [b]表示为 [b] = ^¾。 同样, 为 消除扩展函数对子带能量的影响,定义归一化能量扩展? | ]为 [6] = ^, 其中归一 n[b] Weighted sum, ie c[b]= 2 :]Γ 2 [:]. The subband energy eO] and the unpredictability are respectively convoluted with the extension function to obtain the subband energy extension and the preband unpredictability extension^[6], and the extension function of the mask pair subband 6 is expressed as a function to eliminate the extension function. The effect on the energy transformation needs to be normalized to the sub-band unpredictability extension 6], and the normalized result is expressed as [b] = ^3⁄4 by S [b]. Similarly, to eliminate the effect of the extension function on the energy of the subband, define the normalized energy extension? | ] is [6] = ^, where normal n[b]
6— max  6- max
化因子 为: 《[b]= 为该帧信号所分的子带数。 The factor is: "[b]= is the number of subbands divided by the frame signal.
;=1 根据归一化不可预测度扩展 jb] , 可计算子带的音调性 : t[b] = -0.299 - 0.43 loge (cs [b]) ,且 0≤t[6]≤l。 当 时,表示该子带信号为纯音调; 当 6]=0时, 表示该子带信号为白噪声。 ;=1 According to the normalized unpredictability extension jb], the pitch of the subband can be calculated: t[b] = -0.299 - 0.43 log e (c s [b]) , and 0 ≤ t[6] ≤ l . At that time, it indicates that the sub-band signal is a pure tone; when 6] = 0, it indicates that the sub-band signal is white noise.
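The per-line unpredictability and the tonality mapping above reduce to a short routine. A minimal sketch in pure Python, with the two previous frames passed in explicitly and the clipping following the stated bounds 0 ≤ t[b] ≤ 1:

```python
import math

def unpredictability(r, phi, r1, phi1, r2, phi2):
    """Per-line unpredictability c[k] from the current frame (r, phi)
    and the magnitudes/phases of the two previous frames."""
    c = []
    for k in range(len(r)):
        rp = r1[k] + (r1[k] - r2[k])        # predicted magnitude
        pp = phi1[k] + (phi1[k] - phi2[k])  # predicted phase
        dist = math.hypot(r[k] * math.cos(phi[k]) - rp * math.cos(pp),
                          r[k] * math.sin(phi[k]) - rp * math.sin(pp))
        denom = r[k] + abs(rp)
        c.append(dist / denom if denom > 0 else 1.0)
    return c

def band_unpredictability(c, r, lo, hi):
    # c[b] = energy-weighted sum of c[k] over the band's lines
    return sum(c[k] * r[k] ** 2 for k in range(lo, hi + 1))

def tonality(cs_norm):
    # t[b] = -0.299 - 0.43 * ln(c_s_norm[b]), clipped to [0, 1]
    t = -0.299 - 0.43 * math.log(cs_norm)
    return min(1.0, max(0.0, t))
```

A line whose magnitude and phase continue their linear trend is perfectly predicted (c[k] = 0, strongly tonal); a large normalized unpredictability maps to tonality 0 (noise-like).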
Step 3: compute the signal-to-noise ratio (SNR) required for each subband. The noise-masking-tone (NMT) value of all subbands is set to 6 dB and the tone-masking-noise (TMN) value to 18 dB; for the noise to remain imperceptible, the SNR required for each subband is SNR[b] = 18·t[b] + 6·(1 - t[b]).
Step 4: compute the masking threshold of each subband and the perceptual entropy of the signal, and perform signal-type analysis. From the normalized signal energy of each subband obtained in the preceding steps and the required SNR, the noise energy threshold of each subband is computed as n[b] = ē_s[b]·10^{-SNR[b]/10}.
To avoid pre-echo artifacts, the noise energy threshold n[b] of the current frame is compared with the noise energy threshold n_prev[b] of the previous frame, giving the masking threshold of the signal as n[b] = min(n[b], 2·n_prev[b]). This ensures that the masking threshold is not biased by a high-energy attack near the end of the analysis window.
Further, taking the static masking threshold qsthr[b] into account, the final masking threshold of the signal is chosen as the larger of the static masking threshold and the threshold computed above, i.e. n[b] = max(n[b], qsthr[b]). The perceptual entropy is then calculated as pe = -Σ_b cbwidth_b · log_10( n[b] / (e[b] + 1) ), where cbwidth_b denotes the number of spectral lines contained in each subband.
Whether the perceptual entropy of a frame exceeds a specified threshold PE_SWITCH is then checked: if it does, the frame signal is classified as the fast-varying type; otherwise it is the slowly-varying type.
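Steps 3 and 4 combine into one short per-frame routine. The sketch below assumes the per-subband inputs (normalized spread energies, tonalities, the previous frame's thresholds, and static thresholds) have already been computed; the `PE_SWITCH` value is purely illustrative, since the text does not give one:

```python
import math

def masking_thresholds(e_norm, t, n_prev, qsthr):
    """Per-subband masking thresholds following steps 3-4:
    required SNR, noise threshold, pre-echo control, static floor."""
    n = []
    for b in range(len(e_norm)):
        snr = 18.0 * t[b] + 6.0 * (1.0 - t[b])  # required SNR in dB
        nb = e_norm[b] * 10.0 ** (-snr / 10.0)  # noise energy threshold
        nb = min(nb, 2.0 * n_prev[b])           # pre-echo control
        nb = max(nb, qsthr[b])                  # static masking threshold floor
        n.append(nb)
    return n

def perceptual_entropy(e, n, cbwidth):
    # pe = -sum_b cbwidth_b * log10(n[b] / (e[b] + 1))
    return -sum(w * math.log10(nb / (eb + 1.0))
                for w, nb, eb in zip(cbwidth, n, e))

PE_SWITCH = 1000.0  # illustrative value only; the patent does not specify it

def frame_type(pe):
    return "fast" if pe > PE_SWITCH else "slow"
```

A purely tonal subband (t = 1) demands the full 18 dB of SNR, so its threshold sits 18 dB below the subband energy.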
Step 5: compute the signal-to-mask ratio (SMR) of each subband signal. The SMR of each subband is SMR[b] = e[b] / n[b].

Once the SMR of each subband signal has been obtained, the frequency-domain coefficients are quantized and entropy-coded according to the SMR; the quantization may be scalar quantization or vector quantization.
Scalar quantization comprises the following steps: nonlinearly compressing the frequency-domain coefficients in all scale-factor bands; quantizing the frequency-domain coefficients of each subband with that subband's scale factor, yielding a quantized spectrum represented by integers; selecting the first scale factor of each frame signal as the common scale factor; and coding each remaining scale factor differentially against its predecessor.
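The excerpt does not give the compression exponent or the step-size rule, so the sketch below borrows the 3/4-power companding and 2^(sf/4) step size common in transform coders; those numbers are assumptions, while the structure (nonlinear compression, per-band scale factor, integer output, differential scale factors) follows the text:

```python
def scalar_quantize(coeffs, scalefactor):
    """Nonlinear compression then uniform quantization to integers.

    The 3/4-power law, the 2^(sf/4) step and the rounding offset are
    assumed conventions, not taken from the patent text.
    """
    step = 2.0 ** (scalefactor / 4.0)
    q = []
    for x in coeffs:
        sign = -1 if x < 0 else 1
        q.append(sign * int(abs(x) ** 0.75 / step + 0.4054))  # round with offset
    return q

def diff_code_scalefactors(sfs):
    # First scale factor of the frame is the common scale factor;
    # the rest are coded as differences from their predecessor.
    common = sfs[0]
    diffs = [sfs[i] - sfs[i - 1] for i in range(1, len(sfs))]
    return common, diffs
```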
Vector quantization comprises the following steps: arranging the frequency-domain coefficients into a number of multidimensional vectors; spectrally flattening each vector according to a flattening factor; and searching the codebook, under a subjective perceptual distance-measure criterion, for the codeword with the smallest distance to the vector to be quantized, obtaining its codeword index.
The entropy-coding step comprises: entropy-coding the quantized spectrum and the differentially processed scale factors to obtain the codebook numbers, the encoded scale-factor values and the losslessly coded quantized spectrum; and entropy-coding the codebook numbers to obtain the encoded codebook-number values.

Alternatively: performing one-dimensional or multidimensional entropy coding on the codeword indices to obtain the encoded values of the codeword indices.

The entropy coding above may use any existing method such as Huffman coding, arithmetic coding or run-length coding.

After quantization and entropy coding, the encoded audio stream is obtained; this stream is multiplexed together with the common scale factor and the band-extension control signal to yield the compressed audio bitstream.
Figure 6 is a block diagram of the audio decoding device of the present invention. The audio decoding device comprises a bitstream demultiplexing module 601, an entropy decoding module 602, an inverse quantizer bank 603, a frequency-time mapping module 604 and a band extension module 605. After the compressed audio bitstream is demultiplexed by the bitstream demultiplexing module 601, the corresponding data signals and control signals are obtained and output to the entropy decoding module 602 and the band extension module 605. The data signals and control signals are decoded in the entropy decoding module 602, recovering the quantized values of the spectrum. These quantized values are reconstructed in the inverse quantizer bank 603 to obtain the inverse-quantized spectrum, which is output to the frequency-time mapping module 604; frequency-time mapping yields the time-domain audio signal, and the band extension module 605 then reconstructs the high-frequency signal portion, giving a wideband time-domain audio signal.
The bitstream demultiplexing module 601 decomposes the compressed audio bitstream into the corresponding data signals and control signals, providing the other modules with the decoding information they require. After demultiplexing of the compressed audio data stream, the signals output to the entropy decoding module 602 comprise the common scale factor, the encoded scale-factor values, the encoded codebook numbers and the losslessly coded quantized spectrum, or alternatively the encoded values of the codeword indices; the band-extension control information is output to the band extension module 605.
If a scalar quantizer is used in the quantization and entropy coding module 53 of the encoding device, then in the decoding device the entropy decoding module 602 receives the common scale factor, the encoded scale-factor values, the encoded codebook numbers and the losslessly coded quantized spectrum output by the bitstream demultiplexing module 601; it performs codebook-number decoding, spectral-coefficient decoding and scale-factor decoding, reconstructs the quantized spectrum, and outputs the integer representation of the scale factors and the quantized values of the spectrum to the inverse quantizer bank 603. The decoding method used by the entropy decoding module 602 corresponds to the entropy-coding method of the encoding device, e.g. Huffman decoding, arithmetic decoding or run-length decoding.
After receiving the quantized values of the spectrum and the integer representation of the scale factors, the inverse quantizer bank 603 inverse-quantizes the quantized values into an unscaled reconstructed spectrum (the inverse-quantized spectrum) and outputs it to the frequency-time mapping module 604. The inverse quantizer bank 603 may be a bank of uniform quantizers, or a bank of non-uniform quantizers realized through a companding function. If the quantizer bank of the encoding device uses scalar quantizers, the inverse quantizer bank 603 of the decoding device likewise uses scalar inverse quantizers. In a scalar inverse quantizer, the quantized values of the spectrum are first nonlinearly expanded, and then each scale factor is used to obtain all the spectral coefficients (the inverse-quantized spectrum) of the corresponding scale-factor band.
If a vector quantizer is used in the quantization and entropy coding module 53, then in the decoding device the entropy decoding module 602 receives the encoded values of the codeword indices output by the bitstream demultiplexing module 601 and decodes them with the entropy-decoding method corresponding to the entropy-coding method used at encoding, obtaining the corresponding codeword indices.

The codeword indices are output to the inverse quantizer bank 603, which looks them up in the codebook to obtain the quantized values (the inverse-quantized spectrum) and outputs them to the frequency-time mapping module 604. In this case the inverse quantizer bank 603 uses inverse vector quantizers.
The inverse-quantized spectrum is mapped by the frequency-time mapping module 604 into the low-band time-domain audio signal. The frequency-time mapping module 604 may be an inverse discrete cosine transform (IDCT) filter bank, an inverse discrete Fourier transform (IDFT) filter bank, an inverse modified discrete cosine transform (IMDCT) filter bank, an inverse wavelet transform filter bank, a cosine-modulated filter bank, or the like.
The band extension module 605 receives the band-extension information output by the bitstream demultiplexing module 601 and the low-band time-domain audio signal output by the frequency-time mapping module 604, reconstructs the high-frequency signal portion by spectral shifting and high-frequency adjustment, and outputs the wideband audio signal.
The decoding method based on the above decoder comprises: demultiplexing the compressed audio bitstream to obtain data information and control information; entropy-decoding this information to obtain the quantized values of the spectrum; inverse-quantizing the quantized values of the spectrum to obtain the inverse-quantized spectrum; performing frequency-time mapping on the inverse-quantized spectrum to obtain the low-band time-domain audio signal; and reconstructing the high-frequency portion of the time-domain audio signal according to the band-extension control information to obtain a wideband audio signal.
If the demultiplexed information comprises the encoded codebook numbers, the common scale factor, the encoded scale-factor values and the losslessly coded quantized spectrum, the spectral coefficients were quantized in the encoding device by scalar quantization, and the entropy-decoding step comprises: decoding the encoded codebook numbers to obtain the codebook numbers of all scale-factor bands; decoding the quantized coefficients of all scale-factor bands using the codebooks indicated by those numbers; and decoding the scale factors of all scale-factor bands, thereby reconstructing the quantized spectrum. The entropy-decoding method used in this process corresponds to the entropy-coding method of the encoding method, e.g. run-length decoding, Huffman decoding or arithmetic decoding.

The entropy-decoding process is illustrated below using run-length decoding of the codebook numbers, Huffman decoding of the quantized coefficients and Huffman decoding of the scale factors as an example.
First, the codebook numbers of all scale-factor bands are obtained by run-length decoding. Each decoded codebook number is an integer in a certain interval; assume this interval is [0, 11]. Only codebook numbers within the valid range, i.e. between 0 and 11, correspond to spectral-coefficient Huffman codebooks. For an all-zero subband, a particular codebook number may be assigned, typically the number 0.

Once the codebook number of each scale-factor band has been decoded, the quantized coefficients of all scale-factor bands are decoded using the spectral-coefficient Huffman codebook corresponding to that codebook number. If the codebook number of a scale-factor band is within the valid range (in this embodiment, between 1 and 11), it corresponds to a spectral-coefficient codebook; that codebook is used to decode from the quantized spectrum the codeword indices of the band's quantized coefficients, and the quantized coefficients are then unpacked from the codeword indices. If the codebook number of a scale-factor band is not between 1 and 11, it corresponds to no spectral-coefficient codebook; the quantized coefficients of that band are not decoded but are all set directly to zero.
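The codebook-number stage above can be sketched as follows. The (value, run-length) pair representation is an assumed serialization, since the text only states that run-length decoding yields one integer in [0, 11] per scale-factor band:

```python
def decode_codebook_numbers(pairs, num_bands, valid_range=(0, 11)):
    """Expand assumed (codebook_number, run_length) pairs into one
    codebook number per scale-factor band."""
    numbers = []
    lo, hi = valid_range
    for value, run in pairs:
        if not lo <= value <= hi:
            raise ValueError("codebook number outside valid range")
        numbers.extend([value] * run)
    if len(numbers) != num_bands:
        raise ValueError("run lengths do not cover all scale-factor bands")
    return numbers

def bands_to_skip(numbers):
    # Codebook number 0 marks an all-zero band: its coefficients are not
    # decoded and are set directly to zero.
    return [b for b, v in enumerate(numbers) if v == 0]
```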
The scale factors are used to reconstruct the spectral values from the inverse-quantized spectral coefficients; if the codebook number of a scale-factor band lies within the valid range, each codebook number corresponds to a scale factor. When decoding the scale factors, the bits occupied by the first scale factor are read first; the remaining scale factors are then Huffman-decoded, yielding in turn the difference between each scale factor and its predecessor, and each difference is added to the preceding scale-factor value to obtain that scale factor. If the quantized coefficients of the current subband are all zero, the scale factor of that subband need not be decoded.
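The differential reconstruction just described reduces to a running sum. A minimal sketch (all-zero bands, which carry no scale factor, are omitted here for brevity):

```python
def decode_scalefactors(first_sf, diffs):
    """Rebuild the per-band scale factors from the explicitly coded first
    scale factor and the Huffman-decoded differences: each decoded
    difference is added to the previous scale-factor value."""
    sfs = [first_sf]
    for d in diffs:
        sfs.append(sfs[-1] + d)
    return sfs
```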
After the above entropy-decoding process, the quantized values of the spectrum and the integer representation of the scale factors are obtained; the quantized values of the spectrum are then inverse-quantized to obtain the inverse-quantized spectrum. The inverse quantization comprises: nonlinearly expanding the quantized values of the spectrum; and obtaining from each scale factor all the spectral coefficients (the inverse-quantized spectrum) of the corresponding scale-factor band.
If the demultiplexed information comprises the encoded values of codeword indices, the spectral coefficients were quantized in the encoding device by vector quantization, and the entropy-decoding step comprises: decoding the encoded values of the codeword indices with the entropy-decoding method corresponding to the entropy-coding method of the encoding device, obtaining the codeword indices. The codeword indices are then inverse-quantized to obtain the inverse-quantized spectrum.
The frequency-time mapping applied to the inverse-quantized spectrum corresponds to the time-frequency mapping of the encoding method, and may be performed by an inverse discrete cosine transform (IDCT), an inverse discrete Fourier transform (IDFT), an inverse modified discrete cosine transform (IMDCT), an inverse wavelet transform, or the like.
The frequency-time mapping process is illustrated below with the inverse modified discrete cosine transform (IMDCT) as an example. The process comprises three steps: the IMDCT transform, time-domain windowing, and time-domain overlap-add.
First, the IMDCT is applied to the pre-prediction spectrum or the inverse-quantized spectrum, yielding the transformed time-domain signal x_{i,n}. The IMDCT is expressed as

x_{i,n} = (2/N) · Σ_{k=0}^{N/2-1} spec[i][k] · cos( (2π/N)·(n + n₀)·(k + 1/2) ),  0 ≤ n < N

where n denotes the sample index; N denotes the number of time-domain samples, with N = 2048 and n₀ = (N/2 + 1)/2; i denotes the frame index; and k denotes the spectral index.
Next, the time-domain signal obtained from the IMDCT is windowed in the time domain. To satisfy the perfect-reconstruction condition, the window function w must satisfy the two conditions w(2M-1-n) = w(n) and w²(n) + w²(n+M) = 1.
Typical window functions include the sine window, the Kaiser-Bessel window, and so on. The present invention uses a fixed window function, namely w(k) = cos( (π/2)·( (k+0.5)/N - 0.94·sin(2π·(k+0.5)/N)/(2π) ) ), where k = 0, …, N-1, w(k) denotes the k-th coefficient of the window function, and w(k) = w(2N-1-k); N denotes the number of samples of a coded frame, here N = 1024. Alternatively, a biorthogonal transform may be used, with particular analysis and synthesis filters, to relax the above constraints on the window function.
Finally, the windowed time-domain signals are overlap-added to obtain the time-domain audio signal. Specifically, the first N/2 samples of the signal obtained after the windowing operation are overlapped and added to the last N/2 samples of the previous frame's signal, yielding N/2 output time-domain audio samples:

timeSam_{i,n} = preSam_{i,n} + preSam_{i-1, n+N/2}

where i denotes the frame index, n denotes the sample index, 0 ≤ n < N/2, and N = 2048.

After the time-domain audio signal has been obtained, the high-frequency portion of the audio signal is reconstructed from the band-extension control information and the time-domain audio signal, yielding a wideband audio signal.
Figure 7 is a schematic diagram of the first embodiment of the encoding device of the present invention. This embodiment adds, on the basis of Figure 5, a sum/difference (M/S) stereo encoding module 56, located between the output of the time-frequency mapping module 52 and the input of the quantization and entropy coding module 53; the psychoacoustic analysis module 51 outputs the signal-type analysis result to it. For multichannel signals, the psychoacoustic analysis module 51 computes not only the mono masking thresholds of the audio signal but also the masking thresholds of the sum and difference channels, which it outputs to the quantization and entropy coding module 53. The M/S stereo encoding module 56 may also be located between the quantizer bank and the coder within the quantization and entropy coding module 53.
The M/S stereo encoding module 56 exploits the correlation between the two channels of a channel pair, converting the frequency-domain coefficient/residual sequences of the left and right channels into those of the sum and difference channels so as to improve coding efficiency and the stereo image. It is therefore applicable only to channel-pair signals whose signal types agree. For a mono signal, or a channel pair whose signal types differ, no M/S stereo encoding is performed.
The encoding method based on the device of Figure 7 is essentially the same as that based on the device of Figure 5, with the following additional steps: before the frequency-domain coefficients are quantized and entropy-coded, it is determined whether the audio signal is a multichannel signal; if so, whether the signal types of the left and right channel signals agree; and if they agree, whether the corresponding scale-factor bands of the two channels satisfy the M/S stereo coding condition. If the condition is satisfied, M/S stereo encoding is applied to obtain the frequency-domain coefficients of the sum and difference channels; if not, no M/S stereo encoding is performed. For a mono signal, or a multichannel signal whose signal types differ, the frequency-domain coefficients are left unprocessed.

Besides being applied before quantization, M/S stereo encoding may also be applied after quantization and before entropy coding. That is, after the frequency-domain coefficients have been quantized, it is determined whether the audio signal is a multichannel signal; if so, whether the signal types of the left and right channel signals agree; and if they agree, whether the corresponding scale-factor bands of the two channels satisfy the M/S stereo coding condition. If satisfied, M/S stereo encoding is applied; if not, it is omitted. For a mono signal, or a multichannel signal whose signal types differ, no M/S stereo encoding is applied to the frequency-domain coefficients.
There are many ways to judge whether a scale-factor band is suitable for M/S stereo coding; the judgment method adopted by the present invention uses the K-L transform. The specific judgment process is as follows:
Assume the spectral coefficients of a scale-factor band of the left channel are l(k), and those of the corresponding scale-factor band of the right channel are r(k). Their correlation matrix is

C = [ C_ll  C_lr ; C_rl  C_rr ]

with C_ll = (1/N)·Σ_{k=0}^{N-1} l(k)·l(k), C_lr = C_rl = (1/N)·Σ_{k=0}^{N-1} l(k)·r(k) and C_rr = (1/N)·Σ_{k=0}^{N-1} r(k)·r(k), where N is the number of spectral lines in the scale-factor band.

Applying the K-L transform to the correlation matrix C gives

R C Rᵀ = Λ = [ λ_l  0 ; 0  λ_r ],  with R = [ cos α  sin α ; -sin α  cos α ]

where the rotation angle α satisfies tan(2α) = 2·C_lr / (C_ll - C_rr). When α = ±π/4, the transform is exactly the sum/difference stereo coding mode. Therefore, when the absolute value of the rotation angle α deviates only slightly from π/4 (e.g. 3π/16 < |α| < 5π/16), M/S stereo coding can be applied between the corresponding scale-factor bands of the two channels.
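The K-L-based decision for one scale-factor band can be sketched directly from the formulas above; `atan2` is used so the case C_ll = C_rr is handled, and the acceptance interval 3π/16 < |α| < 5π/16 is the example range just given:

```python
import math

def ms_decision(l, r):
    """Return the K-L rotation angle of the inter-channel correlation
    matrix and whether M/S coding applies for this scale-factor band."""
    N = len(l)
    c_ll = sum(a * a for a in l) / N
    c_rr = sum(a * a for a in r) / N
    c_lr = sum(a * b for a, b in zip(l, r)) / N
    # tan(2*alpha) = 2*C_lr / (C_ll - C_rr)
    alpha = 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)
    use_ms = 3 * math.pi / 16 < abs(alpha) < 5 * math.pi / 16
    return alpha, use_ms
```

Identical left and right channels give α = π/4 exactly (M/S applies); uncorrelated channels of equal energy give α = 0 (it does not).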
If M/S stereo encoding is applied before quantization, the frequency-domain coefficients within the corresponding scale-factor bands of the left and right channels are replaced, through a linear transform, by the frequency-domain coefficients of the sum and difference channels:

[ M ; S ] = (1/2) · [ 1  1 ; 1  -1 ] · [ L ; R ]

where M denotes the sum-channel frequency-domain coefficients; S the difference-channel frequency-domain coefficients; L the left-channel frequency-domain coefficients; and R the right-channel frequency-domain coefficients.
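The linear transform above and its inverse (L = M + S, R = M - S, which the decoder applies) in sketch form, per scale-factor band:

```python
def ms_encode(L, R):
    # [M; S] = 1/2 * [[1, 1], [1, -1]] * [L; R]
    Ms = [(a + b) / 2.0 for a, b in zip(L, R)]
    Ss = [(a - b) / 2.0 for a, b in zip(L, R)]
    return Ms, Ss

def ms_decode(Ms, Ss):
    # Inverse transform: L = M + S, R = M - S
    L = [m + s for m, s in zip(Ms, Ss)]
    R = [m - s for m, s in zip(Ms, Ss)]
    return L, R
```

The round trip is exact, which is what makes the transform usable on both sides of the quantizer.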
If M/S stereo encoding is applied after quantization, the quantized frequency-domain coefficients of the left and right channels in the scale-factor band are replaced, through a linear transform, by the frequency-domain coefficients of the sum and difference channels:

[ M_q ; S_q ] = (1/2) · [ 1  1 ; 1  -1 ] · [ L_q ; R_q ]

where M_q denotes the quantized sum-channel frequency-domain coefficients; S_q the quantized difference-channel frequency-domain coefficients; L_q the quantized left-channel frequency-domain coefficients; and R_q the quantized right-channel frequency-domain coefficients.

Placing M/S stereo encoding after quantization not only removes the correlation between the left and right channels effectively but, because it operates on already-quantized values, also makes lossless coding achievable.
Figure 8 is a schematic diagram of the first embodiment of the decoding device. On the basis of the decoding device shown in Figure 6, this device adds an M/S stereo decoding module 606, located between the output of the inverse quantizer bank 603 and the input of the frequency-time mapping module 604. It receives the signal-type analysis result and the M/S stereo control signal output by the bitstream demultiplexing module 601, and uses this control information to convert the inverse-quantized spectrum of the sum and difference channels into the inverse-quantized spectra of the left and right channels.
Within the M/S stereo control signal, one flag bit indicates whether the current channel pair requires M/S stereo decoding; if it does, each scale-factor band also carries a flag bit indicating whether the corresponding scale-factor band requires M/S stereo decoding. Based on these per-band flag bits, the M/S stereo decoding module 606 determines whether the inverse-quantized spectrum (or the quantized spectral values) of particular scale-factor bands must be M/S-decoded. If M/S stereo encoding was performed in the encoding device, the inverse-quantized spectrum must be M/S-decoded in the decoding device.
The M/S stereo decoding module 606 may also be located between the output of the entropy decoding module 602 and the input of the inverse quantizer bank 603, receiving the M/S stereo control signal and the signal-type analysis result output by the bitstream demultiplexing module 601.
The decoding method based on the decoding apparatus of Fig. 8 is essentially the same as that based on the decoding apparatus of Fig. 6, except that the following steps are added: after the inverse-quantized spectrum is obtained, if the signal type analysis result indicates that the signal types are consistent, the sum-difference stereo control signal is used to decide whether the inverse-quantized spectrum requires sum-difference stereo decoding; if it does, the flag on each scale factor band is used to decide whether that band requires sum-difference stereo decoding, and if so, the inverse-quantized spectra of the sum and difference channels in that band are converted into the inverse-quantized spectra of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum-difference stereo decoding is required, the inverse-quantized spectrum is passed on to subsequent processing unchanged.
Sum-difference stereo decoding may also be performed after entropy decoding and before inverse quantization: once the quantized spectral values are obtained, if the signal type analysis result indicates that the signal types are consistent, the sum-difference stereo control signal is used to decide whether the quantized spectral values require sum-difference stereo decoding; if so, the flag on each scale factor band is used to decide whether that band requires it, and if so, the quantized spectral values of the sum and difference channels in that band are converted into the quantized spectral values of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum-difference stereo decoding is required, the quantized spectral values are passed on unchanged.
If sum-difference stereo decoding is performed after entropy decoding and before inverse quantization, the left and right channel frequency-domain coefficients in a scale factor band are obtained from those of the sum and difference channels by the operation

    | l |   | 1   1 | | m |
    | r | = | 1  -1 | | s |

where m denotes a quantized sum-channel frequency-domain coefficient, s a quantized difference-channel frequency-domain coefficient, l a quantized left-channel frequency-domain coefficient, and r a quantized right-channel frequency-domain coefficient.
If sum-difference stereo decoding is performed after inverse quantization, the inverse-quantized frequency-domain coefficients of the left and right channels in a subband are obtained from those of the sum and difference channels by the matrix operation

    | l |   | 1   1 | | m |
    | r | = | 1  -1 | | s |

where m denotes a sum-channel frequency-domain coefficient, s a difference-channel frequency-domain coefficient, l a left-channel frequency-domain coefficient, and r a right-channel frequency-domain coefficient.
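As a concrete illustration, the per-coefficient operation in both reconstruction formulas above is simply l = m + s, r = m - s. A minimal Python sketch (the function name and list-based band handling are illustrative assumptions, not taken from the patent):

```python
def ms_to_lr(m_coeffs, s_coeffs):
    """Convert sum (m) and difference (s) channel frequency-domain
    coefficients of one scale factor band to left/right channels via
    the matrix [[1, 1], [1, -1]]: l = m + s, r = m - s."""
    l = [m + s for m, s in zip(m_coeffs, s_coeffs)]
    r = [m - s for m, s in zip(m_coeffs, s_coeffs)]
    return l, r

# One scale factor band of coefficients:
l, r = ms_to_lr([2.0, 0.5, -1.0], [1.0, -0.5, 0.0])
print(l)  # [3.0, 0.0, -1.0]
print(r)  # [1.0, 1.0, -1.0]
```

The same routine applies whether it is run on quantized spectral values (before inverse quantization) or on inverse-quantized coefficients.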
Fig. 9 shows the structure of the second embodiment of the encoding apparatus of the present invention. On the basis of Fig. 5, it adds a frequency-domain linear prediction and vector quantization module 57, located between the output of the time-frequency mapping module 52 and the input of the quantization and entropy coding module 53. It outputs the residual sequence to the quantization and entropy coding module 53, and outputs the codeword indices obtained by quantization as side information to the bitstream multiplexing module 55.
The frequency-domain coefficients output by the time-frequency mapping module 52 are fed to the frequency-domain linear prediction and vector quantization module 57. If the prediction gain of the frequency-domain coefficients satisfies a given condition, linear prediction filtering is applied to them; the resulting prediction coefficients are converted into line spectral frequency (LSF) coefficients, the codeword index of each codebook stage is found by a search under an optimal distortion measure, and the codeword indices are transmitted as side information to the bitstream multiplexing module 55, while the residual sequence obtained from the prediction analysis is output to the quantization and entropy coding module 53.
The frequency-domain linear prediction and vector quantization module 57 consists of a linear prediction analyzer, a linear prediction filter, a converter, and a vector quantizer. The frequency-domain coefficients are input to the linear prediction analyzer, which produces the prediction gain and the prediction coefficients. If the prediction gain satisfies a given condition, the frequency-domain coefficients are passed to the linear prediction filter for linear prediction error filtering, yielding the prediction residual sequence of the frequency-domain coefficients. The residual sequence is output directly to the quantization and entropy coding module 53, while the prediction coefficients are converted by the converter into LSF coefficients; the LSF parameters are then fed to the vector quantizer for multi-stage vector quantization, and the resulting codeword indices are transmitted to the bitstream multiplexing module 55.
Applying frequency-domain linear prediction to an audio signal effectively suppresses pre-echo and yields a larger coding gain. For a real signal x(t), its squared Hilbert envelope e(t) can be expressed as

    e(t) = F^{-1} { ∫ X(ξ) X*(ξ - f) dξ }

where X(f) is the one-sided spectrum corresponding to the positive-frequency components of the signal; that is, the Hilbert envelope of the signal is determined by the autocorrelation function of its spectrum. Conversely, the power spectral density of the signal is related to the autocorrelation function of its time-domain waveform by

    PSD(f) = F { ∫ x(τ) x*(τ - t) dτ }

so the squared Hilbert envelope of a signal in the time domain and its power spectral density in the frequency domain are duals of each other. It follows that for a band-pass signal within a given frequency range, if its Hilbert envelope is constant, then the autocorrelation of adjacent spectral values is also constant. This means that the sequence of spectral coefficients is stationary with respect to frequency, so predictive coding techniques can be applied to the spectral values and the signal can be represented efficiently by a common set of prediction coefficients.
The encoding method based on the encoding apparatus of Fig. 9 is essentially the same as that based on the encoding apparatus of Fig. 5, except that the following steps are added: standard linear prediction analysis is performed on the frequency-domain coefficients to obtain the prediction gain and prediction coefficients, and it is determined whether the prediction gain exceeds a set threshold. If it does, frequency-domain linear prediction error filtering is applied to the frequency-domain coefficients using the prediction coefficients, yielding the prediction residual sequence; the prediction coefficients are converted into LSF coefficients, which undergo multi-stage vector quantization to produce side information; and the residual sequence is quantized and entropy coded. If the prediction gain does not exceed the set threshold, the frequency-domain coefficients themselves are quantized and entropy coded.
Once the frequency-domain coefficients are obtained, standard linear prediction analysis is first performed on them, which includes computing the autocorrelation matrix and running the Levinson-Durbin recursion to obtain the prediction gain and prediction coefficients. It is then checked whether the computed prediction gain exceeds a preset threshold; if so, linear prediction error filtering is applied to the frequency-domain coefficients using the prediction coefficients; otherwise the coefficients are left unprocessed and the next step, quantization and entropy coding of the frequency-domain coefficients, is executed.
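The Levinson-Durbin step above can be sketched as a generic order-recursive solver of the Toeplitz normal equations; the function name, the 0-based coefficient indexing, and the use of r[0]/err as a prediction-gain estimate are illustrative assumptions, not details taken from the patent:

```python
def levinson_durbin(r, p):
    """Solve the order-p linear prediction normal equations from an
    autocorrelation sequence r[0..p]. Returns (a, err), where the
    predictor is x[k] ~ sum_i a[i] * x[k-1-i] and err is the final
    prediction error energy (so r[0] / err estimates the gain)."""
    a = [0.0] * p
    err = r[0]
    for i in range(p):
        # Reflection coefficient for order i+1.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= (1.0 - k * k)
    return a, err

# AR(1) autocorrelation with coefficient 0.5:
a, err = levinson_durbin([1.0, 0.5, 0.25], 2)
print(a, err)  # [0.5, 0.0] 0.75
```

Here `r` would be the autocorrelation sequence of the frequency-domain coefficients, and the prediction gain compared against the threshold can be estimated as `r[0] / err`.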
Linear prediction comes in two forms, forward and backward: forward prediction estimates the current value from the values before a given instant, while backward prediction estimates it from the values after that instant. Taking forward prediction as an example, the linear prediction error filter has the transfer function

    A(z) = 1 - Σ_{i=1}^{p} a_i z^{-i}

where a_i denotes a prediction coefficient and p the prediction order. Filtering the frequency-domain coefficients X(k) produced by the time-frequency transform yields the prediction error E(k), also called the residual sequence; the two satisfy

    E(k) = X(k) · A(z) = X(k) - Σ_{i=1}^{p} a_i X(k - i).
Thus, after linear prediction error filtering, the frequency-domain coefficients X(k) output by the time-frequency transform module can be represented by the residual sequence E(k) and a set of prediction coefficients a_i. This set of prediction coefficients a_i is then converted into LSF coefficients, which undergo multi-stage vector quantization: using the best distortion measure (for example, a nearest-neighbor criterion), each codebook stage is searched in turn for the codeword that best matches the LSF parameter vector (residual vector) to be quantized, and the corresponding codeword indices are output as side information. At the same time, the residual sequence is quantized and entropy coded. From the principles of linear predictive coding, the dynamic range of the residual sequence is smaller than that of the original spectral coefficients, so fewer bits can be allocated during quantization or, for the same number of bits, an improved coding gain is obtained.
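The prediction-error filtering relation E(k) = X(k) - Σ a_i X(k - i) can be sketched directly; coefficients before the start of the block are taken as zero, an initialization the patent does not specify:

```python
def fdlp_residual(X, a):
    """Frequency-domain linear prediction error filtering:
    E(k) = X(k) - sum_{i=1..p} a[i-1] * X(k-i),
    with X(k) assumed zero for k < 0."""
    p = len(a)
    E = []
    for k in range(len(X)):
        pred = sum(a[i] * X[k - 1 - i] for i in range(p) if k - 1 - i >= 0)
        E.append(X[k] - pred)
    return E

E = fdlp_residual([1.0, 2.0, 3.0], [0.5])
print(E)  # [1.0, 1.5, 2.0]
```

Note how the residual values stay smaller than the input coefficients when the predictor fits, which is exactly the reduced dynamic range the text exploits at quantization time.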
Fig. 10 is a schematic diagram of the second embodiment of the decoding apparatus. On the basis of the decoding apparatus shown in Fig. 6, it adds an inverse frequency-domain linear prediction and vector quantization module 607, located between the output of the inverse quantizer bank 603 and the input of the frequency-time mapping module 604; the bitstream demultiplexing module 601 outputs inverse frequency-domain linear prediction and vector quantization control information to it. The module performs inverse quantization and inverse linear prediction filtering on the inverse-quantized (residual) spectrum, obtains the spectrum before prediction, and outputs it to the frequency-time mapping module 604.
In the encoder, frequency-domain linear prediction and vector quantization are used to suppress pre-echo and obtain a larger coding gain. Accordingly, in the decoder, the inverse-quantized spectrum and the inverse frequency-domain linear prediction and vector quantization control information output by the bitstream demultiplexing module 601 are input to the inverse frequency-domain linear prediction and vector quantization module 607, which recovers the spectrum before linear prediction. Module 607 comprises an inverse vector quantizer, an inverse converter, and an inverse linear prediction filter: the inverse vector quantizer inverse-quantizes the codeword indices to obtain the LSF coefficients; the inverse converter converts the LSF coefficients back into prediction coefficients; and the inverse linear prediction filter inverse-filters the inverse-quantized spectrum according to the prediction coefficients to obtain the spectrum before prediction, which is output to the frequency-time mapping module 604.
The decoding method based on the decoding apparatus of Fig. 10 is essentially the same as that based on the decoding apparatus of Fig. 6, except that the following steps are added: after the inverse-quantized spectrum is obtained, it is checked whether the control information indicates that the spectrum must undergo inverse frequency-domain linear prediction and vector quantization. If so, inverse vector quantization is performed to obtain the prediction coefficients, and linear prediction synthesis is applied to the inverse-quantized spectrum according to these coefficients to obtain the spectrum before prediction; the spectrum before prediction is then frequency-time mapped.
After the inverse-quantized spectrum is obtained, the control information is used to determine whether the frame underwent frequency-domain linear prediction and vector quantization. If it did, the quantized codeword indices are extracted from the control information; the quantized LSF coefficients are recovered from the codeword indices and used to compute the prediction coefficients; linear prediction synthesis is then applied to the inverse-quantized spectrum to obtain the spectrum before prediction.
The transfer function used in the linear prediction error filtering is A(z) = 1 - Σ_{i=1}^{p} a_i z^{-i}, where a_i denotes a prediction coefficient and p the prediction order. The residual sequence E(k) and the spectrum before prediction X(k) therefore satisfy

    X(k) = E(k) · 1/A(z) = E(k) + Σ_{i=1}^{p} a_i X(k - i).

In this way, frequency-domain linear prediction synthesis of the residual sequence with the computed prediction coefficients yields the spectrum before prediction X(k), which is then subjected to frequency-time mapping.
If the control information indicates that the frame did not undergo frequency-domain linear prediction and vector quantization, no inverse frequency-domain linear prediction and vector quantization processing is performed, and the inverse-quantized spectrum is frequency-time mapped directly.
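For frames that did undergo the processing, the synthesis recursion X(k) = E(k) + Σ a_i X(k - i) mirrors the encoder's error filter. A minimal sketch (with the same assumption as on the analysis side that coefficients before the start of the block are zero):

```python
def fdlp_synthesis(E, a):
    """Inverse linear prediction filtering, 1/A(z):
    X(k) = E(k) + sum_{i=1..p} a[i-1] * X(k-i),
    with X(k) assumed zero for k < 0."""
    p = len(a)
    X = []
    for k in range(len(E)):
        pred = sum(a[i] * X[k - 1 - i] for i in range(p) if k - 1 - i >= 0)
        X.append(E[k] + pred)
    return X

# Undoes an encoder-side residual [1.0, 1.5, 2.0] produced with a = [0.5]:
print(fdlp_synthesis([1.0, 1.5, 2.0], [0.5]))  # [1.0, 2.0, 3.0]
```

Feeding the decoder's recovered residual and prediction coefficients through this recursion reproduces the spectrum before prediction.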
Fig. 11 shows the structure of the third embodiment of the encoding apparatus of the present invention. On the basis of Fig. 9, it adds a sum-difference stereo encoding module 56, located between the output of the frequency-domain linear prediction and vector quantization module 57 and the input of the quantization and entropy coding module 53; the psychoacoustic analysis module 51 outputs the signal type analysis result to it, and outputs the masking thresholds of the sum and difference channels to the quantization and entropy coding module 53.
The sum-difference stereo encoding module 56 may also be located between the quantizer bank and the encoder within the quantization and entropy coding module 53, receiving the signal type analysis result output by the psychoacoustic analysis module 51.
In this embodiment, the function and operating principle of the sum-difference stereo encoding module 56 are the same as in Fig. 7 and are not repeated here.
The encoding method based on the encoding apparatus of Fig. 11 is essentially the same as that based on the encoding apparatus of Fig. 9, except that the following steps are added: before the frequency-domain coefficients are quantized and entropy coded, it is determined whether the audio signal is a multi-channel signal. If it is, it is determined whether the signal types of the left and right channels are consistent; if they are, it is determined whether a scale factor band satisfies the coding condition, and if it does, sum-difference stereo encoding is applied to that band; if the condition is not satisfied, no sum-difference stereo encoding is performed. For a mono signal, or a multi-channel signal whose channel signal types are inconsistent, no sum-difference stereo encoding is performed.
Sum-difference stereo encoding can be applied not only before quantization but also after quantization and before entropy coding: after the frequency-domain coefficients are quantized, it is determined whether the audio signal is a multi-channel signal; if it is, whether the signal types of the left and right channels are consistent; and if they are, whether a scale factor band satisfies the coding condition, in which case sum-difference stereo encoding is applied to that band. If the condition is not satisfied, no sum-difference stereo encoding is performed; likewise, for a mono signal or a multi-channel signal whose signal types are inconsistent, no sum-difference stereo encoding is performed.
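The encoder-side conversion is the counterpart of the decoder's sum-difference reconstruction. Assuming the common normalization m = (l + r)/2, s = (l - r)/2 (so that the decoder recovers l = m + s, r = m - s) — a normalization this passage of the patent does not spell out — a per-band sketch:

```python
def lr_to_ms(l_coeffs, r_coeffs):
    """Convert left/right frequency-domain coefficients of one scale
    factor band into sum/difference channels:
    m = (l + r) / 2, s = (l - r) / 2."""
    m = [(l + r) / 2.0 for l, r in zip(l_coeffs, r_coeffs)]
    s = [(l - r) / 2.0 for l, r in zip(l_coeffs, r_coeffs)]
    return m, s

m, s = lr_to_ms([3.0, 0.0], [1.0, 1.0])
print(m, s)  # [2.0, 0.5] [1.0, -0.5]
```

Bands whose left/right coefficients are highly correlated yield a near-zero difference channel s, which is what makes the per-band coding condition worthwhile.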
Fig. 12 is a structural diagram of the third embodiment of the decoding apparatus. On the basis of the decoding apparatus shown in Fig. 10, it adds a sum-difference stereo decoding module 606, located between the output of the inverse quantizer bank 603 and the input of the inverse frequency-domain linear prediction and vector quantization module 607; the bitstream demultiplexing module 601 outputs the sum-difference stereo control signal to it.
The sum-difference stereo decoding module 606 may also be located between the output of the entropy decoding module 602 and the input of the inverse quantizer bank 603, receiving the sum-difference stereo control signal output by the bitstream demultiplexing module 601.
In this embodiment, the function and operating principle of the sum-difference stereo decoding module 606 are the same as in Fig. 8 and are not repeated here.
The decoding method based on the decoding apparatus of Fig. 12 is essentially the same as that based on the decoding apparatus of Fig. 10, except that the following steps are added: after the inverse-quantized spectrum is obtained, if the signal type analysis result indicates that the signal types are consistent, the sum-difference stereo control signal is used to decide whether the inverse-quantized spectrum requires sum-difference stereo decoding; if it does, the flag on each scale factor band is used to decide whether that band requires sum-difference stereo decoding, and if so, the inverse-quantized spectra of the sum and difference channels in that band are converted into the inverse-quantized spectra of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum-difference stereo decoding is required, the inverse-quantized spectrum is passed on to subsequent processing unchanged.
Sum-difference stereo decoding may also be performed before inverse quantization: once the quantized spectral values are obtained, if the signal type analysis result indicates that the signal types are consistent, the sum-difference stereo control signal is used to decide whether the quantized spectral values require sum-difference stereo decoding; if so, the flag on each scale factor band is used to decide whether that band requires it, and if so, the quantized spectral values of the sum and difference channels in that band are converted into the quantized spectral values of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum-difference stereo decoding is required, the quantized spectral values are passed on unchanged.
Fig. 13 shows the structure of the fourth embodiment of the encoding apparatus of the present invention. On the basis of Fig. 5, it adds a multi-resolution analysis module 59, located between the output of the time-frequency mapping module 52 and the input of the quantization and entropy coding module 53; the psychoacoustic analysis module 51 outputs the signal type analysis result to it.
For fast-varying signals, in order to effectively overcome the pre-echo produced during encoding and improve coding quality, the encoding apparatus of the present invention raises the time resolution of the frequency-domain coefficients of fast-varying signals through the multi-resolution analysis module 59. The frequency-domain coefficients output by the time-frequency mapping module 52 are input to the multi-resolution analysis module 59; for a fast-varying signal, a frequency-domain wavelet transform or a frequency-domain modified discrete cosine transform (MDCT) is applied, and the resulting multi-resolution representation of the frequency-domain coefficients is output to the quantization and entropy coding module 53. For a slowly varying signal, the frequency-domain coefficients are not processed and are output directly to the quantization and entropy coding module 53.
The multi-resolution analysis module 59 reorganizes the input frequency-domain data in the time-frequency domain, raising its time resolution at the cost of lowered frequency resolution, and thereby automatically adapts to the time-frequency characteristics of fast-varying signals, achieving the pre-echo suppression effect. The form of the filter bank in the time-frequency mapping module 52 then does not need to be switched adaptively. The multi-resolution analysis module 59 comprises a frequency-domain coefficient transform module, which transforms the frequency-domain coefficients into time-frequency plane coefficients, and a regrouping module, which regroups the time-frequency plane coefficients according to certain rules. The frequency-domain coefficient transform module may employ a frequency-domain wavelet transform filter bank, a frequency-domain MDCT filter bank, or the like.
The operation of the multi-resolution analysis module 59 is described below using the frequency-domain wavelet transform and the frequency-domain MDCT as examples.
1) Frequency-domain wavelet transform
Let the time sequence be x(i), i = 0, 1, ..., 2M - 1, and let the frequency-domain coefficients obtained after time-frequency mapping be X(k), k = 0, 1, ..., M - 1. The wavelet basis of the frequency-domain wavelet or wavelet-packet transform may be fixed or adaptive.
The process of multi-resolution analysis of the frequency-domain coefficients is illustrated below using the simplest case, the Haar wavelet basis.
The scaling and wavelet filters of the Haar basis are h0 = [1/√2, 1/√2] and h1 = [1/√2, -1/√2]. Fig. 14 shows the filtering structure of the wavelet transform with the Haar basis, where h0 denotes low-pass filtering (filter coefficients [1/√2, 1/√2]), h1 denotes high-pass filtering (filter coefficients [1/√2, -1/√2]), and "↓2" denotes downsampling by a factor of 2. The low- and mid-frequency part of the frequency-domain coefficients, X(k), k = 0, ..., is left untransformed, while the Haar wavelet transform is applied to the high-frequency part, yielding the coefficients X2(k), X3(k), ..., X7(k) over different time-frequency intervals; the corresponding partition of the time-frequency plane is shown in Fig. 15. By choosing a different wavelet basis and a different wavelet transform structure, other, similar time-frequency plane partitions can be obtained. The time-frequency partition used in signal analysis can therefore be adjusted freely as required, satisfying analysis requirements for different time and frequency resolutions.
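One stage of the Haar analysis structure of Fig. 14 (low-pass h0 and high-pass h1, each followed by downsampling by 2) can be sketched as follows; applying it only to the high-frequency coefficients, and possibly recursively, yields the partitions described above:

```python
import math

def haar_step(X):
    """One Haar analysis stage over a block of spectral coefficients:
    low-pass h0 = [1/sqrt(2), 1/sqrt(2)] and high-pass
    h1 = [1/sqrt(2), -1/sqrt(2)], each followed by downsampling by 2."""
    c = 1.0 / math.sqrt(2.0)
    low = [c * (X[2 * i] + X[2 * i + 1]) for i in range(len(X) // 2)]
    high = [c * (X[2 * i] - X[2 * i + 1]) for i in range(len(X) // 2)]
    return low, high

# Each output branch has half the frequency resolution of the input,
# in exchange for the doubled time localization of the split:
low, high = haar_step([1.0, 1.0, 2.0, 0.0])
```

The input block size and the decision to recurse on one branch only are illustrative; the patent leaves those choices to the chosen transform structure.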
The time-frequency plane coefficients above are regrouped in the regrouping module according to certain rules. For example, the time-frequency plane coefficients may first be organized along the frequency direction, with the coefficients within each frequency band organized along the time direction; the organized coefficients are then arranged in the order of subwindows and scale factor bands.
2) Frequency-domain MDCT
Let the frequency-domain data input to the frequency-domain MDCT filter bank be X(k), k = 0, 1, ..., N - 1. M-point MDCTs are applied successively to these N frequency-domain values, so that the frequency resolution of the resulting time-frequency data decreases while its time resolution rises correspondingly. By using frequency-domain MDCTs of different lengths over different frequency ranges, different time-frequency plane partitions, i.e. different time and frequency resolutions, can be obtained. The regrouping module regroups the time-frequency data output by the frequency-domain MDCT filter bank; one regrouping method is to organize the time-frequency plane coefficients along the frequency direction, with the coefficients in each frequency band organized along the time direction, and then arrange the organized coefficients in the order of subwindows and scale factor bands.
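The MDCT applied to stretches of frequency-domain data can be sketched with the direct O(N²) formula below; the transform sizes and any windowing the patent actually uses are not specified in this passage, so this illustrates only the transform itself:

```python
import math

def mdct(x):
    """Direct MDCT: 2N inputs -> N outputs,
    X[k] = sum_n x[n] * cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))."""
    N = len(x) // 2
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

# Feeding in one MDCT basis function concentrates the output in a
# single coefficient, since the basis rows are orthogonal with norm N:
X = mdct([math.cos(math.pi / 4 * (n + 0.5 + 2) * 0.5) for n in range(8)])
```

In the module described above, this transform would be run over successive groups of frequency-domain coefficients, with shorter transforms where finer time resolution is wanted.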
The encoding method based on the encoding apparatus shown in Figure 13 has the same basic flow as the encoding method based on the encoding apparatus shown in Figure 5, except that the following steps are added: if the signal is of the fast-varying type, multi-resolution analysis is performed on the frequency-domain coefficients, and the multi-resolution representation of the frequency-domain coefficients is then quantized and entropy coded; if the signal is not of the fast-varying type, the frequency-domain coefficients are quantized and entropy coded directly.
The multi-resolution analysis may use a frequency-domain wavelet transform method or a frequency-domain MDCT method. The frequency-domain wavelet method comprises: performing a wavelet transform on the frequency-domain coefficients to obtain time-frequency plane coefficients, and recombining these time-frequency plane coefficients according to certain rules. The frequency-domain MDCT method comprises: performing n MDCT transforms on the frequency-domain coefficients to obtain time-frequency plane coefficients, and recombining these coefficients according to certain rules. The recombination method may comprise: first organizing the time-frequency plane coefficients in the frequency direction, with the coefficients in each frequency band organized in the time direction, and then arranging the organized coefficients in the order of sub-windows and scale-factor bands.

Figure 16 is a schematic structural diagram of the fourth embodiment of the decoding apparatus. This decoding apparatus adds a multi-resolution synthesis module 609 to the decoding apparatus shown in Figure 6. The multi-resolution synthesis module 609 is located between the output of the inverse quantizer bank 603 and the input of the frequency-time mapping module 604, and is used to perform multi-resolution synthesis on the inverse-quantized spectrum.
In the encoder, a multi-resolution filtering technique is applied to fast-varying signals to improve the time resolution with which they are encoded. Correspondingly, in the decoder, the multi-resolution synthesis module 609 is needed to recover, for fast-varying signals, the frequency-domain coefficients as they were before multi-resolution analysis. The multi-resolution synthesis module 609 comprises a coefficient recombination module and a coefficient transform module, where the coefficient transform module may use a frequency-domain inverse wavelet transform filter bank or a frequency-domain IMDCT filter bank.
The decoding method based on the decoding apparatus shown in Figure 16 has the same basic flow as the decoding method based on the decoding apparatus shown in Figure 6, except that the following steps are added: after the inverse-quantized spectrum is obtained, multi-resolution synthesis is performed on it, and the resulting coefficients are then subjected to frequency-time mapping.
The multi-resolution synthesis method is illustrated below using the frequency-domain IMDCT as an example. It specifically comprises: recombining the inverse-quantized spectral coefficients, then performing multiple IMDCT transforms on the coefficients to obtain the inverse-quantized spectrum as it was before multi-resolution analysis. The process is described in detail below for 128 IMDCT transforms (8 inputs, 16 outputs each). First, the inverse-quantized spectral coefficients are arranged in the order of sub-windows and scale-factor bands, then recombined in frequency order, so that the 128 coefficients of each sub-window are organized together by frequency. Then the coefficients arranged by sub-window are grouped in the frequency direction, 8 per group, with the 8 coefficients of each group arranged in time order, giving 128 groups of coefficients in the frequency direction. A 16-point IMDCT is applied to each group, and the 16 output coefficients of each IMDCT are overlap-added to obtain 8 frequency-domain data points. Performing 128 such operations in turn from low to high frequency yields 1024 frequency-domain coefficients.
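The IMDCT-plus-overlap-add step above can be sketched as follows. This is a minimal Python sketch assuming unwindowed, 50%-overlapped blocks with the conventional MDCT/IMDCT normalization (forward unscaled, inverse scaled by 1/M), under which time-domain aliasing cancels between adjacent blocks; the demo is scaled down from 128 groups to a handful to keep it small.

```python
import numpy as np

def mdct(block):
    """Forward MDCT: 2M samples in, M coefficients out."""
    m = len(block) // 2
    n = np.arange(2 * m)
    k = np.arange(m)[:, None]
    return np.cos(np.pi / m * (n + 0.5 + m / 2.0) * (k + 0.5)) @ block

def imdct(coeffs):
    """Inverse MDCT: M coefficients in, 2M (time-aliased) samples out."""
    m = len(coeffs)
    n = np.arange(2 * m)[:, None]
    k = np.arange(m)
    return (np.cos(np.pi / m * (n + 0.5 + m / 2.0) * (k + 0.5)) @ coeffs) / m

def overlap_add(groups, m):
    """IMDCT each group of M coefficients and overlap-add the 2M
    outputs with 50% overlap; adjacent blocks cancel each other's
    time aliasing (TDAC)."""
    out = np.zeros((len(groups) + 1) * m)
    for i, g in enumerate(groups):
        out[i * m:(i + 2) * m] += imdct(np.asarray(g, dtype=float))
    return out

# Round trip with 8-input/16-output IMDCTs, as in the example above.
rng = np.random.default_rng(1)
x = rng.standard_normal(48)
m = 8
groups = [mdct(x[i:i + 2 * m]) for i in range(0, len(x) - 2 * m + 1, m)]
rec = overlap_add(groups, m)
# Interior samples are reconstructed exactly; the first and last M
# samples lack an overlap partner (a codec handles these at frame
# boundaries).
assert np.allclose(rec[m:-m], x[m:-m])
```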
Figure 17 is a schematic structural diagram of the fifth embodiment of the encoding apparatus of the present invention. On the basis of the encoding apparatus shown in Figure 13, this embodiment adds a frequency-domain linear prediction and vector quantization module 57, located between the output of the multi-resolution analysis module 59 and the input of the quantization and entropy coding module 53; the psychoacoustic analysis module 51 outputs the signal type analysis result to it. The frequency-domain linear prediction and vector quantization module 57 is used to perform linear prediction and multi-stage vector quantization on the frequency-domain coefficients that have undergone multi-resolution analysis, outputting the residual sequence to the quantization and entropy coding module 53 while outputting the codeword indices obtained by quantization to the bitstream multiplexing module 55.
Since the frequency-domain coefficients obtained after multi-resolution analysis are time-frequency coefficients with a specific time-frequency plane partition, the frequency-domain linear prediction and vector quantization module 57 must perform linear prediction and multi-stage vector quantization on the frequency-domain coefficients of each time segment.
The frequency-domain linear prediction and vector quantization module 57 may also be located between the output of the time-frequency mapping module 52 and the input of the multi-resolution analysis module 59, performing linear prediction and multi-stage vector quantization on the frequency-domain coefficients, outputting the residual sequence to the multi-resolution analysis module 59 while outputting the codeword indices obtained by quantization to the bitstream multiplexing module 55.
The encoding method based on the encoding apparatus shown in Figure 17 is basically the same as the encoding method based on the encoding apparatus shown in Figure 13, except that the following steps are added: after multi-resolution analysis of the frequency-domain coefficients, standard linear prediction analysis is performed on the frequency-domain coefficients of each time segment; whether the prediction gain exceeds a set threshold is determined; if it does, frequency-domain linear prediction error filtering is applied to the frequency-domain coefficients to obtain the prediction coefficients and the residual sequence of the frequency-domain coefficients; the prediction coefficients are converted into line spectral pair (LSP) frequency coefficients, and multi-stage vector quantization is applied to the LSP frequency coefficients to obtain side information; the residual sequence is quantized and entropy coded; if the prediction gain does not exceed the set threshold, the frequency-domain coefficients are quantized and entropy coded directly. Alternatively, before the multi-resolution analysis, linear prediction and multi-stage vector quantization may first be applied to the frequency-domain coefficients, and the residual sequence then subjected to multi-resolution analysis.
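The prediction-gain decision above can be sketched with a standard Levinson-Durbin recursion. This is an illustrative Python sketch; the prediction order, the 3 dB threshold, and the use of the autocorrelation method are assumptions, not values fixed by the description, and the LSP conversion and multi-stage vector quantization steps are omitted.

```python
import numpy as np

def lpc(x, order):
    """Levinson-Durbin solution of the normal equations: returns the
    prediction coefficients and the final prediction-error energy."""
    x = np.asarray(x, dtype=float)
    r = np.array([x[:len(x) - i] @ x[i:] for i in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[1:i][::-1]
        k = -acc / err                  # reflection coefficient
        a[1:i] = a[1:i] + k * a[1:i][::-1]
        a[i] = k
        err *= (1.0 - k * k)            # remaining error energy
    return a, err

def prediction_gain_db(x, order=8):
    """Ratio of signal energy to prediction-error energy, in dB."""
    a, err = lpc(x, order)
    r0 = float(np.dot(x, x))
    return 10.0 * np.log10(r0 / max(err, 1e-12))

# Only apply frequency-domain prediction filtering when the gain
# exceeds a threshold (the 3 dB value here is an assumption).
spectrum = np.cos(0.3 * np.arange(256))   # strongly predictable sequence
noise = np.random.default_rng(2).standard_normal(256)
use_fdlp = prediction_gain_db(spectrum) > 3.0
assert use_fdlp
assert prediction_gain_db(noise) < prediction_gain_db(spectrum)
```

A tonal spectrum yields a large prediction gain and is filtered; a noise-like one yields little gain and is quantized directly.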
Figure 18 is a schematic structural diagram of the fifth embodiment of the decoding apparatus. On the basis of the decoding apparatus shown in Figure 16, this apparatus adds an inverse frequency-domain linear prediction and vector quantization module 607, located between the output of the inverse quantizer bank 603 and the input of the multi-resolution synthesis module 609; the bitstream demultiplexing module 601 outputs inverse frequency-domain linear prediction vector quantization control information to it. The module is used to perform inverse quantization and linear prediction synthesis on the inverse-quantized spectrum to obtain the spectrum before prediction, which is output to the multi-resolution synthesis module 609.
The inverse frequency-domain linear prediction and vector quantization module 607 may also be located between the output of the multi-resolution synthesis module 609 and the input of the frequency-time mapping module 604, and used to perform linear prediction synthesis on the inverse-quantized spectrum after multi-resolution synthesis.
The decoding method based on the decoding apparatus shown in Figure 18 is basically the same as the decoding method based on the decoding apparatus shown in Figure 16, except that the following steps are added: after the inverse-quantized spectrum is obtained, whether the control information indicates that the inverse-quantized spectrum requires inverse frequency-domain linear prediction vector quantization is determined; if so, inverse vector quantization is performed to obtain the prediction coefficients, and linear prediction synthesis is applied to the inverse-quantized spectrum to obtain the spectrum before prediction; the spectrum before prediction then undergoes multi-resolution synthesis. Alternatively: after the inverse-quantized spectrum is obtained, multi-resolution synthesis is performed on it; whether the control information indicates that the inverse-quantized spectrum requires inverse frequency-domain linear prediction vector quantization is determined; if so, inverse vector quantization is performed to obtain the prediction coefficients, and linear prediction synthesis is applied to the inverse-quantized spectrum to obtain the spectrum before prediction; the spectrum before prediction then undergoes frequency-time mapping.
Figure 19 is a schematic structural diagram of the sixth embodiment of the encoding apparatus of the present invention. On the basis of the encoding apparatus shown in Figure 17, this embodiment adds a sum-difference stereo coding module 56, located between the output of the frequency-domain linear prediction and vector quantization module 57 and the input of the quantization and entropy coding module 53, which receives the signal type analysis result from the psychoacoustic analysis module 51. The sum-difference stereo coding module 56 may also be located between the quantizer bank and the coder within the quantization and entropy coding module 53. In this embodiment, the function and operating principle of the sum-difference stereo coding module 56 are the same as in Figure 11 and are not repeated here.
The encoding method based on the encoding apparatus shown in Figure 19 is basically the same as the encoding method based on the encoding apparatus shown in Figure 17, except that the following steps are added: after the residual sequence is obtained, whether sum-difference stereo coding is applied to it is determined according to whether the audio signal is a multi-channel signal with a consistent signal type that satisfies the coding conditions; subsequent processing is then performed. The specific flow has been described above and is not repeated here.
Figure 20 is a schematic structural diagram of the sixth embodiment of the decoding apparatus. On the basis of the decoding apparatus shown in Figure 18, a sum-difference stereo decoding module 606 is added, located between the output of the inverse quantizer bank 603 and the input of the inverse frequency-domain linear prediction and vector quantization module 607. The sum-difference stereo decoding module 606 may also be located between the output of the entropy decoding module 602 and the input of the inverse quantizer bank 603. In this embodiment, the function and operating principle of the sum-difference stereo decoding module 606 are the same as in Figure 12 and are not repeated here.
The decoding method based on the decoding apparatus shown in Figure 20 is basically the same as the decoding method based on the decoding apparatus shown in Figure 18, except that the following steps are added: after the inverse-quantized spectrum is obtained, whether sum-difference stereo decoding of the inverse-quantized spectrum is required is determined according to the signal type analysis result and the sum-difference stereo control information; subsequent processing is then performed. The specific flow has been described above and is not repeated here.
Figure 21 is a schematic diagram of the seventh embodiment of the encoding apparatus of the present invention. On the basis of Figure 13, this embodiment adds a sum-difference stereo coding module 56, located between the output of the multi-resolution analysis module 59 and the input of the quantization and entropy coding module 53. The sum-difference stereo coding module 56 may also be located between the quantizer bank and the coder within the quantization and entropy coding module 53. The sum-difference stereo coding module 56 has been described in detail above and is not repeated here.
The encoding method based on the encoding apparatus shown in Figure 21 is basically the same as the encoding method based on the encoding apparatus shown in Figure 13, except that the following steps are added: after multi-resolution analysis of the frequency-domain coefficients, whether sum-difference stereo coding is applied is determined according to whether the audio signal is a multi-channel signal with a consistent signal type that satisfies the coding conditions; subsequent processing is then performed. The specific flow has been described above and is not repeated here.

Figure 22 is a schematic diagram of the seventh embodiment of the decoding apparatus. On the basis of the decoding apparatus shown in Figure 16, a sum-difference stereo decoding module 606 is added, located between the output of the inverse quantizer bank 603 and the input of the multi-resolution synthesis module 609. The sum-difference stereo decoding module 606 may also be located between the output of the entropy decoding module 602 and the input of the inverse quantizer bank 603. The sum-difference stereo decoding module 606 has been described in detail above and is not repeated here.
The decoding method based on the decoding apparatus shown in Figure 22 is basically the same as the decoding method based on the decoding apparatus shown in Figure 16, except that the following steps are added: after the inverse-quantized spectrum is obtained, whether sum-difference stereo decoding of the inverse-quantized spectrum is required is determined according to the signal type analysis result and the sum-difference stereo control information; subsequent processing is then performed. The specific flow has been described above and is not repeated here.
Figure 23 is a schematic diagram of the eighth embodiment of the encoding apparatus of the present invention. On the basis of the encoding apparatus shown in Figure 13, this embodiment adds a signal property analysis module 510, used to perform signal type analysis on the signal output by the resampling module 50, to output the resampled signal to the time-frequency mapping module 52 and the psychoacoustic analysis module 51, and to output the signal type analysis result to the bitstream multiplexing module 55.
The signal property analysis module 510 performs forward and backward masking effect analysis based on adaptive thresholds and waveform prediction to determine whether the signal is of the slowly-varying or the fast-varying type. If it is a fast-varying signal, the module further computes parameters of the transient component, such as the position at which the transient occurs and its strength.
The encoding method based on the encoding apparatus shown in Figure 23 is basically the same as the encoding method based on the encoding apparatus shown in Figure 13, except that the following step is added: the type of the resampled signal is analyzed, as part of the signal multiplexing.
The signal type is determined by forward and backward masking effect analysis based on adaptive thresholds and waveform prediction. The specific steps are: decompose the input audio data into frames; decompose each input frame into multiple subframes, and find the local maximum points of the absolute PCM values within each subframe; select the subframe peak from the local maximum points of each subframe; for a given subframe peak, use the several (typically 3) preceding subframe peaks to predict typical sample values for several (typically 4) subframes forward-delayed relative to that subframe; compute the difference and the ratio between the subframe peak and the predicted typical sample value; if both the prediction difference and the ratio exceed the set thresholds, it is determined that a transient exists in the subframe, and it is confirmed that the subframe contains a local maximum peak capable of backward-masking the pre-echo; if, between the end of that subframe and the point 2.5 ms before the masking peak, there exists a subframe with a sufficiently small peak, the frame is judged to be a fast-varying signal; if the prediction difference and ratio do not exceed the set thresholds, the above steps are repeated until the frame is judged to be a fast-varying signal or the last subframe is reached; if the last subframe is reached without the frame being judged fast-varying, the frame is a slowly-varying signal.
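The core of the peak-prediction test above can be sketched as follows. This Python sketch is a simplification: the threshold values, the mean-of-three predictor, and the subframe count are illustrative assumptions, and the 2.5 ms back-masking check is omitted.

```python
import numpy as np

def is_fast_varying(frame, n_sub=16, diff_thresh=0.2, ratio_thresh=2.0):
    """Simplified transient (fast-varying) frame test: compare each
    subframe peak with a prediction from the preceding subframe
    peaks; a peak exceeding both a difference and a ratio threshold
    marks a transient."""
    frame = np.asarray(frame, dtype=float)
    sub = frame.reshape(n_sub, -1)
    peaks = np.abs(sub).max(axis=1)          # one peak per subframe
    for i in range(3, n_sub):
        predicted = peaks[i - 3:i].mean()    # predict from 3 previous peaks
        diff = peaks[i] - predicted
        ratio = peaks[i] / (predicted + 1e-12)
        if diff > diff_thresh and ratio > ratio_thresh:
            return True                      # transient found in subframe i
    return False

quiet = 0.01 * np.sin(np.linspace(0, 20 * np.pi, 1024))
attack = quiet.copy()
attack[700:] += 1.0                          # sudden onset in a late subframe
assert not is_fast_varying(quiet)
assert is_fast_varying(attack)
```

A steady tone never trips both thresholds, while a sudden onset does, which is what routes the frame to the short-window (multi-resolution) coding path.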
Figures 24 to 28 are schematic diagrams of the ninth to thirteenth embodiments of the encoding apparatus of the present invention. These embodiments add, on the basis of the encoding apparatuses shown in Figures 17, 19, 21, 9, and 11 respectively, a signal property analysis module 510 for performing signal type analysis on the signal output by the resampling module 50, outputting the resampled signal to the time-frequency mapping module 52 and the psychoacoustic analysis module 51, and outputting the signal type analysis result to the bitstream multiplexing module 55.
The encoding methods based on the encoding apparatuses shown in Figures 24 to 28 are basically the same as those based on the encoding apparatuses shown in Figures 17, 19, 21, 9, and 11, except that the following step is added: the type of the resampled signal is analyzed, as part of the signal multiplexing.
Figure 29 is a schematic diagram of the fourteenth embodiment of the encoding apparatus of the present invention. On the basis of the encoding apparatus shown in Figure 23, this embodiment adds a gain control module 511, which receives the audio signal output by the signal property analysis module 510, controls the dynamic range of fast-varying signals, and eliminates pre-echo in the audio processing; its output is connected to the time-frequency mapping module 52 and the psychoacoustic analysis module 51.
The gain control module 511 controls only fast-varying signals; slowly-varying signals are output directly without processing. For fast-varying signals, the gain control module 511 adjusts the time-domain energy envelope of the signal, raising the gain of the signal before the transient point so that the time-domain signal amplitudes before and after the transient point become close; the time-domain signal with the adjusted energy envelope is then output to the time-frequency mapping module 52, while the gain adjustment amount is output to the bitstream multiplexing module 55.
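The gain adjustment and its decoder-side inverse can be sketched as a pair. This Python sketch uses the mean absolute value as the envelope measure, which is an assumption; the description only requires that the amplitudes on both sides of the transient become comparable and that the gain amount be sent as side information.

```python
import numpy as np

def gain_control(frame, transient_pos):
    """Encoder side: boost the samples before the transient so both
    sides of the transient have a similar envelope. Returns the
    adjusted frame and the gain to transmit as side information."""
    frame = np.asarray(frame, dtype=float)
    pre = np.mean(np.abs(frame[:transient_pos])) + 1e-12
    post = np.mean(np.abs(frame[transient_pos:])) + 1e-12
    gain = post / pre                    # amount to boost the pre-part
    out = frame.copy()
    out[:transient_pos] *= gain
    return out, gain

def inverse_gain_control(frame, transient_pos, gain):
    """Decoder side: divide the pre-transient part by the same gain,
    shrinking the quantization noise there along with the signal."""
    out = np.asarray(frame, dtype=float).copy()
    out[:transient_pos] /= gain
    return out

x = np.concatenate([0.05 * np.ones(64), 1.0 * np.ones(64)])
adjusted, g = gain_control(x, 64)
restored = inverse_gain_control(adjusted, 64, g)
assert np.allclose(restored, x)          # inverse control undoes the boost
```

Because quantization noise is added to the boosted signal, dividing it back down at the decoder attenuates the noise before the transient, which is exactly the pre-echo control described for module 610 below.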
The encoding method based on the encoding apparatus shown in Figure 29 is basically the same as the encoding method based on the encoding apparatus shown in Figure 23, except that the following step is added: gain control is applied to the signal after signal type analysis.
Figure 30 is a schematic structural diagram of the eighth embodiment of the decoding apparatus. On the basis of the decoding apparatus shown in Figure 16, an inverse gain control module 610 is added, located between the output of the frequency-time mapping module 604 and the input of the band extension module 605. It receives the signal type analysis result and the gain adjustment information output by the bitstream demultiplexing module 601, and is used to adjust the gain of the time-domain signal and control pre-echo. After receiving the reconstructed time-domain signal output by the frequency-time mapping module 604, the inverse gain control module 610 controls fast-varying signals, while slowly-varying signals are output directly to the band extension module 605 without processing. For fast-varying signals, the inverse gain control module 610 adjusts the energy envelope of the reconstructed time-domain signal according to the gain adjustment information, reducing the amplitude of the signal before the transient point and restoring the energy envelope to its original low-before, high-after state; in this way the amplitude of the quantization noise before the transient point is reduced along with the signal amplitude, thereby controlling the pre-echo.
The decoding method based on the decoding apparatus shown in Figure 30 is basically the same as the decoding method based on the decoding apparatus shown in Figure 16, except that the following step is added: inverse gain control is applied to the reconstructed time-domain signal before band extension.
Figures 31 to 35 are schematic diagrams of the fifteenth to nineteenth embodiments of the encoding apparatus of the present invention. These five embodiments add, on the basis of the encoding apparatuses shown in Figures 24 to 28 respectively, a gain control module 511 for controlling the dynamic range of the audio signal after signal type analysis and eliminating pre-echo in the audio processing; its output is connected to the time-frequency mapping module 52 and the psychoacoustic analysis module 51.
The encoding methods based on the above five encoding apparatuses are basically the same as the encoding methods based on the encoding apparatuses shown in Figures 24 to 28, except that the following step is added: gain control is applied to the signal after signal type analysis.
Figures 36 to 40 are schematic structural diagrams of the ninth to thirteenth embodiments of the decoding apparatus. These five decoding apparatuses add, on the basis of the decoding apparatuses shown in Figures 18, 20, 22, 10, and 12 respectively, an inverse gain control module 610, located between the output of the frequency-time mapping module 604 and the input of the band extension module 605, which receives the signal type analysis result output by the bitstream demultiplexing module 601 and is used to adjust the gain of the time-domain signal and control pre-echo.
The decoding methods based on the above five decoding apparatuses are likewise basically the same as the decoding methods based on the decoding apparatuses shown in Figures 18, 20, 22, 10, and 12, except that the following step is added: inverse gain control is applied to the reconstructed time-domain signal before band extension.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such modifications shall fall within the scope of the claims of the present invention.

Claims

1. An enhanced audio encoding apparatus, comprising a psychoacoustic analysis module, a time-frequency mapping module, a quantization and entropy coding module, and a bitstream multiplexing module, characterized in that it further comprises a band extension module and a resampling module; wherein

the band extension module is used to analyze the original input audio signal over the entire frequency band, extract the spectral envelope of the high-frequency portion and the parameters characterizing the correlation between the low-frequency and high-frequency spectra, and output them to the bitstream multiplexing module;

the resampling module is used to resample the input audio signal, change the sampling rate of the audio signal, and output the audio signal with the changed sampling rate to the psychoacoustic analysis module and the time-frequency mapping module;

the psychoacoustic analysis module is used to calculate the masking threshold and the signal-to-mask ratio of the input audio signal, which are output to the quantization and entropy coding module;

the time-frequency mapping module is used to transform the time-domain audio signal into frequency-domain coefficients;

the quantization and entropy coding module is used to quantize and entropy code the frequency-domain coefficients under the control of the signal-to-mask ratio output by the psychoacoustic analysis module, and to output the result to the bitstream multiplexing module;

the bitstream multiplexing module is used to multiplex the received data to form an audio coded bitstream.
2. The enhanced audio encoding apparatus according to claim 1, characterized in that the resampling module comprises a low-pass filter and a downsampler, wherein the low-pass filter is configured to band-limit the audio signal, and the downsampler is configured to downsample the band-limited audio signal, reducing its sampling rate.
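As a rough illustration of claim 2's resampling module (a low-pass filter followed by a downsampler), the sketch below band-limits a signal with a windowed-sinc FIR filter and then decimates it 2:1. The tap count, cutoff, and decimation factor are illustrative assumptions; the claim does not fix them.

```python
import math

def lowpass_fir(num_taps=63, cutoff=0.25):
    # Windowed-sinc low-pass prototype; `cutoff` is normalized to the
    # input sample rate (0.25 = half the input Nyquist), suitable for
    # 2:1 decimation. A Hamming window limits the ripple.
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - mid
        h = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(h * w)
    s = sum(taps)
    return [t / s for t in taps]  # normalize to unity DC gain

def resample(signal, factor=2, taps=None):
    # Band-limit the signal, then keep every `factor`-th sample
    # (the downsampler of claim 2).
    taps = taps or lowpass_fir()
    half = len(taps) // 2
    filtered = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            k = i + half - j
            if 0 <= k < len(signal):
                acc += t * signal[k]
        filtered.append(acc)
    return filtered[::factor]
```

A constant (DC) input away from the edges passes through unchanged, since the filter is normalized to unity DC gain, while the output length halves.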
3. The enhanced audio encoding apparatus according to claim 1, characterized in that the band extension module comprises a parameter extraction module and a spectral envelope extraction module; the parameter extraction module is configured to extract parameters representing the spectral characteristics of the input signal in different time-frequency regions; the spectral envelope extraction module is configured to estimate the spectral envelope of the high-frequency portion of the signal at a certain time-frequency resolution; the spectral characteristic parameters of the input signal and the spectral envelope of the high-frequency portion are then output to the bitstream multiplexing module.
4. The enhanced audio encoding apparatus according to claim 1, characterized in that it further comprises a frequency-domain linear prediction and vector quantization module, located between the output of the time-frequency mapping module and the input of the quantization and entropy coding module, and composed of a linear prediction analyzer, a linear prediction filter, a converter, and a vector quantizer, wherein:

the linear prediction analyzer is configured to perform predictive analysis on the frequency-domain coefficients to obtain a prediction gain and prediction coefficients, and to output frequency-domain coefficients satisfying a given condition to the linear prediction filter; frequency-domain coefficients not satisfying the condition are output directly to the quantization and entropy coding module;

the linear prediction filter is configured to apply linear prediction error filtering to the frequency-domain coefficients to obtain a residual sequence, to output the residual sequence to the quantization and entropy coding module, and to output the prediction coefficients to the converter;

the converter is configured to convert the prediction coefficients into line spectral pair (LSP) frequency coefficients;

the vector quantizer is configured to apply multi-stage vector quantization to the line spectral pair frequency coefficients; the associated side information is transmitted to the bitstream multiplexing module.
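The frequency-domain linear prediction path of claim 4 (predictive analysis, a gain condition, then error filtering) can be sketched as follows. The prediction order and the 3 dB gain threshold are assumptions for illustration, and the LSP conversion and multi-stage vector quantization stages are omitted.

```python
import math

def autocorr(x, maxlag):
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(maxlag + 1)]

def levinson(r, order):
    # Levinson-Durbin recursion: returns prediction coefficients
    # a[1..order] and the final prediction error energy.
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1 - k * k)
    return a[1:], err

def fdlp_analyze(coeffs, order=8, gain_threshold_db=3.0):
    # Prediction gain = input energy / residual energy; filtering is
    # applied only when the gain clears the threshold (claim 4's
    # "given condition" -- the threshold value is an assumption).
    r = autocorr(coeffs, order)
    if r[0] == 0:
        return coeffs, None
    a, err = levinson(r, order)
    gain_db = 10 * math.log10(r[0] / err) if err > 0 else float("inf")
    if gain_db < gain_threshold_db:
        return coeffs, None            # bypass: straight to quantization
    residual = [coeffs[i] - sum(a[j - 1] * coeffs[i - j]
                                for j in range(1, order + 1) if i - j >= 0)
                for i in range(len(coeffs))]
    return residual, a                 # residual + coefficients for the LSP stage
```

On a smooth, highly predictable coefficient sequence the residual carries far less energy than the input, which is exactly what makes the residual cheaper to quantize.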
5. The enhanced audio encoding apparatus according to claim 1, characterized in that it further comprises a multi-resolution analysis module, located between the output of the time-frequency mapping module and the input of the quantization and entropy coding module, which receives the signal-type analysis result output by the psychoacoustic analysis module and performs multi-resolution analysis on fast-varying signals; it specifically comprises a frequency-domain coefficient transform module and a reorganization module, wherein the frequency-domain coefficient transform module is a frequency-domain wavelet transform filter bank or a frequency-domain MDCT filter bank configured to transform the frequency-domain coefficients into time-frequency plane coefficients, and the reorganization module is configured to reorganize the time-frequency plane coefficients according to a given rule.
6. The enhanced audio encoding apparatus according to claim 5, characterized in that it further comprises a frequency-domain linear prediction and vector quantization module, located between the output of the time-frequency mapping module and the input of the multi-resolution analysis module, configured to perform linear prediction and multi-stage vector quantization on the frequency-domain coefficients, to output the residual sequence to the multi-resolution analysis module, and to output the side information to the bitstream multiplexing module.
7. The enhanced audio encoding apparatus according to claim 6, characterized in that it further comprises a sum-difference (M/S) stereo coding module, located either between the output of the multi-resolution analysis module and the input of the quantization and entropy coding module, or between the quantizer bank and the coder within the quantization and entropy coding module, which receives the signal-type analysis result output by the psychoacoustic analysis module.
8. An enhanced audio encoding method, characterized by comprising the following steps:

Step 1: analyzing the input audio signal over the entire frequency band and extracting its high-frequency spectral envelope and signal spectral characteristic parameters; Step 2: resampling the input audio signal;

Step 3: calculating the signal-to-mask ratio of the resampled signal;

Step 4: applying time-frequency mapping to the resampled signal to obtain the frequency-domain coefficients of the audio signal; Step 5: quantizing and entropy-coding the frequency-domain coefficients;

Step 6: multiplexing the high-frequency spectral envelope, the signal spectral characteristic parameters, and the encoded audio signal to obtain a compressed audio bitstream.
9. The enhanced audio encoding method according to claim 8, characterized in that the quantization in Step 5 is scalar quantization, specifically comprising: applying nonlinear compression to the frequency-domain coefficients in all scale factor bands; quantizing the frequency-domain coefficients of each subband using the scale factor of that subband to obtain a quantized spectrum represented by integers; selecting the first scale factor of each frame as the common scale factor; and differentially coding each remaining scale factor with respect to its predecessor;

the entropy coding specifically comprises: entropy-coding the quantized spectrum and the differentially processed scale factors to obtain the codebook indices, the coded scale factor values, and the losslessly coded values of the quantized spectrum; and entropy-coding the codebook indices to obtain the coded codebook index values.
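A minimal sketch of the scalar quantization and scale-factor handling described in claim 9, assuming an AAC-style 3/4-power compression law and a quarter-step exponential step size; the claim itself only says "nonlinear compression", so both formulas are illustrative assumptions:

```python
import math

def quantize_band(coeffs, scalefactor):
    # Nonlinear 3/4-power compression followed by uniform quantization
    # to integers, AAC-style (an assumption, not the patent's formula).
    step = 2 ** (scalefactor / 4.0)
    return [int(math.copysign(round((abs(c) / step) ** 0.75), c)) for c in coeffs]

def code_scalefactors(scalefactors):
    # The first scale factor of the frame is sent as the common scale
    # factor; the rest are coded as differences from their predecessor.
    common = scalefactors[0]
    diffs = [scalefactors[i] - scalefactors[i - 1]
             for i in range(1, len(scalefactors))]
    return common, diffs

def decode_scalefactors(common, diffs):
    # Inverse of the differential coding above.
    out = [common]
    for d in diffs:
        out.append(out[-1] + d)
    return out
```

Differential coding of scale factors pays off because neighboring bands usually have similar scale factors, so the differences cluster near zero and entropy-code compactly.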
10. The enhanced audio encoding method according to claim 8, characterized in that between Step 4 and Step 5 it further comprises: performing standard linear prediction analysis on the frequency-domain coefficients to obtain a prediction gain and prediction coefficients; determining whether the prediction gain exceeds a set threshold; if it does, applying frequency-domain linear prediction error filtering to the frequency-domain coefficients according to the prediction coefficients to obtain a prediction residual sequence, converting the prediction coefficients into line spectral pair frequency coefficients, applying multi-stage vector quantization to the line spectral pair frequency coefficients to obtain the side information, and quantizing and entropy-coding the residual sequence; if the prediction gain does not exceed the set threshold, quantizing and entropy-coding the frequency-domain coefficients.
11. The enhanced audio encoding method according to claim 8, characterized in that between Step 4 and Step 5 it further comprises: if the signal is of the fast-varying type, performing multi-resolution analysis on the frequency-domain coefficients; otherwise, quantizing and entropy-coding the frequency-domain coefficients directly; wherein

the multi-resolution analysis step comprises: applying an MDCT to the frequency-domain coefficients to obtain time-frequency plane coefficients, and reorganizing the time-frequency plane coefficients according to a given rule;

the reorganization comprises: first organizing the time-frequency plane coefficients along the frequency direction, with the coefficients within each frequency band organized along the time direction, and then arranging the organized coefficients in the order of subwindows and scale factor bands.
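One reading of the reorganization rule above (group coefficients by frequency band, walking the subwindows in time order within each band) can be sketched on a toy time-frequency grid; the grid size and band edges are illustrative, not values from the patent:

```python
def reorganize(tf_plane, band_edges):
    # tf_plane[w][k]: coefficient k of subwindow w (time runs over w,
    # frequency over k). Group by frequency band, and within each band
    # walk the subwindows in time order -- one interpretation of
    # claim 11's "frequency direction first, time direction within
    # each band".
    out = []
    for b in range(len(band_edges) - 1):
        lo, hi = band_edges[b], band_edges[b + 1]
        for w in range(len(tf_plane)):   # time direction inside the band
            out.extend(tf_plane[w][lo:hi])
    return out
```

With two subwindows of four coefficients each and band edges [0, 2, 4], band 0's coefficients from both subwindows come out before band 1's.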
12. The enhanced audio encoding method according to claim 11, characterized in that before the multi-resolution analysis step it further comprises: performing standard linear prediction analysis on the frequency-domain coefficients to obtain a prediction gain and prediction coefficients; determining whether the prediction gain exceeds a set threshold; if it does, applying frequency-domain linear prediction error filtering to the frequency-domain coefficients according to the prediction coefficients to obtain a prediction residual sequence, converting the prediction coefficients into line spectral pair frequency coefficients, applying multi-stage vector quantization to the line spectral pair frequency coefficients to obtain the side information, and performing multi-resolution analysis on the residual sequence; if the prediction gain does not exceed the set threshold, performing multi-resolution analysis on the frequency-domain coefficients.
13. The enhanced audio encoding method according to claim 12, characterized in that Step 5 further comprises: quantizing the residual sequence; determining whether the audio signal is a multi-channel signal; if it is, determining whether the signal types of the left and right channels are consistent; if they are, determining whether the corresponding scale factor bands of the two channels satisfy the sum-difference stereo coding condition; if they do, applying sum-difference stereo coding to the spectral coefficients in that scale factor band to obtain the residual sequences of the sum and difference channels; if they do not, leaving the spectral coefficients in that scale factor band without sum-difference stereo coding; if the signal is a mono signal or a multi-channel signal with inconsistent signal types, leaving the residual sequence unprocessed; and entropy-coding the residual sequence; wherein

the method for determining whether a scale factor band satisfies the coding condition is the K-L transform, specifically: calculating the correlation matrix of the spectral coefficients of the left- and right-channel scale factor bands, and applying the K-L transform to the correlation matrix; if the absolute value of the rotation angle α deviates from π/4 by only a small amount (e.g., 3π/16 < |α| < 5π/16), the corresponding scale factor bands of the two channel signals may be sum-difference stereo coded; the sum-difference stereo coding is:

M = (L + R) / 2, S = (L - R) / 2,

where M denotes the quantized sum-channel frequency-domain coefficients, S denotes the quantized difference-channel frequency-domain coefficients, L denotes the quantized left-channel frequency-domain coefficients, and R denotes the quantized right-channel frequency-domain coefficients.
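The band-wise M/S decision and transform of claim 13 can be sketched as below. The rotation angle is taken from the 2x2 correlation matrix of the band's left/right coefficients, and the 3π/16 < |α| < 5π/16 acceptance range follows the partially legible condition in the claim; both the angle formula and the range should be read as a reconstruction, not the patent's exact procedure.

```python
import math

def ms_decision(left, right, lo=3 * math.pi / 16, hi=5 * math.pi / 16):
    # Rotation angle of the 2x2 correlation matrix of the band's L/R
    # coefficients; bands whose |alpha| stays near pi/4 (i.e. the K-L
    # principal axis lies near the M/S axes) get sum-difference coded.
    rll = sum(l * l for l in left)
    rrr = sum(r * r for r in right)
    rlr = sum(l * r for l, r in zip(left, right))
    alpha = 0.5 * math.atan2(2 * rlr, rll - rrr)
    return lo < abs(alpha) < hi

def ms_encode(left, right):
    # Sum-difference (M/S) stereo coding: M = (L+R)/2, S = (L-R)/2.
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    # Inverse transform: L = M+S, R = M-S.
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

For identical (fully correlated, equal-energy) channels the angle is exactly π/4, so the band is M/S coded; the encode/decode pair is an exact round trip in floating point.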
14. An enhanced audio decoding apparatus, comprising a bitstream demultiplexing module, an entropy decoding module, an inverse quantizer bank, and a frequency-time mapping module, characterized in that it further comprises a band extension module, wherein:

the bitstream demultiplexing module is configured to demultiplex the compressed audio data stream and to output the corresponding data signals and control signals to the entropy decoding module and the band extension module;

the entropy decoding module is configured to decode the above signals, recover the quantized values of the spectrum, and output them to the inverse quantizer bank;

the inverse quantizer bank is configured to reconstruct the inverse-quantized spectrum and output it to the frequency-time mapping module; the frequency-time mapping module is configured to apply frequency-time mapping to the spectral coefficients to obtain the low-band time-domain audio signal;

the band extension module is configured to receive the band extension control information output by the bitstream demultiplexing module and the low-band time-domain audio signal output by the frequency-time mapping module, to reconstruct the high-frequency portion of the signal, and to output a wideband audio signal.
15. The enhanced audio decoding apparatus according to claim 14, characterized in that it further comprises a sum-difference stereo decoding module, located either between the output of the inverse quantizer bank and the input of the frequency-time mapping module or between the output of the entropy decoding module and the input of the inverse quantizer bank, which receives the sum-difference stereo control signal output by the bitstream demultiplexing module and is configured to convert, according to the sum-difference stereo control information, the inverse-quantized spectra / quantized spectral values of the sum and difference channels into the inverse-quantized spectra / quantized spectral values of the left and right channels.
16. The enhanced audio decoding apparatus according to claim 14, characterized in that it further comprises an inverse frequency-domain linear prediction and vector quantization module, located between the output of the inverse quantizer bank and the input of the frequency-time mapping module, which receives the inverse frequency-domain linear prediction vector quantization control information output by the bitstream demultiplexing module, applies a linear prediction synthesis process to the inverse-quantized spectrum to obtain the pre-prediction spectrum, and outputs it to the frequency-time mapping module; it specifically comprises an inverse vector quantizer, an inverse converter, and an inverse linear prediction filter, wherein the inverse vector quantizer is configured to inverse-quantize the codeword indices to obtain the line spectral pair frequency coefficients, the inverse converter is configured to convert the line spectral pair frequency coefficients back into prediction coefficients, and the inverse linear prediction filter is configured to apply the linear prediction synthesis process to the inverse-quantized spectrum according to the prediction coefficients to obtain the pre-prediction spectrum.
17. The enhanced audio decoding apparatus according to claim 16, characterized in that it further comprises a sum-difference stereo decoding module, located either between the output of the inverse quantizer bank and the input of the inverse frequency-domain linear prediction and vector quantization module or between the output of the entropy decoding module and the input of the inverse quantizer bank, which receives the sum-difference stereo control signal output by the bitstream demultiplexing module and is configured to convert, according to the sum-difference stereo control information, the inverse-quantized spectra / quantized spectral values of the sum and difference channels into those of the left and right channels.
18. The enhanced audio decoding apparatus according to claim 14, characterized in that it further comprises a multi-resolution synthesis module, located between the output of the inverse quantizer bank and the input of the frequency-time mapping module, which receives the signal-type analysis result output by the bitstream demultiplexing module and performs multi-resolution synthesis on the inverse-quantized spectrum; the multi-resolution synthesis module specifically comprises a coefficient reorganization module and a coefficient transform module, wherein the coefficient transform module is a frequency-domain inverse wavelet transform filter bank or a frequency-domain inverse modified discrete cosine transform filter bank.
19. The enhanced audio decoding apparatus according to claim 18, characterized in that it further comprises an inverse frequency-domain linear prediction and vector quantization module, located either between the output of the inverse quantizer bank and the input of the multi-resolution synthesis module or between the output of the multi-resolution synthesis module and the input of the frequency-time mapping module, which receives the inverse frequency-domain linear prediction vector quantization control information output by the bitstream demultiplexing module and is configured to apply inverse quantization and the linear prediction synthesis process to the inverse-quantized spectrum, or to the inverse-quantized spectrum after multi-resolution synthesis, to obtain the pre-prediction spectrum.
20. The enhanced audio decoding apparatus according to claim 19, characterized in that it further comprises a sum-difference stereo decoding module, located either between the output of the inverse quantizer bank and the input of the inverse frequency-domain linear prediction and vector quantization module or the multi-resolution synthesis module, or between the output of the entropy decoding module and the input of the inverse quantizer bank, which receives the sum-difference stereo control signal output by the bitstream demultiplexing module and is configured to convert, according to the sum-difference stereo control information, the inverse-quantized spectra / quantized spectral values of the sum and difference channels into those of the left and right channels.
21. An enhanced audio decoding method, characterized by comprising the following steps:

Step 1: demultiplexing the compressed audio data stream to obtain data information and control information;

Step 2: entropy-decoding the above information to obtain the quantized values of the spectrum;

Step 3: applying inverse quantization to the quantized values of the spectrum to obtain the inverse-quantized spectrum;

Step 4: applying frequency-time mapping to the inverse-quantized spectrum to obtain the time-domain audio signal;

Step 5: reconstructing the high-frequency portion of the time-domain audio signal according to the band extension control information to obtain a wideband audio signal.
22. The enhanced audio decoding method according to claim 21, characterized in that Step 4 further comprises: applying an inverse modified discrete cosine transform to the inverse-quantized spectrum to obtain the transformed time-domain signal; windowing the transformed signal in the time domain; and overlap-adding the windowed time-domain signals to obtain the time-domain audio signal; wherein the window function used in the windowing is:

w(k) = cos(pi/2 * ((k + 0.5)/N - 0.94 * sin(2*pi*(k + 0.5)/N) / (2*pi))), k = 0 ... N-1,

where pi is the circular constant, w(k) denotes the k-th coefficient of the window function with w(k) = w(2*N-1-k), and N denotes the number of samples of an encoded frame.
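To illustrate the windowing and overlap-add structure of claim 22 (not the patent's own window, whose printed formula is only partially legible), the sketch below uses the standard MDCT sine window, which satisfies the Princen-Bradley condition needed for perfect reconstruction with 50% overlap:

```python
import math

def sine_window(N):
    # Standard MDCT sine window for a 2N-sample frame. Used here only
    # to illustrate the structure; the patent specifies its own window
    # coefficients.
    return [math.sin(math.pi * (k + 0.5) / (2 * N)) for k in range(2 * N)]

def overlap_add(frames, N):
    # Each frame holds 2N windowed time-domain samples from one IMDCT;
    # consecutive frames overlap by N samples and are summed.
    out = [0.0] * (N * (len(frames) + 1))
    for i, frame in enumerate(frames):
        for k, v in enumerate(frame):
            out[i * N + k] += v
    return out
```

For the sine window, w(k)^2 + w(k+N)^2 = 1 for every k in the first half, which is the Princen-Bradley condition guaranteeing that the overlapped halves of adjacent frames reconstruct the signal exactly.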
23. The enhanced audio decoding method according to claim 21, characterized in that between Step 2 and Step 3 it further comprises: if the signal-type analysis result indicates that the signal types are consistent, determining from the sum-difference stereo control signal whether sum-difference stereo decoding of the inverse-quantized spectrum is required; if so, determining from the flag bit of each scale factor band whether that scale factor band requires sum-difference stereo decoding and, if it does, converting the inverse-quantized spectra of the sum and difference channels in that scale factor band into the inverse-quantized spectra of the left and right channels, then proceeding to Step 3; if the signal types are inconsistent or sum-difference stereo decoding is not required, leaving the inverse-quantized spectrum unprocessed and proceeding to Step 3; in the above steps, the sum-difference stereo decoding is:

L = M + S, R = M - S,

where M denotes the inverse-quantized sum-channel frequency-domain coefficients, S denotes the inverse-quantized difference-channel frequency-domain coefficients, L denotes the inverse-quantized left-channel frequency-domain coefficients, and R denotes the inverse-quantized right-channel frequency-domain coefficients.
24. The enhanced audio decoding method according to claim 21, characterized in that between Step 3 and Step 4 it further comprises: determining whether the control information indicates that the inverse-quantized spectrum needs to undergo inverse frequency-domain linear prediction vector quantization; if so, performing inverse vector quantization to obtain the prediction coefficients, applying linear prediction synthesis to the inverse-quantized spectrum to obtain the pre-prediction spectrum, and applying frequency-time mapping to the pre-prediction spectrum; wherein the inverse vector quantization further comprises: obtaining from the control information the codeword indices resulting from the vector quantization of the prediction coefficients, obtaining the quantized line spectral pair frequency coefficients from the codeword indices, and computing the prediction coefficients from the line spectral pair frequency coefficients.
25. The enhanced audio decoding method according to claim 24, characterized in that between Step 2 and Step 3 it further comprises: if the signal-type analysis result indicates that the signal types are consistent, determining from the sum-difference stereo control signal whether sum-difference stereo decoding of the quantized spectral values is required; if so, determining from the flag bit of each scale factor band whether that scale factor band requires sum-difference stereo decoding and, if it does, converting the quantized spectral values of the sum and difference channels in that scale factor band into the quantized spectral values of the left and right channels, then proceeding to Step 3; if the signal types are inconsistent or sum-difference stereo decoding is not required, leaving the quantized spectral values unprocessed and proceeding to Step 3.
26. The enhanced audio decoding method according to claim 21, characterized in that between Step 3 and Step 4 it further comprises a step of multi-resolution synthesis of the inverse-quantized spectrum, specifically: arranging the inverse-quantized spectral coefficients in the order of subwindows and scale factor bands, reorganizing them in frequency order, and then applying multiple inverse modified discrete cosine transforms to the reorganized coefficients to obtain the inverse-quantized spectrum prior to multi-resolution analysis.
27. The enhanced audio decoding method according to claim 26, characterized in that before performing the multi-resolution synthesis step it further comprises: determining whether the control information indicates that the inverse-quantized spectrum needs to undergo inverse frequency-domain linear prediction vector quantization; if so, performing inverse vector quantization to obtain the prediction coefficients, applying linear prediction synthesis to the residual spectrum to obtain the pre-prediction spectrum, and performing multi-resolution synthesis on the pre-prediction spectrum.
28. The enhanced audio decoding method according to claim 26, characterized in that before performing Step 4 it further comprises: determining whether the control information indicates that the inverse-quantized spectrum needs to undergo inverse frequency-domain linear prediction vector quantization; if so, performing inverse vector quantization to obtain the prediction coefficients, applying linear prediction synthesis to the residual spectrum to obtain the pre-prediction spectrum, and applying frequency-time mapping to the pre-prediction spectrum.
29. The enhanced audio decoding method according to any one of claims 26 to 28, characterized in that between Step 2 and Step 3 it further comprises: if the signal-type analysis result indicates that the signal types are consistent, determining from the sum-difference stereo control signal whether sum-difference stereo decoding of the quantized spectral values is required; if so, determining from the flag bit of each scale factor band whether that scale factor band requires sum-difference stereo decoding and, if it does, converting the quantized spectral values of the sum and difference channels in that scale factor band into the quantized spectral values of the left and right channels, then proceeding to Step 3; if the signal types are inconsistent or sum-difference stereo decoding is not required, leaving the quantized spectral values unprocessed and proceeding to Step 3.
PCT/CN2004/001034 2004-04-01 2004-09-09 Enhanced audio encoding and decoding equipment, method thereof WO2005096508A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200410030947 2004-04-01
CN200410030947.3 2004-04-01

Publications (1)

Publication Number Publication Date
WO2005096508A1 true WO2005096508A1 (en) 2005-10-13

Family

ID=35064123

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2004/001034 WO2005096508A1 (en) 2004-04-01 2004-09-09 Enhanced audio encoding and decoding equipment, method thereof

Country Status (1)

Country Link
WO (1) WO2005096508A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008031458A1 (en) * 2006-09-13 2008-03-20 Telefonaktiebolaget Lm Ericsson (Publ) Methods and arrangements for a speech/audio sender and receiver
CN101136202B (en) * 2006-08-29 2011-05-11 华为技术有限公司 Sound signal processing system, method and audio signal transmitting/receiving device
CN102150202A (en) * 2008-07-14 2011-08-10 三星电子株式会社 Method and apparatus to encode and decode an audio/speech signal
CN110022463A (en) * 2019-04-11 2019-07-16 重庆紫光华山智安科技有限公司 Video interested region intelligent coding method and system are realized under dynamic scene
CN110956970A (en) * 2019-11-27 2020-04-03 广州市百果园信息技术有限公司 Audio resampling method, device, equipment and storage medium
US20210343299A1 (en) * 2019-01-13 2021-11-04 Huawei Technologies Co., Ltd. High resolution audio coding
US11594235B2 (en) 2013-07-22 2023-02-28 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Noise filling in multichannel audio coding
CN116866972A (en) * 2023-08-01 2023-10-10 江苏苏源杰瑞科技有限公司 Communication monitoring system and method based on dual-mode communication

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1213935A (en) * 1997-09-03 1999-04-14 松下电器产业株式会社 Apparatus of layered picture coding apparatus of picture decoding, methods of picture decoding, apparatus of recoding for digital broadcasting signal, and apparatus of picture and audio decoding
EP1351218A2 (en) * 2002-03-06 2003-10-08 Kabushiki Kaisha Toshiba Audio signal reproducing method and an apparatus for reproducing the same


Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101136202B (en) * 2006-08-29 2011-05-11 华为技术有限公司 Sound signal processing system, method and audio signal transmitting/receiving device
CN101512639B (en) * 2006-09-13 2012-03-14 艾利森电话股份有限公司 Method and equipment for voice/audio transmitter and receiver
WO2008031458A1 (en) * 2006-09-13 2008-03-20 Telefonaktiebolaget Lm Ericsson (Publ) Methods and arrangements for a speech/audio sender and receiver
CN102150202A (en) * 2008-07-14 2011-08-10 三星电子株式会社 Method and apparatus to encode and decode an audio/speech signal
US8532982B2 (en) 2008-07-14 2013-09-10 Samsung Electronics Co., Ltd. Method and apparatus to encode and decode an audio/speech signal
US9355646B2 (en) 2008-07-14 2016-05-31 Samsung Electronics Co., Ltd. Method and apparatus to encode and decode an audio/speech signal
US9728196B2 (en) 2008-07-14 2017-08-08 Samsung Electronics Co., Ltd. Method and apparatus to encode and decode an audio/speech signal
US11887611B2 (en) 2013-07-22 2024-01-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Noise filling in multichannel audio coding
US11594235B2 (en) 2013-07-22 2023-02-28 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Noise filling in multichannel audio coding
US20210343299A1 (en) * 2019-01-13 2021-11-04 Huawei Technologies Co., Ltd. High resolution audio coding
US11735193B2 (en) * 2019-01-13 2023-08-22 Huawei Technologies Co., Ltd. High resolution audio coding
CN110022463A (en) * 2019-04-11 2019-07-16 重庆紫光华山智安科技有限公司 Video interested region intelligent coding method and system are realized under dynamic scene
CN110956970A (en) * 2019-11-27 2020-04-03 广州市百果园信息技术有限公司 Audio resampling method, device, equipment and storage medium
CN110956970B (en) * 2019-11-27 2023-11-14 广州市百果园信息技术有限公司 Audio resampling method, device, equipment and storage medium
CN116866972B (en) * 2023-08-01 2024-01-30 江苏苏源杰瑞科技有限公司 Communication monitoring system and method based on dual-mode communication
CN116866972A (en) * 2023-08-01 2023-10-10 江苏苏源杰瑞科技有限公司 Communication monitoring system and method based on dual-mode communication

Similar Documents

Publication Publication Date Title
WO2005096274A1 (en) An enhanced audio encoding/decoding device and method
JP6518361B2 (en) Audio / voice coding method and audio / voice coder
JP5539203B2 (en) Improved transform coding of speech and audio signals
JP4950210B2 (en) Audio compression
US7275036B2 (en) Apparatus and method for coding a time-discrete audio signal to obtain coded audio data and for decoding coded audio data
US8290782B2 (en) Compression of audio scale-factors by two-dimensional transformation
CA2608030C (en) Scalable compressed audio bit stream and codec using a hierarchical filterbank and multichannel joint coding
US6253165B1 (en) System and method for modeling probability distribution functions of transform coefficients of encoded signal
EP1914724B1 (en) Dual-transform coding of audio signals
KR101220621B1 (en) Encoder and encoding method
JP4081447B2 (en) Apparatus and method for encoding time-discrete audio signal and apparatus and method for decoding encoded audio data
US9037454B2 (en) Efficient coding of overcomplete representations of audio using the modulated complex lapped transform (MCLT)
WO2006003891A1 (en) Audio signal decoding device and audio signal encoding device
EP1873753A1 (en) Enhanced audio encoding/decoding device and method
CN103366750B (en) A kind of sound codec devices and methods therefor
US9230551B2 (en) Audio encoder or decoder apparatus
JP5629319B2 (en) Apparatus and method for efficiently encoding quantization parameter of spectral coefficient coding
CN114365218A (en) Determination of spatial audio parametric coding and associated decoding
IL305626B1 (en) Backward-compatible integration of harmonic transposer for high frequency reconstruction of audio signals
WO2005096508A1 (en) Enhanced audio encoding and decoding equipment, method thereof
WO2009068085A1 (en) An encoder
WO2018142018A1 (en) Stereo audio signal encoder
RU2409874C9 (en) Audio signal compression
US20100280830A1 (en) Decoder
WO2006056100A1 (en) Coding/decoding method and device utilizing intra-channel signal redundancy

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 69(1) EPC

122 Ep: pct application non-entry in european phase

Ref document number: 04762168

Country of ref document: EP

Kind code of ref document: A1