WO2007028280A1 - Encoder and decoder for pre-echo control and method thereof - Google Patents

Encoder and decoder for pre-echo control and method thereof

Info

Publication number
WO2007028280A1
Authority
WO
WIPO (PCT)
Prior art keywords
module
window function
signal
echo
analysis
Prior art date
Application number
PCT/CN2005/001435
Other languages
French (fr)
Chinese (zh)
Inventor
Lei Wang
Xingde Pan
Original Assignee
Beijing E-World Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing E-World Technology Co., Ltd.
Priority to CN200580051158.0A (CN101228574A)
Priority to PCT/CN2005/001435 (WO2007028280A1)
Publication of WO2007028280A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/022 Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
    • G10L19/025 Detection of transients or attacks for time/frequency resolution switching

Definitions

  • the present invention relates to an apparatus and method for encoding and decoding with pre-echo control, and more particularly to an audio encoding and decoding apparatus and method that control pre-echo using a modified window function.
  • a perceptual encoder is used to encode and compress audio information.
  • a conventional perceptual encoder generally has a psychoacoustic module, whose function is to analyze the "irrelevant components" in an audio signal. After the "irrelevant components" are identified, the quantization module processes them so that the audio signal reaches "perceptual transparency", that is, the processing has no influence on human perception or the influence is within an acceptable range.
  • when the psychoacoustic module analyzes the "irrelevant components", it mainly uses the masking phenomenon of the human ear, shown in Fig. 1.
  • Masking is further divided into simultaneous masking, pre-masking, and post-masking. Among them, forward masking 2 and backward masking 3 are expressed in the time domain, so there is an additional requirement on the time domain characteristics of the perceptual encoder: to achieve perceptually transparent coding quality, the quantization noise must also satisfy a time-domain masking threshold. This requirement is not easy to meet for a practical perceptual encoder.
  • the block transform method is used to transform the audio time domain signal into the frequency domain; the quantization error caused by quantizing and coding the transformed spectral coefficients is then spread in the time domain when the signal is reconstructed by the synthesis filter.
  • for common filter designs, such as a modified discrete cosine transform (MDCT) filter with a window length of 2048 sample points applied to a signal with a sampling frequency of 48000 Hz, the quantization error is spread over about 42.7 ms after reconstruction by the synthesis filter. If the stronger energy of the signal in the analysis window is concentrated in a very small part of the window, the quantization noise spreads into the portion before the signal appears.
  • in extreme cases, the quantization noise is even higher than the energy level of the original signal. This is called the "pre-echo" phenomenon, as shown in Figure 2 and Figure 3.
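The 42.7 ms figure follows directly from the window length and the sampling frequency. A quick check is sketched below; the function name is illustrative, and the 256-sample case is included only for comparison with a short window.

```python
# Time span covered by one analysis window. After reconstruction by the
# synthesis filter, quantization noise can spread over this whole span.
def window_span_ms(window_length: int, sample_rate_hz: int) -> float:
    return 1000.0 * window_length / sample_rate_hz

long_span = window_span_ms(2048, 48000)   # long window
short_span = window_span_ms(256, 48000)   # short window, for comparison
print(f"{long_span:.2f} ms, {short_span:.2f} ms")
```

A window eight times shorter confines the noise to a span eight times shorter, which is why time resolution matters for fast-changing signals.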
  • Fig. 2 is a time domain graph of the uncoded audio signal, and Fig. 3 is a time domain graph of the encoded and reconstructed audio signal. The portion circled by an ellipse in Fig. 3 is the pre-echo 5. According to the characteristics of the human ear, if the coding noise lasts only a short time before the signal onset, the pre-echo can be masked by forward masking; otherwise the coding noise will be perceived by the human ear.
  • the time domain characteristics of the quantization noise should therefore be considered when designing the encoder to ensure that the time domain masking condition is satisfied. The pre-echo phenomenon has always been a major difficulty when coding fast-changing type signals (such as a castanets signal) at low code rates.
  • the prior art includes the following. Bit reservoir control technology: the spectral coefficients of the filter bank window that covers the fast-changing segment are coded with increased precision. This greatly increases the number of bits required for coding fast-changing frames, so the method cannot be used in fixed-rate encoders. In the MPEG-1 standard, the bit reservoir method uses the bits left over from previous frames when the bit demand peaks, thereby maintaining a constant average code rate. In practice, however, a very fast-changing signal requires a bit reservoir so large that it cannot be realized.
  • Adaptive window switching is used in many perceptual encoders. This method adaptively adjusts the size of the filter bank window according to the characteristics of the input signal: the steady-state or slowly changing part is encoded with a long window, and the fast-changing part is encoded with a short window.
  • This approach increases the amount of encoder computation and complicates the encoder structure. Since different window lengths require different interpretations and normalizations of the psychoacoustic model, as well as different frequency bands and noiseless coding structures, window switching significantly increases the complexity of the encoder structure. In addition, when overlap-add filter banks are used, window switching decisions require additional buffering and delay in the encoder, resulting in greater end-to-end delay. Finally, although the long and short windows have good time-frequency localization, the start and stop windows used for the transitions reduce coding efficiency.
  • Filter bank switching techniques are techniques that control the pre-echo using different filter bank modes. Specifically, in the slow-change signal type, a cosine-modulated filter bank with a high frequency resolution is used; in a fast-changing signal type, a wavelet filter bank is used. When the two filter bank modes are switched to each other, it is difficult to ensure complete reconstruction of the transition block.
  • Temporal noise shaping (TNS) technology: after the signal is transformed into frequency domain coefficients by the filter bank, the signal type is judged. If it is a fast-changing type signal, the frequency domain coefficients are not quantized directly; instead, linear prediction is first applied to the frequency domain coefficients and the residual sequence is quantized. The use of TNS increases the amount of side information, which affects the overall coding efficiency.
  • the encoder includes: The psychoacoustic analysis module 201, the window function module 202, the time-frequency mapping module 203, the quantization and entropy encoding module 204, and the code stream multiplexing module 205.
  • the psychoacoustic analysis module 201 is configured to calculate the perceptual entropy and the masking threshold of the input audio signal, and determine whether the audio signal frame signal type is a fast variable type signal or a slowly varying type signal according to the perceptual entropy.
  • the length of the analysis window function of the window function module 202 is determined according to the signal type output by the psychoacoustic analysis module 201. Specifically, if the frame signal is a fast-changing type signal, a window with a length of 256 sample points, which has a higher temporal resolution and a lower frequency resolution, is used in order to prevent pre-echo; if the frame signal is a slowly varying type signal, a window with a length of 2048 sample points, which has a lower temporal resolution and a higher frequency resolution, is used to ensure encoding efficiency.
  • the time-frequency mapping module 203 is configured to convert the time domain audio signal into frequency domain coefficients and output them to the quantization and entropy coding module 204; the quantization and entropy coding module 204 quantizes and entropy encodes the frequency domain coefficients under the control of the masking threshold output by the psychoacoustic analysis module 201, and outputs the result to the code stream multiplexing module 205; the code stream multiplexing module 205 is configured to multiplex the received data to form an audio coded code stream.
  • the window function module 202 uses windows of different length sample points, so that the structural complexity of the entire encoder becomes higher.
  • variance estimation is performed on a frame signal to obtain a standard deviation.
  • a DCT transform is performed on the input sequence to obtain V, which is then quantized.
  • the quantized sequence is transmitted to the decoding end, where an inverse DCT transform is performed and the resulting sequence is multiplied by the quantized standard deviation to obtain the reconstructed speech signal. If this method is directly applied to audio coding, it remains powerless against the pre-echo problem that occurs in fast-changing signal frames.
  • the transform-based speech coding method cannot change the characteristics of the frame signal 6 by dividing it by the quantized standard deviation, that is, the frame signal is still non-stationary, as shown in FIG. 5, which is the audio signal time domain graph to which the transform-based speech coding method is applied.
  • as an improvement, one standard deviation is estimated before the fast change point and a second standard deviation is estimated after the fast change point, and both are quantized, as shown in Fig. 6, which is the audio signal time domain graph of the improved transform-based speech coding method. In this way, first of all, fast change
  • the improved method is intended for speech coding. For audio coding, this improved method has difficulty eliminating the effect of fast changes, and its coding efficiency is very low.
  • a primary object of the present invention is to provide an encoding apparatus for controlling pre-echo, which has a simple structure and can effectively control a pre-echo phenomenon during audio encoding.
  • Another object of the present invention is to provide an encoding method for controlling pre-echo, which can effectively control the pre-echo phenomenon during audio encoding.
  • the present invention provides an encoding apparatus for controlling pre-echo, which includes: a signal type analysis module for judging the signal type of an input audio signal frame and outputting a fast change point position and a quantized mutation intensity parameter;
  • a correction window function module, coupled to the signal type analysis module, configured to modify the analysis window function, window the input audio signal frame, and output the windowed time domain audio signal;
  • a time-frequency mapping module, connected to the correction window function module, configured to convert the windowed time domain audio signal into frequency domain coefficients;
  • a psychoacoustic analysis module, configured to perform psychoacoustic processing on the input audio signal frame and output a masking threshold parameter of the scale factor band;
  • a quantization and entropy coding module, connected to the time-frequency mapping module and the psychoacoustic analysis module respectively, configured to quantize and entropy encode the frequency domain coefficients output by the time-frequency mapping module according to the masking threshold parameter output by the psychoacoustic analysis module, and to output the encoded code stream;
  • a code stream multiplexing module, coupled to the quantization and entropy coding module and the signal type analysis module, configured to multiplex the encoded code stream output by the quantization and entropy coding module with the output of the signal type analysis module to form an audio coded stream.
  • the present invention provides an encoding method for controlling pre-echo, which includes the following steps:
  • Step 1 the signal type analysis module determines whether the signal type of the input audio signal frame is a fast-changing type signal; if so, the signal type analysis module calculates the fast change point position parameter and the mutation intensity parameter of the audio signal frame, quantizes the mutation intensity parameter to obtain a quantized value of the mutation intensity, and then step 2 is performed; otherwise, the correction window function module windows the audio signal frame with the original analysis window function to obtain a windowed time domain signal, and then step 4 is performed;
  • Step 2 The correction window function module linearly transforms the analysis window function to obtain a modified analysis window function
  • Step 3 The correction window function module adds a window to the audio signal frame by using a modified analysis window function to obtain a time domain signal after windowing;
  • Step 4 The time-frequency mapping module performs time-frequency mapping processing on the windowed time domain signal to obtain a frequency domain coefficient.
  • Step 5 The quantization and entropy coding module quantizes and entropy encodes the frequency domain coefficients according to a masking threshold parameter of a scale factor band obtained by psychoacoustic processing of the audio signal frame by the psychoacoustic module, to obtain the encoded audio code stream. ;
  • Step 6 The code stream multiplexing module multiplexes the encoded audio code stream and the result of the signal type analysis to obtain a compressed audio code stream.
  • the present invention provides a decoding apparatus for controlling pre-echo, which includes:
  • a code stream demultiplexing module configured to demultiplex the compressed audio code stream
  • An inverse quantization and entropy decoding module is connected to the code stream demultiplexing module, configured to decode and inverse quantize the demultiplexed audio code stream, and output inverse quantized frequency domain coefficients; a frequency time mapping module, coupled to the inverse quantization and entropy decoding module, configured to transform the inverse quantized frequency domain coefficients into a time domain signal;
  • a correction window function module is coupled to the frequency time mapping module for modifying the integrated window function and windowing the time domain signal.
  • the present invention provides a decoding method for controlling pre-echo, which includes the following steps:
  • Step 1 The code stream demultiplexing module demultiplexes the input compressed audio code stream to obtain the demultiplexed audio code stream and side information.
  • Step 2 The inverse quantization and entropy decoding module performs inverse quantization and entropy decoding on the demultiplexed audio code stream to obtain inversely quantized frequency domain coefficients;
  • Step 3 The frequency time mapping module performs frequency-time mapping processing on the inverse-quantized frequency domain coefficients to obtain a time domain signal.
  • Step 4 The correction window function module determines, according to the demultiplexed side information, whether the signal type of the audio signal frame is a fast change type, if yes, step 5 is performed; otherwise, step 6 is performed;
  • Step 5 The correction window function module linearly transforms the integrated window function to obtain a modified integrated window function, and then uses the modified integrated window function to window the time domain signal to obtain the reconstructed audio signal;
  • Step 6 The correction window function module uses the original integrated window function to window the time domain signal to obtain a reconstructed audio signal.
  • the present invention has the following advantages: since the window function employs a fixed window length, the structure of the encoding device is simplified, and the pre-echo phenomenon during audio encoding is controlled while complete reconstruction of the audio signal is ensured.
  • Figure 1 is a masking characteristic diagram of the human ear.
  • Figure 2 is an uncoded audio signal time domain graph.
  • Figure 3 is a time domain graph of the encoded audio signal after reconstruction.
  • FIG. 4 is a structural block diagram of a prior art audio encoding device.
  • FIG. 5 is an audio signal time domain graph to which a transform-based speech encoding method is applied.
  • FIG. 6 is an audio signal time domain graph to which an improved transform-based speech encoding method is applied.
  • Fig. 7 is a block diagram showing the configuration of a first embodiment of the encoding apparatus for controlling pre-echo of the present invention.
  • Fig. 8 is a flow chart showing the first embodiment of the encoding method for controlling pre-echo according to the present invention.
  • Figure 9 is a schematic illustration of the original analysis window and the original synthesis window of the encoding and decoding method for controlling pre-echo of the present invention.
  • Figure 10 is a schematic illustration of a modified analysis window of the encoding method for controlling pre-echo in accordance with the present invention.
  • Figure 11 is a schematic illustration of a modified integrated window of the decoding method for controlling pre-echo of the present invention.
  • Fig. 12 is a view showing the window function correction in accordance with the full reconstruction condition in the encoding method for controlling pre-echo of the present invention.
  • Figure 13 is a diagram showing the modified analysis window function with the transition block of the encoding method for controlling pre-echo of the present invention.
  • Figure 14 is a diagram showing the modified integrated window function with the transition block of the decoding method for controlling pre-echo of the present invention.
  • Fig. 15 is a block diagram showing the configuration of a second embodiment of the encoding apparatus for controlling pre-echo of the present invention.
  • Fig. 16 is a flow chart showing the second embodiment of the encoding method for controlling pre-echo according to the present invention.
  • Figure 17 is a block diagram showing the structure of a first embodiment of a decoding apparatus for controlling pre-echo of the present invention.
  • Figure 18 is a flow chart showing Embodiment 1 of the decoding method for controlling pre-echo according to the present invention.
  • Fig. 19 is a block diagram showing the configuration of a second embodiment of the decoding apparatus for controlling pre-echo of the present invention.
  • FIG. 20 is a flow chart showing Embodiment 2 of the decoding method for controlling pre-echo according to the present invention.
  • Detailed description
  • the present invention uses a modified window function (MWF) to control the pre-echo appearing in audio coding, thereby controlling the pre-echo phenomenon during audio coding while ensuring complete reconstruction of the audio signal.
  • FIG. 7 is a structural block diagram of Embodiment 1 of an encoding apparatus for controlling pre-echo of the present invention.
  • the device is composed of the following functional modules: a signal type analysis module 301, configured to determine the signal type of the input audio signal frame and output a fast change point position and a quantized mutation intensity parameter, wherein the signal type analysis module 301 includes: a signal type analyzer, configured to determine whether the input audio frame signal is a slowly varying type signal or a fast-changing type signal; a fast change point locator, connected to the signal type analyzer, for calculating the position of the fast change point; a mutation intensity calculator, connected to the signal type analyzer, for calculating the mutation intensity of the signal; and a mutation intensity quantizer, connected to the mutation intensity calculator, for quantizing the calculated mutation intensity;
  • a correction window function module 302, connected to the signal type analysis module 301, configured to modify the analysis window function, window the input audio signal frame, and output a windowed time domain audio signal, thereby improving the time resolution when encoding the fast-changing signal;
  • a time-frequency mapping module 304, connected to the correction window function module 302, configured to convert the windowed time domain audio signal into frequency domain coefficients;
  • a psychoacoustic analysis module 303, configured to perform psychoacoustic processing on the input audio signal frame and output a masking threshold parameter of the scale factor band;
  • the time-frequency mapping module 304 is composed of a filter bank, which may be a discrete Fourier transform (DFT) filter bank, a discrete cosine transform (DCT) filter bank, a modified discrete cosine transform (MDCT) filter bank, a cosine-modulated filter bank, and so on.
  • the window length in the analysis window function is equal to the audio signal frame length, and the window function can be a Hanning window, a Hamming window, or a Blackman window; when a modified discrete cosine transform (MDCT) filter bank is used, the window length in the analysis window function is twice the audio signal frame length, and the window function can be any window function that satisfies the conditions of the modified discrete cosine transform.
  • the quantizer is composed of a set of sub-quantizers, each of which quantizes the frequency domain coefficients of a local region according to the masking threshold of the specific time-frequency region output by the psychoacoustic analysis module 303; this region is usually called the scale factor band.
  • the quantizer can be a scalar quantizer or a vector quantizer, such as the Moving Picture Experts Group Advanced Audio Coding (MPEG AAC) nonlinear scalar quantizer or the MPEG TwinVQ vector quantizer.
  • Step 21 the signal type analysis module 301 determines whether the signal type of the input audio signal frame is a fast change type signal, if yes, go to step 22, otherwise go to step 25;
  • Step 22 The signal type analysis module 301 calculates a parameter of the fast change point position and a mutation intensity parameter of the audio signal frame, and quantizes the mutation strength parameter to obtain a quantized value of the mutation intensity;
  • Step 23 The correction window function module 302 proportionally reduces the function values of the analysis window function after the fast change point position, the reduction factor being equal to the quantized value of the mutation intensity, to obtain the modified analysis window function;
  • Step 24 the correction window function module 302 uses the modified analysis window function to window the audio signal frame to obtain a windowed time domain signal, and step 26 is performed;
  • Step 25 the correction window function module 302 uses the original analysis window function to window the audio signal frame to obtain a windowed time domain signal, and step 26 is performed;
  • Step 26 The time-frequency mapping module 304 performs time-frequency mapping processing on the windowed time domain signal to obtain a frequency domain coefficient.
  • Step 27 The quantization and entropy coding module 305 quantizes and entropy encodes the frequency domain coefficients according to the masking threshold parameter of the scale factor band obtained by the psychoacoustic module 303 performing psychoacoustic processing on the audio signal frame, to obtain the encoded Audio stream
  • Step 28 The code stream multiplexing module 306 multiplexes the encoded audio code stream and the result of the signal type analysis to obtain a compressed audio code stream.
  • in step 21 of the above encoding method, while the signal type analysis module 301 determines the signal type of the input audio signal frame, the psychoacoustic module 303 performs psychoacoustic processing on the audio signal frame to obtain the masking threshold parameter of the scale factor band.
  • the psychoacoustic processing calculates a masking curve for the current frame signal according to the hearing characteristics of the human ear; the masking threshold of a specific time-frequency region can be calculated from the masking curve and used to guide the quantization of the current audio frame signal. The psychoacoustic model can be the first or second type of psychoacoustic model used by MPEG AAC.
  • the signal type analysis module 301 determines the signal type of the frame signal based on an adaptive threshold, waveform prediction, and the forward and backward masking effects.
  • the specific steps are: decompose the input frame into multiple subframes, and search each subframe for the local maximum points of the absolute value of the PCM data; select the absolute peak of each subframe from among its local maximum points; for a given subframe absolute peak, use the absolute peaks of a plurality of (typically 3) subframes preceding that subframe to predict the typical sample value of a plurality of (typically 4) subframes at a forward delay relative to that subframe; calculate the difference and the ratio between the absolute peak of the subframe and the predicted typical sample value; if the difference and the ratio are both greater than the set thresholds, it is determined that the subframe contains an abrupt signal and that the subframe has a local maximum peak point capable of backward masking the pre-echo, if the front end of the sub-
  • the frame signal belongs to the fast-changing type signal, the subframe containing the abrupt signal is taken as the position of the fast change point, the ratio of the absolute peak of the subframe containing the abrupt signal to the largest absolute peak of all subframes before that subframe is taken as the mutation intensity, and the mutation intensity is quantized.
  • the quantization method may be rounding up, rounding down, rounding to nearest, etc. If the difference and the ratio are not greater than the set thresholds, the above steps are repeated until the frame signal is determined to be a fast-changing type signal or the last subframe is reached; if the last subframe is reached without the frame signal being determined to be a fast-changing type signal, the frame signal belongs to the slowly varying type signal.
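The detection steps above can be illustrated with a deliberately simplified sketch. It is not the method of the description: the waveform prediction and the masking check are replaced by a plain comparison of each subframe peak with the largest preceding peak. The function name, the number of subframes, and the threshold value are all assumptions chosen for illustration.

```python
import math

def detect_fast_change(frame, num_subframes=8, ratio_threshold=4.0, floor=1e-9):
    """Simplified transient check (assumption: no waveform prediction).

    Returns (is_fast, subframe_index, quantized_intensity).
    """
    sub_len = len(frame) // num_subframes
    # Absolute peak of each subframe.
    peaks = [
        max(abs(s) for s in frame[i * sub_len:(i + 1) * sub_len])
        for i in range(num_subframes)
    ]
    for i in range(1, num_subframes):
        # Largest absolute peak of all subframes before subframe i.
        prev_peak = max(max(peaks[:i]), floor)
        ratio = peaks[i] / prev_peak
        if ratio > ratio_threshold:
            # Mutation intensity, quantized by rounding to nearest.
            return True, i, int(math.floor(ratio + 0.5))
    return False, None, None

# A frame that jumps from amplitude 1.0 to 5.0 at sample 20 (subframe 5).
transient = [1.0] * 20 + [5.0] * 12
print(detect_fast_change(transient))
print(detect_fast_change([1.0] * 32))
```

Rounding up or rounding down, which the text also allows, would replace the `math.floor(ratio + 0.5)` expression with `math.ceil(ratio)` or `math.floor(ratio)`.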
  • the time-frequency transform of time-domain audio signals into frequency-domain signals can use methods such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the modified discrete cosine transform (MDCT), the cosine-modulated filter bank, and the wavelet transform.
  • the window length in the analysis window function is equal to the length of the audio signal frame, and the window function can be a Hanning window, a Hamming window, or a Blackman window; when the modified discrete cosine transform (MDCT) is used, the window length in the analysis window function is twice the frame length of the audio signal, and the window function can be any window function that satisfies the conditions of the modified discrete cosine transform.
  • the analysis window function uses a fixed length window function, the length being an integer greater than 1, preferably 2 to the power of N, where N is a natural number. For the selection of windows, see "Discrete Time Signal Processing (2nd edition)", Xi'an Jiaotong University Press, A.
  • when the MDCT is used, the time domain signals of the M samples of the previous frame and the M samples of the current frame are selected, the 2M samples of the two frames are windowed by the correction window function module 302 (the analysis window is twice as long as the frame), and the windowed signal is then MDCT-transformed by the time-frequency mapping module 304 to obtain M frequency domain coefficients.
  • the impulse response of the MDCT analysis filter is: $h_k(n) = w(n)\sqrt{\frac{2}{M}}\cos\left[\frac{(2n + M + 1)(2k + 1)\pi}{4M}\right]$
  • the MDCT transform is then: $X(k) = \sum_{n=0}^{2M-1} x(n)\,h_k(n)$, $0 \le k \le M-1$, where: w(n) is the window function; x(n) is the input time domain audio signal of the MDCT transform; X(k) is the output frequency domain signal of the MDCT transform.
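As a sketch of the transform just described, the following direct (non-fast) implementation assumes the common MDCT form in which the analysis filter is h_k(n) = w(n)·sqrt(2/M)·cos[(2n + M + 1)(2k + 1)π / (4M)] and X(k) is the sum of x(n)·h_k(n) over the 2M windowed samples. The sine window, the function names, and the small block size are illustrative choices, not taken from the description.

```python
import math

def sine_window(M):
    # Sine window of length 2M; satisfies w(n)^2 + w(n + M)^2 = 1.
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def mdct(block, window):
    # 2M windowed input samples -> M frequency domain coefficients.
    M = len(block) // 2
    scale = math.sqrt(2.0 / M)
    return [
        sum(block[n] * window[n] * scale
            * math.cos((2 * n + M + 1) * (2 * k + 1) * math.pi / (4 * M))
            for n in range(2 * M))
        for k in range(M)
    ]

def imdct(coeffs, window):
    # M coefficients -> 2M windowed, time-aliased output samples.
    M = len(coeffs)
    scale = math.sqrt(2.0 / M)
    return [
        window[n] * scale * sum(
            coeffs[k] * math.cos((2 * n + M + 1) * (2 * k + 1) * math.pi / (4 * M))
            for k in range(M))
        for n in range(2 * M)
    ]

def roundtrip(signal, M):
    # Blocks of 2M samples advance by M; overlap-add cancels the aliasing.
    w = sine_window(M)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * M + 1, M):
        y = imdct(mdct(signal[start:start + 2 * M], w), w)
        for n in range(2 * M):
            out[start + n] += y[n]
    return out

M = 8
signal = [math.sin(0.3 * n) + 0.1 * n for n in range(5 * M)]
rebuilt = roundtrip(signal, M)
# Only samples covered by two overlapping blocks are fully reconstructed.
error = max(abs(rebuilt[n] - signal[n]) for n in range(M, 4 * M))
print(error < 1e-9)
```

Each block of 2M samples yields only M coefficients, and the signal is recovered only after adding the overlapping halves of neighbouring blocks, which is why any window correction has to respect the reconstruction condition across frames.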
  • Sine window, KBD window, etc. can be selected as the window function.
  • the Sine window is taken as an example to illustrate how to modify it in the module 302 to achieve the purpose of controlling the pre-echo. It should be noted that the present invention is not limited to the Sine window and the KBD window, and any window function that satisfies the MDCT transformation condition can be used to correct it, and finally achieve the purpose of controlling the pre-echo.
  • Fig. 9 is a schematic diagram of the original analysis window function of the encoding and decoding method for controlling pre-echo of the present invention. If the frame signal 9 is a fast-changing type signal, the original analysis window function is corrected. The correction processing is: proportionally reducing the values of the window function after the fast change point, the reduction factor being equal to the quantized mutation intensity.
  • the modified analysis window function is shown in Fig. 10, in which the fast change point position is the 1280th sample point 10 and the quantized mutation intensity is 5.
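A minimal sketch of this correction, using the values quoted for Fig. 10 (fast change point at sample 1280, quantized mutation intensity 5). The 2048-sample sine window, the function names, and the choice to scale the sample at the change point itself are assumptions made for illustration.

```python
import math

def sine_window(M):
    # Sine window of length 2M.
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def modify_analysis_window(window, change_pos, intensity):
    # Divide the window values from change_pos onward by the quantized
    # mutation intensity; values before change_pos are left unchanged.
    return [w if n < change_pos else w / intensity
            for n, w in enumerate(window)]

base = sine_window(1024)  # assumed 2048-sample analysis window
modified = modify_analysis_window(base, change_pos=1280, intensity=5)
print(base[1280], modified[1280])
```

The strong part of the signal after the fast change point is attenuated before the transform, so the frame seen by the filter bank is closer to stationary.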
  • correspondingly, the integrated window used at decoding must also be corrected. The correction processing is: proportionally amplifying the values of the window function after the fast change point, the amplification factor being equal to the quantized mutation intensity. The modified integrated window function is shown in Fig. 11, in which the fast change point position is the 1280th sample point 11 and the quantized mutation intensity is 5.
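The pairing of the two corrections can be checked numerically: dividing the analysis window and multiplying the integrated window by the same factor leaves their sample-by-sample product unchanged. The sketch below assumes a 2048-sample sine window for both windows and uses hypothetical helper names. It checks only this product; the cancellation of time-domain aliasing between neighbouring frames is a separate condition and is not verified here.

```python
import math

def sine_window(M):
    # Sine window of length 2M.
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def scale_after(window, change_pos, gain):
    # Multiply the window values from change_pos onward by gain.
    return [w if n < change_pos else w * gain for n, w in enumerate(window)]

base = sine_window(1024)
analysis_mod = scale_after(base, 1280, 1.0 / 5)   # encoder side: reduce
synthesis_mod = scale_after(base, 1280, 5.0)      # decoder side: amplify

# Largest deviation of the modified product from the original product.
max_dev = max(abs(a * s - w * w)
              for a, s, w in zip(analysis_mod, synthesis_mod, base))
print(max_dev < 1e-12)
```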
  • FIG. 12 is a schematic diagram showing that the window function modification satisfies the complete reconstruction condition in the encoding method for controlling pre-echo according to the present invention.
  • Fig. 12 shows the signal of four consecutive frames. Frame i has been judged to be a fast-changing frame, the fast change point 12 is at position M+L, and the mutation intensity is scs, where scs is a real number.
  • When the MDCT is applied to frame i, the original analysis window and the original synthesis window are modified as described above to obtain the modified analysis window and the modified synthesis window; then:
  • In step 23, when the analysis window function values after the fast change point are reduced proportionally, a transition block can be added near the fast change point so that the analysis window function values change gradually near it.
  • The window function with a transition block is obtained as follows. Assume the window function is $w(n)$, where $0 \le n \le 2M-1$, with a given quantized mutation intensity and fast change point position $L$. If no transition block is added, the modified window function is:
  • FIG. 13 is a schematic diagram of the modified analysis window function with a transition block in the encoding method for controlling pre-echo of the present invention.
  • FIG. 14 is a schematic diagram of the modified synthesis window function with a transition block in the decoding method for controlling pre-echo of the present invention.
  • The transition block 13 in FIG. 13 and the transition block 14 in FIG. 14 are each 64 sample points long.
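A transition block of the kind described above can be sketched as a gain profile that ramps from 1 to the quantized mutation intensity. The expressions for the transition are not legible in this text, so the linear ramp, the placement of the block symmetrically around the change point, the window length of 2048, and all names below are assumptions made for illustration only.

```python
import math

def gain_profile(length, change_point, strength, half_width):
    """Gain g(n): 1 before the transition block, `strength` after it,
    and a linear ramp (an assumed shape) inside the block."""
    start = change_point - half_width   # first sample of the transition block
    end = change_point + half_width     # first sample after the transition block
    steps = 2 * half_width + 1
    gains = []
    for n in range(length):
        if n < start:
            gains.append(1.0)
        elif n >= end:
            gains.append(float(strength))
        else:
            gains.append(1.0 + (strength - 1.0) * (n - start + 1) / steps)
    return gains

N, L, S, I = 2048, 1280, 5.0, 32   # I = 32 gives a 64-sample transition block

w = [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]
g = gain_profile(N, L, S, I)

analysis = [wn / gn for wn, gn in zip(w, g)]    # encoder: values reduced by g(n)
synthesis = [wn * gn for wn, gn in zip(w, g)]   # decoder: values amplified by g(n)

# Samples whose gain lies strictly between 1 and S form the transition block.
ramp = [gn for gn in g if 1.0 < gn < S]
```

With these parameters the ramp covers samples 1248 to 1311, so the window no longer jumps by a factor of 5 at a single sample.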
  • Quantization and entropy coding comprises two steps, nonlinear quantization and entropy coding; the quantization may be scalar quantization or vector quantization.
  • Scalar quantization can use the nonlinear scalar quantization of MPEG AAC, and vector quantization can use that of MPEG TwinVQ.
  • The quantization may also use the audio coding method based on the minimum global noise-to-mask ratio criterion and entropy coding (patent application number 03146213.8). After quantization, entropy coding is used to further remove the statistical redundancy of the quantized coefficients and the side information, and the compressed audio code stream is finally obtained.
  • Fig. 15 is a block diagram showing the structure of a second embodiment of the encoding apparatus for controlling the pre-echo of the present invention.
  • A subband analysis module 307 is added, connected to the signal type analysis module 301, for performing subband analysis on the input audio signal.
  • With subband analysis, the window function can be modified differently for each frequency band according to its mutation intensity and signal properties. For example, the low band generally does not produce pre-echo, so the window function of the low band can be left unmodified, which gives more flexible control of window function modification across frequency bands.
  • FIG. 16 is a flowchart of Embodiment 2 of the encoding method for controlling the pre-echo according to the present invention.
  • The steps are as follows:
  • Step 40: the subband analysis module 307 performs subband analysis on the input audio signal frame; the analysis segments the signal by frequency, dividing the audio signal frame into multiple subband audio signals;
  • Step 41: the signal type analysis module 301 determines, for each subband audio signal frame, whether its signal type is a fast-changing type signal; if so, step 42 is performed, otherwise step 45 is performed;
  • Step 42: the signal type analysis module 301 calculates, for each subband audio signal frame, the fast change point position parameter and the mutation intensity parameter, and quantizes each mutation intensity parameter to obtain the quantized mutation intensity of each subband audio signal frame;
  • Step 43: the modified window function module 302 proportionally reduces, for each subband, the analysis window function values after the fast change point; the reduction factor equals the quantized mutation intensity of that subband's audio signal frame, giving the modified analysis window function;
  • Step 44: the modified window function module 302 windows each subband audio signal frame with the modified analysis window function to obtain the windowed subband time-domain signals, and step 46 is performed;
  • Step 45: the modified window function module 302 windows the subband audio signal frames with the original analysis window function to obtain the windowed time-domain signals, and step 46 is performed;
  • Step 46: the time-frequency mapping module 304 performs time-frequency mapping on the windowed subband time-domain signals to obtain the subband frequency-domain coefficients;
  • Step 47: the quantization and entropy coding module 305 integrates the subband frequency-domain coefficients;
  • Step 48: the quantization and entropy coding module 305 quantizes and entropy-codes the frequency-domain coefficients according to the masking threshold parameters of the scale factor bands, obtained by the psychoacoustic module 303 through psychoacoustic processing of the audio signal frame, to obtain an encoded audio code stream;
  • Step 49: the code stream multiplexing module 306 multiplexes the encoded audio code stream and the result of the signal type analysis to obtain the compressed audio code stream.
  • In step 41 of the above encoding method, while the signal type analysis module 301 determines the signal type of each subband audio signal frame, the psychoacoustic module 303 performs psychoacoustic processing on the input audio signal frame to obtain the masking threshold parameters of the scale factor bands.
  • The psychoacoustic processing computes the masking curve of the current frame according to the auditory characteristics of the human ear. From the masking curve, the masking threshold of a given time-frequency region can be calculated and used to guide quantization of the current audio frame. The psychoacoustic model may be the first or second psychoacoustic model used by MPEG AAC.
  • The advantage of the second embodiment of the encoding method of the present invention is that the window function can be modified differently for each frequency band according to its mutation intensity and signal properties. For example, if the low band does not produce pre-echo, the window function of the low band need not be modified, which gives more flexible control of window function modification across frequency bands.
  • FIG. 17 is a structural block diagram of Embodiment 1 of a decoding apparatus for controlling pre-echo of the present invention.
  • The apparatus consists of the following functional modules: a code stream demultiplexing module 401, for demultiplexing the compressed audio code stream; an inverse quantization and entropy decoding module 402, connected to the code stream demultiplexing module 401, for decoding and inverse-quantizing the demultiplexed audio code stream and outputting the inverse-quantized frequency-domain coefficients; a frequency-time mapping module 403, connected to the inverse quantization and entropy decoding module 402, for transforming the inverse-quantized frequency-domain coefficients into a time-domain signal, the frequency-time mapping module 403 consisting of a filter bank that is the inverse transform filter bank corresponding to the encoding apparatus; and a modified window function module 404, connected to the frequency-time mapping module 403, for modifying the synthesis window function and windowing the time-domain signal.
  • FIG. 18 is a flowchart of Embodiment 1 of the decoding method for controlling pre-echo according to the present invention.
  • The steps are as follows:
  • Step 31: the code stream demultiplexing module demultiplexes the input compressed audio code stream to obtain the demultiplexed audio code stream and side information;
  • Step 32: the inverse quantization and entropy decoding module performs inverse quantization and entropy decoding on the demultiplexed audio code stream to obtain the inverse-quantized frequency-domain coefficients;
  • Step 33: the frequency-time mapping module performs frequency-time mapping on the inverse-quantized frequency-domain coefficients to obtain a time-domain signal;
  • Step 34: the modified window function module determines, from the demultiplexed side information, whether the signal type of the audio signal frame is a fast-changing type; if so, step 35 is performed, otherwise step 37 is performed;
  • Step 35: the modified window function module proportionally amplifies the synthesis window function values after the fast change point; the amplification factor equals the quantized mutation intensity, giving the modified synthesis window function;
  • Step 36: the modified window function module windows the time-domain signal with the modified synthesis window function to obtain the reconstructed audio signal;
  • Step 37: the modified window function module windows the time-domain signal with the original synthesis window function to obtain the reconstructed audio signal.
  • The frequency-time mapping and modified window function processing applied to the frequency-domain coefficients correspond to the time-frequency mapping and modified window function processing in the encoding method; the corresponding inverse mapping and window function are selected according to the encoding control information in the compressed audio code stream.
  • The frequency-time mapping can be implemented by the inverse discrete cosine transform (IDCT), the inverse discrete Fourier transform (IDFT), or the inverse modified discrete cosine transform (IMDCT).
  • The frequency-time mapping process is described below using the IMDCT as an example. Because frequency-time mapping and windowing cannot be separated, the frequency-time mapping and the modified window function are considered together here.
  • An IMDCT is applied to the inverse-quantized spectrum to obtain the transformed time-domain signal $x'$.
  • The time-domain signal obtained by the IMDCT is windowed in the time domain.
  • The windowed time-domain signals are then overlap-added to obtain the time-domain audio signal.
  • The first $N/2$ samples of the signal obtained after windowing are added to the last $N/2$ samples of the previous frame's signal, giving $N/2$ output time-domain audio samples:
  • $timeSam_{i,n} = preSam_{i,n} + preSam_{i-1,\,n+N/2}$, where $i$ is the frame index, $n$ is the sample index, $0 \le n < N/2$, and $N$ is the frame length.
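The IMDCT, windowing, and overlap-add sequence above can be sketched with a toy frame length. The IMDCT expression is not legible in this text, so the MDCT and IMDCT scalings below follow a commonly used AAC-style definition and are an assumption, as are the function names and the frame length of 16. The sketch uses the unmodified sine window on both sides and shows only the ordinary overlap-add path; it does not exercise the modified windows.

```python
import math

def mdct(frame):
    """Forward MDCT: windowed frame of N samples -> N/2 coefficients."""
    N = len(frame)
    n0 = (N / 2 + 1) / 2
    return [2 * sum(frame[n] * math.cos(2 * math.pi / N * (n + n0) * (k + 0.5))
                    for n in range(N))
            for k in range(N // 2)]

def imdct(coeffs):
    """Inverse MDCT: N/2 coefficients -> N time-aliased samples."""
    N = 2 * len(coeffs)
    n0 = (N / 2 + 1) / 2
    return [(2 / N) * sum(coeffs[k] * math.cos(2 * math.pi / N * (n + n0) * (k + 0.5))
                          for k in range(N // 2))
            for n in range(N)]

def overlap_add_decode(spectra, window):
    """Window each IMDCT output, then add its first half to the second half
    of the previous windowed frame (zeros before the first frame)."""
    N = len(window)
    half = N // 2
    previous = [0.0] * N
    output = []
    for coeffs in spectra:
        current = [w * x for w, x in zip(window, imdct(coeffs))]
        output.extend(current[n] + previous[n + half] for n in range(half))
        previous = current
    return output

N = 16
half = N // 2
window = [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]
signal = [math.sin(0.3 * t) + 0.1 * t for t in range(5 * half)]

# Encoder side: window each 50%-overlapped frame and transform it.
spectra = [mdct([w * x for w, x in zip(window, signal[i * half:i * half + N])])
           for i in range(4)]

# Decoder side: every output block after the first has both overlapping halves.
output = overlap_add_decode(spectra, window)
max_error = max(abs(output[t] - signal[t]) for t in range(half, 4 * half))
```

The first output block is excluded from the error check because it has no preceding frame to cancel its time-domain aliasing.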
  • The modification of the synthesis window function has been described above.
  • The modification applied to the original synthesis window function is: the window function values after the fast change point are amplified proportionally, and the amplification factor equals the quantized mutation intensity.
  • The modified synthesis window function is shown in Fig. 11. The synthesis window, paired with the analysis window used in encoding, still satisfies the complete reconstruction condition after this modification, as proved above.
  • Corresponding to the modification at encoding, when the synthesis window function values after the fast change point are amplified proportionally, a transition block can be added near the fast change point so that the window function values change gradually near it.
  • The window function with a transition block is obtained as follows. Assume the window function is $w(n)$, where $0 \le n \le 2M-1$, with a given quantized mutation intensity and fast change point position $L$. If no transition block is added, the modified window function is:
  • If a transition block is added, the modified window function is:
  • $w'(n) = w(n)/g(n), \quad L - I \le n \le L + I - 1$
  • Fig. 14 is a schematic diagram of the modified synthesis window function with a transition block in the decoding method for controlling pre-echo of the present invention; the transition block 14 in Fig. 14 is 64 sample points long. It has been proved that, as long as a linear transformation of the original window does not change the complete reconstruction property of the transform, an arbitrary linear transformation may also be applied to the analysis window function or the synthesis window function according to the signal type.
  • The window function with a transition block, or with an arbitrary linear transformation, may be applied to the analysis window or synthesis window used in the encoding and decoding methods of all the above embodiments.
  • Fig. 19 is a block diagram showing the structure of the second embodiment of the decoding apparatus for controlling pre-echo of the present invention.
  • A subband synthesis module 405 is added, connected to the modified window function module 404, for performing subband synthesis on the multiple reconstructed subband time-domain signals.
  • FIG. 20 is a flowchart of Embodiment 2 of the decoding method for controlling pre-echo of the present invention.
  • Step 51: the code stream demultiplexing module demultiplexes the input compressed audio code stream to obtain the demultiplexed audio code stream and side information;
  • Step 52: the inverse quantization and entropy decoding module divides the demultiplexed audio code stream into multiple channels by frequency;
  • Step 53: the inverse quantization and entropy decoding module performs inverse quantization and entropy decoding on each channel of the demultiplexed audio code stream to obtain the inverse-quantized frequency-domain coefficients;
  • Step 53: the frequency-time mapping module performs frequency-time mapping on each set of inverse-quantized frequency-domain coefficients to obtain the multi-channel time-domain signals;
  • Step 54: the modified window function module determines, from the demultiplexed side information, whether the signal type of each channel's audio signal frame is a fast-changing type; if so, step 55 is performed, otherwise step 57 is performed;
  • Step 55: the modified window function module proportionally amplifies, for each channel, the synthesis window function values after the fast change point; the amplification factor equals the quantized mutation intensity, giving the modified synthesis window function;
  • Step 56: the modified window function module windows each time-domain signal with the modified synthesis window function to obtain the multi-channel reconstructed audio signals, and step 58 is performed;
  • Step 57: the modified window function module windows each time-domain signal with the original synthesis window function to obtain the multi-channel reconstructed audio signals, and step 58 is performed;
  • Step 58: the subband synthesis module synthesizes the multi-channel reconstructed audio signals.

Abstract

The invention discloses an encoder for pre-echo control comprising a signal type analysis module, a modified window function module, a time-frequency mapping module, a quantization and entropy coding module, and a code stream multiplexing module, connected in sequence, together with a psychoacoustic analysis module connected to the quantization and entropy coding module. The invention also discloses an encoding method for pre-echo control, comprising the steps of: 1. determining whether the input audio frame is a transient frame and, if so, linearly modifying the window function values and multiplying the audio frame by the modified window; 2. applying a time-domain to frequency-domain transform to the windowed audio frame, quantizing and entropy-coding the frequency-domain coefficients, and then multiplexing the coded audio stream with the result of the signal type analysis. The invention further discloses a decoder for pre-echo control comprising a demultiplexing module, an inverse quantization and entropy decoding module, a frequency-domain to time-domain transform module, and a modified window function module, connected in sequence. The invention also discloses a decoding method for pre-echo control.

Description

Encoding and decoding apparatus and method for controlling pre-echo

Technical Field

The present invention relates to an encoding and decoding apparatus and method for controlling pre-echo, and in particular to an audio encoding and decoding apparatus and method that control pre-echo by means of a modified window function.
Background Art

Audio information is generally compressed with a perceptual encoder. A conventional perceptual encoder usually contains a psychoacoustic module whose role is to identify the "irrelevant components" of the audio signal. Once these are identified, the quantization module processes them so that the audio signal is "perceptually transparent", that is, the effect on the listener is nil or within an acceptable range. In identifying irrelevant components, the psychoacoustic module mainly exploits the masking phenomenon of the human ear. As shown in Fig. 1, masking is the phenomenon in which, in the presence of one sound, another sound cannot be perceived by the human ear; the sound that does the masking is the masking signal 3. Masking is divided into simultaneous masking 1, forward masking 2 (pre-masking), and backward masking 4 (post-masking). Forward masking 2 and backward masking 4 act in the time domain and therefore impose an additional requirement on the time-domain behaviour of a perceptual encoder: to achieve perceptually transparent coding quality, the quantization noise must also respect a time-dependent masking threshold. This requirement is not easy for a practical perceptual encoder to meet. By the time-frequency uncertainty principle, when a block transform converts the time-domain audio signal to the frequency domain and the transformed spectral coefficients are quantized and coded, the resulting quantization error spreads in the time domain after reconstruction by the synthesis filter. For a commonly used filter design, for example a modified discrete cosine transform (MDCT) filter with a window length of 2048 samples applied to a signal sampled at 48000 Hz, the quantization error spreads over about 42.7 ms after reconstruction by the synthesis filter. If the strong signal energy within the analysis window is concentrated in only a small part of it, the quantization noise spreads to the time before the signal onset. In extreme cases the quantization noise in some time segments even exceeds the energy level of the original signal. This is the so-called "pre-echo" phenomenon, shown in Figs. 2 and 3. Fig. 2 is the time-domain waveform of the uncoded audio signal, and Fig. 3 is the time-domain waveform of the audio signal after coding and reconstruction. The part circled by an ellipse in Fig. 3 is the pre-echo 5. Given the characteristics of the human ear, if the coding noise lasts only a short time before the signal's abrupt change point, forward masking can hide the pre-echo; otherwise the coding noise is perceived. To avoid this, the time-domain characteristics of the quantization noise must be considered when designing the encoder so that the temporal masking condition is met. Pre-echo has long been a major obstacle to coding fast-changing signals (such as castanets) at low bit rates.
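The 42.7 ms figure quoted in the background follows directly from the window length and the sampling frequency. The short check below is only an arithmetic illustration; the function name is hypothetical.

```python
def window_duration_ms(window_length, sample_rate_hz):
    """Time span covered by one transform window, in milliseconds."""
    return 1000.0 * window_length / sample_rate_hz

# 2048-sample window at 48000 Hz, the example used in the background.
spread_ms = window_duration_ms(2048, 48000)
```

Quantization noise can appear anywhere inside this span, so a transient near the end of the window can be preceded by tens of milliseconds of audible noise.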
In audio coding and decoding, the prior art includes the following techniques for dealing with pre-echo.

Bit reservoir control: the coding precision is increased for the spectral coefficients of the filter bank window that covers the fast-changing segment. This greatly increases the number of bits needed to encode fast-changing frames, so the method cannot be used in a fixed-rate encoder. The MPEG-1 standard uses a bit reservoir: when a peak in bit demand occurs, bits saved from earlier frames are spent, so that a constant average bit rate is maintained. In practice, however, a signal that changes very rapidly requires an extremely large bit reservoir, which makes encoding infeasible.
Adaptive window switching: many perceptual encoders use adaptive window switching. This method adapts the size of the filter bank window to the characteristics of the input signal: stationary or slowly changing parts are coded with a long window, and fast-changing parts with a short window. The method increases the encoder's computational load and complicates its structure. Because different window lengths require different interpretation and normalization of the psychoacoustic model, as well as different frequency bands and noiseless coding structures, window switching significantly increases the complexity of the encoder. In addition, with an overlap-add filter bank, the window switching decision requires extra buffering and delay in the encoder, which leads to a larger end-to-end delay. Finally, although the long and short windows have good time-frequency localization, the start and stop windows introduce considerable coding inefficiency.
Filter bank switching: this technique controls pre-echo by using different filter bank modes. Specifically, a cosine-modulated filter bank with high frequency resolution is used for slowly changing signals, and a wavelet filter bank is used for fast-changing signals. When the two filter bank modes are switched, it is difficult to guarantee complete reconstruction in the transition block.
Temporal noise shaping (TNS): after the filter bank has transformed the signal into frequency-domain coefficients, the signal type is examined. If the signal is of the fast-changing type, the frequency-domain coefficients are not quantized directly; instead, linear prediction is first applied to them and the residual sequence is quantized. TNS adds a considerable amount of side information, which reduces the overall coding efficiency.
Fig. 4 is a structural block diagram of a prior-art audio encoding apparatus. The encoder comprises a psychoacoustic analysis module 201, a window function module 202, a time-frequency mapping module 203, a quantization and entropy coding module 204, and a code stream multiplexing module 205. The psychoacoustic analysis module 201 calculates the perceptual entropy and masking threshold of the input audio signal and, from the perceptual entropy, determines whether the audio signal frame is a fast-changing type signal or a slowly changing type signal. To prevent pre-echo while preserving coding quality, the length of the analysis window function of the window function module 202 is chosen according to the signal type output by the psychoacoustic analysis module 201. If the frame is of the fast-changing type, a 256-sample window with higher time resolution and lower frequency resolution is used to prevent pre-echo; if the frame is of the slowly changing type, a 2048-sample window with lower time resolution and higher frequency resolution is used to maintain coding efficiency. The time-frequency mapping module 203 converts the time-domain audio signal into frequency-domain coefficients and outputs them to the quantization and entropy coding module 204. Under the control of the masking threshold output by the psychoacoustic analysis module 201, the quantization and entropy coding module 204 quantizes and entropy-codes the frequency-domain coefficients and outputs the result to the code stream multiplexing module 205, which multiplexes the received data to form the audio coded stream. Although this audio encoding apparatus can prevent pre-echo, the use of windows of different lengths in the window function module 202 makes the structure of the whole encoder considerably more complex.
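The trade-off behind the 256-sample and 2048-sample windows of the prior-art encoder can be put in numbers. The sketch below assumes a 48000 Hz sampling frequency (borrowed from the background example, not stated for Fig. 4) and an MDCT-style transform that produces half as many coefficients as window samples; the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 48000   # assumed; taken from the background example

def select_window_length(is_fast_changing):
    """Prior-art rule: short window for fast-changing frames, long otherwise."""
    return 256 if is_fast_changing else 2048

def window_resolution(window_length, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (time span in ms, spacing between transform bins in Hz)."""
    time_ms = 1000.0 * window_length / sample_rate_hz
    bin_hz = sample_rate_hz / window_length
    return time_ms, bin_hz

short_ms, short_hz = window_resolution(select_window_length(True))
long_ms, long_hz = window_resolution(select_window_length(False))
```

Under these assumptions the short window spans about 5.3 ms with bins 187.5 Hz apart, while the long window spans about 42.7 ms with bins about 23.4 Hz apart, which is the time versus frequency resolution exchange the text describes.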
In the literature ("Adaptive Transform Coding of Speech Signals", Rainer Zelinski, Peter Noll, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-25, No. 4, August 1977), the authors discuss transform-based speech coding, specifically speech coding by means of the discrete cosine transform (DCT). The speech signal is divided into frames of N samples (N may be 128, 256, 512, 1024, etc.). The variance of each frame is estimated to obtain its standard deviation $\sigma$, and the quantized standard deviation is transmitted to the decoder as side information. The frame is divided by the quantized standard deviation, the resulting sequence is DCT-transformed to obtain $V$, and $V$ is quantized to obtain $\hat{V}$. The quantized sequence is transmitted to the decoder, where an inverse DCT is applied and the result is multiplied by the quantized standard deviation to give the reconstructed speech signal. Applied directly to audio coding, this method does nothing for the pre-echo that arises in fast-changing frames: dividing by the quantized standard deviation does not change the character of the frame signal 6, which remains non-stationary, as shown in Fig. 5, the time-domain waveform of an audio signal processed by the transform-based speech coding method. The method can be improved by estimating one standard deviation $\sigma_1$ before the fast change point and another, $\sigma_2$, after it, and quantizing each separately. Fig. 6 shows the time-domain waveform of an audio signal processed by the improved transform-based speech coding method. The signal 7 before the fast change point is then divided by the first quantized standard deviation, and the signal 8 after the fast change point by the second.
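The two-segment normalization of the improved speech coding method can be sketched as follows. The estimator expressions are not legible in this text, so the root-mean-square estimate (which assumes a zero-mean signal), the omission of the quantization of the standard deviations, and all names are assumptions made for illustration.

```python
import math

def rms(samples):
    """Root-mean-square value, used here as the standard deviation estimate
    (an assumption: the signal is treated as zero-mean)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def normalize_two_segments(frame, change_point):
    """Divide the samples before and after the change point by their own
    standard deviation estimates. Both segments must be non-empty and non-zero."""
    sigma1 = rms(frame[:change_point])
    sigma2 = rms(frame[change_point:])
    normalized = ([x / sigma1 for x in frame[:change_point]] +
                  [x / sigma2 for x in frame[change_point:]])
    return normalized, sigma1, sigma2

def denormalize_two_segments(normalized, change_point, sigma1, sigma2):
    """Decoder side: multiply each segment back by its standard deviation."""
    return ([v * sigma1 for v in normalized[:change_point]] +
            [v * sigma2 for v in normalized[change_point:]])

# A quiet segment followed by a loud one: a crude stand-in for a transient frame.
frame = [0.01, -0.01] * 4 + [1.0, -1.0] * 4
normalized, s1, s2 = normalize_two_segments(frame, 8)
restored = denormalize_two_segments(normalized, 8, s1, s2)
```

After normalization both segments have comparable amplitude, which is the sense in which the frame becomes closer to stationary before the transform is applied.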
(The boundary between the two segments is the position of the fast change point.) After this processing the frame becomes a quasi-stationary signal, and applying the processing described above to it greatly alleviates the pre-echo problem. However, the improved method is designed for speech coding; for audio coding it has difficulty eliminating block effects, and its coding efficiency is very low.

Summary of the Invention
The main object of the present invention is to provide an encoding apparatus for controlling pre-echo that has a simple structure and effectively controls the pre-echo phenomenon in audio encoding.
Another object of the present invention is to provide an encoding method for controlling pre-echo that effectively controls the pre-echo phenomenon in audio encoding.
A further object of the present invention is to provide a decoding apparatus for controlling pre-echo that decodes a signal encoded by the encoding method disclosed in the present invention, the decoded signal being a complete reconstruction of the original audio signal.
Yet another object of the present invention is to provide a decoding method for controlling pre-echo that decodes a signal encoded by the encoding method disclosed in the present invention, the decoded signal being a complete reconstruction of the original audio signal.
To achieve the above objects, the present invention provides an encoding apparatus for controlling pre-echo, comprising:
a signal type analysis module, for determining the signal type of an input audio signal frame and outputting the fast change point position and the quantized mutation intensity parameter;
a modified window function module, connected to the signal type analysis module, for modifying the analysis window function, windowing the input audio signal frame, and outputting the windowed time-domain audio signal;
a time-frequency mapping module, connected to the modified window function module, for converting the windowed time-domain audio signal into frequency-domain coefficients;
a psychoacoustic analysis module, for performing psychoacoustic processing on the input audio signal frame and outputting the masking threshold parameters of the scale factor bands;
a quantization and entropy coding module, connected to the time-frequency mapping module and to the psychoacoustic analysis module, for quantizing and entropy-coding the frequency-domain coefficients output by the time-frequency mapping module according to the masking threshold parameters output by the psychoacoustic analysis module, and outputting the coded stream;
a code stream multiplexing module, connected to the quantization and entropy coding module and to the signal type analysis module, for multiplexing the coded stream output by the quantization and entropy coding module with the result output by the signal type analysis module to form the audio coded stream.
To achieve the above objects, the present invention provides an encoding method for controlling pre-echo, comprising the following steps:
Step 1. The signal type analysis module determines whether the input audio signal frame is a fast-changing type signal. If so, the signal type analysis module calculates the fast-change point position parameter and the abrupt-change intensity parameter of the audio signal frame, quantizes the abrupt-change intensity parameter to obtain its quantized value, and Step 2 is performed. Otherwise, the modified window function module windows the audio signal frame with the original analysis window function to obtain the windowed time-domain signal, and Step 4 is performed.
Step 2. The modified window function module applies a linear transformation to the analysis window function to obtain the modified analysis window function.
Step 3. The modified window function module windows the audio signal frame with the modified analysis window function to obtain the windowed time-domain signal.
Step 4. The time-frequency mapping module performs time-frequency mapping on the windowed time-domain signal to obtain frequency-domain coefficients.
Step 5. The quantization and entropy coding module quantizes and entropy-codes the frequency-domain coefficients according to the masking threshold parameters of the scale factor bands, which the psychoacoustic module obtains by psychoacoustic processing of the audio signal frame, yielding the encoded audio bitstream.
Step 6. The bitstream multiplexing module multiplexes the encoded audio bitstream with the result of the signal type analysis to obtain the compressed audio bitstream.
To achieve the above objects, the present invention provides a decoding apparatus for controlling pre-echo, comprising:
a bitstream demultiplexing module, configured to demultiplex the compressed audio bitstream;
an inverse quantization and entropy decoding module, connected to the bitstream demultiplexing module, configured to decode and inverse-quantize the demultiplexed audio bitstream and to output the inverse-quantized frequency-domain coefficients;
a frequency-time mapping module, connected to the inverse quantization and entropy decoding module, configured to transform the inverse-quantized frequency-domain coefficients into a time-domain signal;
a modified window function module, connected to the frequency-time mapping module, configured to modify the synthesis window function and to window the time-domain signal.
To achieve the above objects, the present invention provides a decoding method for controlling pre-echo, comprising the following steps:
Step 1. The bitstream demultiplexing module demultiplexes the input compressed audio bitstream to obtain the demultiplexed audio bitstream and the side information.
Step 2. The inverse quantization and entropy decoding module entropy-decodes and inverse-quantizes the demultiplexed audio bitstream to obtain the inverse-quantized frequency-domain coefficients.
Step 3. The frequency-time mapping module performs frequency-time mapping on the inverse-quantized frequency-domain coefficients to obtain a time-domain signal.
Step 4. The modified window function module determines from the demultiplexed side information whether the audio signal frame is of the fast-changing type. If so, Step 5 is performed; otherwise, Step 6 is performed.
Step 5. The modified window function module applies a linear transformation to the synthesis window function to obtain the modified synthesis window function, and then windows the time-domain signal with the modified synthesis window function to obtain the reconstructed audio signal.
Step 6. The modified window function module windows the time-domain signal with the original synthesis window function to obtain the reconstructed audio signal.
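Decoder Steps 4 to 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the side-information field names (`is_fast`, `change_pos`, `scs`) and the function names are hypothetical, a sine window is assumed, and the "linear transformation" is taken to be the scaling given in the detailed description (multiplying the synthesis window by the quantized abrupt-change intensity from the fast-change point onward).

```python
import math

def sine_window(M):
    """Sine window of length 2M (an assumed choice of MDCT window)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def select_synthesis_window(original, side_info):
    """Return the synthesis window for one frame (decoder Steps 4-6).

    side_info is a hypothetical dict with keys "is_fast", "change_pos", "scs".
    """
    if not side_info["is_fast"]:
        return list(original)                    # Step 6: original window
    pos, scs = side_info["change_pos"], side_info["scs"]
    # Step 5: scale the window values up from the fast-change point onward.
    return [v * scs if n >= pos else v for n, v in enumerate(original)]

def apply_window(samples, window):
    """Multiply the time-domain samples by the window, sample by sample."""
    return [s * w for s, w in zip(samples, window)]

M = 8
w = sine_window(M)
fast = select_synthesis_window(w, {"is_fast": True, "change_pos": 12, "scs": 5})
slow = select_synthesis_window(w, {"is_fast": False, "change_pos": 0, "scs": 1})
```

Only the tail of the window changes for a fast-changing frame, so the window length stays fixed and no block switching is needed.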
Therefore, the present invention has the following advantages: because the window function uses a fixed window length, the structure of the encoding apparatus is simplified, and the pre-echo phenomenon in audio encoding is controlled while perfect reconstruction of the audio signal is guaranteed.
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Brief description of the drawings
Figure 1 is a diagram of the masking characteristics of the human ear.
Figure 2 is a time-domain plot of an unencoded audio signal.
Figure 3 is a time-domain plot of the audio signal after encoding and reconstruction.
Figure 4 is a structural block diagram of a prior-art audio encoding apparatus.
Figure 5 is a time-domain plot of an audio signal to which a transform-based speech coding method has been applied.
Figure 6 is a time-domain plot of an audio signal to which the improved transform-based speech coding method has been applied.
Figure 7 is a structural block diagram of Embodiment 1 of the encoding apparatus for controlling pre-echo of the present invention.
Figure 8 is a flowchart of Embodiment 1 of the encoding method for controlling pre-echo of the present invention.
Figure 9 is a schematic diagram of the original analysis window and the original synthesis window of the encoding and decoding methods for controlling pre-echo of the present invention.
Figure 10 is a schematic diagram of the modified analysis window of the encoding method for controlling pre-echo of the present invention.
Figure 11 is a schematic diagram of the modified synthesis window of the decoding method for controlling pre-echo of the present invention.
Figure 12 is a schematic diagram showing that the window function modification in the encoding method for controlling pre-echo of the present invention satisfies the perfect-reconstruction condition.
Figure 13 is a schematic diagram of the modified analysis window function with a transition block in the encoding method for controlling pre-echo of the present invention.
Figure 14 is a schematic diagram of the modified synthesis window function with a transition block in the decoding method for controlling pre-echo of the present invention.
Figure 15 is a structural block diagram of Embodiment 2 of the encoding apparatus for controlling pre-echo of the present invention.
Figure 16 is a flowchart of Embodiment 2 of the encoding method for controlling pre-echo of the present invention.
Figure 17 is a structural block diagram of Embodiment 1 of the decoding apparatus for controlling pre-echo of the present invention.
Figure 18 is a flowchart of Embodiment 1 of the decoding method for controlling pre-echo of the present invention.
Figure 19 is a structural block diagram of Embodiment 2 of the decoding apparatus for controlling pre-echo of the present invention.
Figure 20 is a flowchart of Embodiment 2 of the decoding method for controlling pre-echo of the present invention.
Detailed description
The present invention uses a modified window function (MWF) to control the pre-echo that appears in audio coding, controlling the pre-echo phenomenon while guaranteeing perfect reconstruction of the audio signal.
Referring to Figure 7, which is a structural block diagram of Embodiment 1 of the encoding apparatus for controlling pre-echo of the present invention, the apparatus consists of the following functional modules:
a signal type analysis module 301, configured to determine the signal type of the input audio signal frame and to output the fast-change point position and the quantized abrupt-change intensity parameter, wherein the signal type analysis module 301 comprises: a signal type analyzer, configured to determine whether the input audio frame signal is a slowly varying type signal or a fast-changing type signal; a fast-change point position calculator, connected to the signal type analyzer, configured to calculate the position of the fast-change point; an abrupt-change intensity calculator, connected to the signal type analyzer, configured to calculate the abrupt-change intensity of the signal; and an abrupt-change intensity quantizer, connected to the abrupt-change intensity calculator, configured to quantize the calculated abrupt-change intensity;
a modified window function module 302, connected to the signal type analysis module 301, configured to modify the analysis window function, window the input audio signal frame, and output the windowed time-domain audio signal, which improves the time resolution with which fast-changing signals are encoded;
a time-frequency mapping module 304, connected to the modified window function module 302, configured to convert the windowed time-domain audio signal into frequency-domain coefficients;
a psychoacoustic analysis module 303, configured to perform psychoacoustic processing on the input audio signal frame and to output the masking threshold parameters of the scale factor bands;
a quantization and entropy coding module 305, connected to the time-frequency mapping module 304 and to the psychoacoustic analysis module 303, configured to quantize and entropy-code the frequency-domain coefficients output by the time-frequency mapping module 304 according to the masking threshold parameters output by the psychoacoustic analysis module 303, and to output an encoded bitstream;
a bitstream multiplexing module 306, connected to the quantization and entropy coding module 305 and to the signal type analysis module 301, configured to multiplex the encoded bitstream output by the quantization and entropy coding module 305 with the result output by the signal type analysis module 301, forming the audio encoded bitstream.
The time-frequency mapping module 304 consists of a filter bank, which may be a discrete Fourier transform (DFT) filter bank, a discrete cosine transform (DCT) filter bank, a modified discrete cosine transform (MDCT) filter bank, a cosine-modulated filter bank, or the like. When an orthogonal transform filter bank such as a DFT filter bank, a DCT filter bank, or a cosine-modulated filter bank is used, the window length of the analysis window function equals the audio signal frame length, and the window function may be a Hanning window, a Hamming window, or a Blackman window. When an MDCT filter bank is used, the window length of the analysis window function is twice the audio signal frame length, and the window function may be any window function that satisfies the MDCT conditions.
In the quantization and entropy coding module 305, the quantizer consists of a set of sub-quantizers. Each sub-quantizer quantizes the frequency-domain coefficients of a specific time-frequency region according to the masking threshold of that region output by the psychoacoustic analysis module 303; such a region is usually called a scale factor band. The quantizer may be a scalar quantizer or a vector quantizer, such as the nonlinear scalar quantizer of Moving Picture Experts Group Advanced Audio Coding (MPEG AAC) or the vector quantizer of MPEG TwinVQ.
Referring to Figure 8, which is a flowchart of Embodiment 1 of the encoding method for controlling pre-echo of the present invention, the steps are as follows:
Step 21. The signal type analysis module 301 determines whether the input audio signal frame is a fast-changing type signal. If so, Step 22 is performed; otherwise, Step 25 is performed.
Step 22. The signal type analysis module 301 calculates the fast-change point position parameter and the abrupt-change intensity parameter of the audio signal frame, and quantizes the abrupt-change intensity parameter to obtain its quantized value.
Step 23. The modified window function module 302 scales down the values of the analysis window function after the fast-change point position by a constant factor equal to the quantized abrupt-change intensity, obtaining the modified analysis window function.
Step 24. The modified window function module 302 windows the audio signal frame with the modified analysis window function to obtain the windowed time-domain signal, and Step 26 is performed.
Step 25. The modified window function module 302 windows the audio signal frame with the original analysis window function to obtain the windowed time-domain signal, and Step 26 is performed.
Step 26. The time-frequency mapping module 304 performs time-frequency mapping on the windowed time-domain signal to obtain frequency-domain coefficients.
Step 27. The quantization and entropy coding module 305 quantizes and entropy-codes the frequency-domain coefficients according to the masking threshold parameters of the scale factor bands, which the psychoacoustic module 303 obtains by psychoacoustic processing of the audio signal frame, yielding the encoded audio bitstream.
Step 28. The bitstream multiplexing module 306 multiplexes the encoded audio bitstream with the result of the signal type analysis to obtain the compressed audio bitstream.
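Steps 23 to 25 amount to choosing between the original and a scaled analysis window before the transform. The following is a minimal sketch under stated assumptions: a sine window is assumed, the function names are illustrative, and the later stages (time-frequency mapping, quantization, multiplexing) are left out.

```python
import math

def sine_window(M):
    """Sine window of length 2M (an assumed choice of MDCT window)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def modified_analysis_window(original, change_pos, scs):
    """Step 23: divide the window values from the fast-change point onward by scs."""
    return [v / scs if n >= change_pos else v for n, v in enumerate(original)]

def window_frame(samples, original, is_fast, change_pos=0, scs=1):
    """Steps 24/25: window 2M samples with the modified or the original window."""
    win = modified_analysis_window(original, change_pos, scs) if is_fast else original
    return [s * w for s, w in zip(samples, win)]

# Toy input: quiet samples followed by an attack at index 12 that is 5x louder.
M = 8
samples = [0.1] * 12 + [0.5] * 4
w = sine_window(M)
plain = window_frame(samples, w, is_fast=False)
tamed = window_frame(samples, w, is_fast=True, change_pos=12, scs=5)
```

With `scs = 5` in this toy input, the attack is scaled down to the level of the preceding quiet samples before the transform, so `tamed` equals the window applied to a constant-amplitude signal.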
In Step 21 of the above encoding method, while the signal type analysis module 301 determines the signal type of the input audio signal frame, the psychoacoustic module 303 performs psychoacoustic processing on the audio signal frame to obtain the masking threshold parameters of the scale factor bands. The psychoacoustic processing computes the masking curve of the current frame signal from the characteristics of human hearing. From the masking curve, the masking threshold of a specific time-frequency region can be calculated and used to guide the quantization of the current audio frame signal. The psychoacoustic model here may be the first or second type of psychoacoustic model used by MPEG AAC.
The signal type analysis module 301 determines the signal type of the frame on the basis of forward and backward masking effects, using adaptive thresholds and waveform prediction. The specific steps are:
decompose the input frame into several subframes and find the local maxima of the absolute values of the PCM data in each subframe;
select the absolute peak of each subframe from among its local maxima;
for the absolute peak of a given subframe, use the absolute peaks of several (typically 3) preceding subframes to predict the typical sample value of several (typically 4) subframes delayed forward relative to that subframe;
calculate the difference and the ratio between the absolute peak of the subframe and the predicted typical sample value;
if the prediction difference and the ratio are both greater than the set thresholds, the subframe is judged to contain a jump signal, and it is confirmed that the subframe has a local maximum peak capable of backward-masking the pre-echo. If, between the front end of that subframe and the point 2.5 ms before the masking peak, there is a subframe whose absolute peak is sufficiently small, the frame signal is judged to be a fast-changing type signal. The subframe containing the jump signal is taken as the position of the fast-change point, and the ratio of the absolute peak of that subframe to the largest absolute peak among all preceding subframes is taken as the abrupt-change intensity. The abrupt-change intensity is then quantized, for example by rounding up, rounding down, or rounding to the nearest integer;
if the prediction difference and the ratio are not both greater than the set thresholds, the above steps are repeated until the frame signal is judged to be a fast-changing type signal or the last subframe is reached. If the last subframe is reached without the frame signal having been judged a fast-changing type signal, the frame signal is a slowly varying type signal.
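The decision procedure above can be illustrated with a deliberately simplified sketch. Several details are assumptions that do not come from the text: the predictor is the mean of the preceding subframe peaks, the thresholds are fixed numbers instead of adaptive ones, the 2.5 ms backward-masking check is omitted, and all names and default values are hypothetical.

```python
def analyze_signal_type(frame, num_subframes=8, history=3,
                        diff_thresh=0.1, ratio_thresh=4.0):
    """Simplified fast/slow decision for one frame of PCM samples.

    Returns (is_fast, change_subframe, quantized_intensity).
    """
    eps = 1e-9                                   # avoids division by zero
    size = len(frame) // num_subframes
    # Absolute peak of each subframe.
    peaks = [max(abs(s) for s in frame[i * size:(i + 1) * size])
             for i in range(num_subframes)]
    for i in range(history, num_subframes):
        # Assumed predictor: mean of the previous `history` subframe peaks.
        predicted = max(sum(peaks[i - history:i]) / history, eps)
        diff = peaks[i] - predicted
        ratio = peaks[i] / predicted
        if diff > diff_thresh and ratio > ratio_thresh:
            # Intensity: this peak relative to the largest earlier peak.
            intensity = peaks[i] / max(max(peaks[:i]), eps)
            return True, i, round(intensity)     # quantize by rounding
    return False, None, None

# Five quiet subframes followed by three subframes that are 5x louder.
frame = [0.1, -0.1] * 10 + [0.5, -0.5] * 6
result = analyze_signal_type(frame)
steady = analyze_signal_type([0.1, -0.1] * 16)
```

Here `result` reports a fast-changing frame whose jump lies in subframe 5 with a quantized intensity of 5, while the constant-amplitude frame is classified as slowly varying.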
There are many methods for transforming a time-domain audio signal into the frequency domain, including the discrete Fourier transform (DFT), the discrete cosine transform (DCT), cosine-modulated filter banks, the modified discrete cosine transform (MDCT), and the wavelet transform. When an orthogonal transform such as the DFT, the DCT, or a cosine-modulated filter bank is used, the window length of the analysis window function equals the audio signal frame length, and the window function may be a Hanning window, a Hamming window, or a Blackman window. When the MDCT is used, the window length of the analysis window function is twice the audio signal frame length, and the window function may be any window function that satisfies the MDCT conditions. The analysis window function is a fixed-length window function whose length is an integer greater than 1, preferably 2 to the power N, where N is a natural number. For window design, see Discrete-Time Signal Processing (2nd edition), by A. V. Oppenheim, R. W. Schafer, and J. R. Buck, translated by Liu Shutang and Huang Jianguo, Xi'an Jiaotong University Press, 2001. Because time-frequency mapping and windowing are inseparable, the MDCT is used below to illustrate the time-frequency mapping and windowing process.
When the MDCT is used for time-frequency mapping, the time-domain signal consisting of the M samples of the previous frame and the M samples of the current frame is selected first. Module 302 then windows the 2M samples of these two frames, the analysis window being twice the frame length, and the time-frequency mapping module 304 applies the MDCT to the windowed signal, yielding M frequency-domain coefficients. The impulse response of the MDCT analysis filter is:
[Equation image imgf000013_0001: impulse response h_k(n) of the MDCT analysis filter]
The MDCT is then:
$$X(k) = \sum_{n=0}^{2M-1} x(n)\,h_k(n), \qquad 0 \le k \le M-1$$
where w(n) is the window function, x(n) is the input time-domain audio signal of the MDCT, and X(k) is the output frequency-domain signal of the MDCT.
To satisfy the condition for perfect reconstruction of the signal, the MDCT window function w(n) must satisfy the following two conditions:
$$w(2M-1-n) = w(n) \quad\text{and}\quad w^2(n) + w^2(n+M) = 1$$
In practice, a sine window, a KBD window, or the like may be used as the window function. The sine window is used below as an example to explain how module 302 modifies the window to control pre-echo. Note that the present invention is not limited to the sine and KBD windows: any window function that satisfies the MDCT conditions can be modified in this way to control pre-echo.
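The two window conditions are easy to check numerically. The sketch below uses the standard sine window formula, w(n) = sin(π(n + 1/2) / (2M)), which the text names but does not write out; the function names and the tolerance are illustrative.

```python
import math

def sine_window(M):
    """w(n) = sin(pi * (n + 1/2) / (2M)) for n = 0 .. 2M-1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def satisfies_mdct_conditions(w, tol=1e-12):
    """Check w(2M-1-n) = w(n) and w(n)^2 + w(n+M)^2 = 1."""
    M = len(w) // 2
    symmetric = all(abs(w[2 * M - 1 - n] - w[n]) <= tol for n in range(2 * M))
    power_complementary = all(abs(w[n] ** 2 + w[n + M] ** 2 - 1.0) <= tol
                              for n in range(M))
    return symmetric and power_complementary

ok_sine = satisfies_mdct_conditions(sine_window(16))
ok_rect = satisfies_mdct_conditions([1.0] * 32)   # all-ones window
```

The sine window passes both checks. The all-ones window is symmetric but fails the second condition, because its squared halves sum to 2 instead of 1.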
When the frame signal 9 is found to be a slowly varying type signal, the original analysis window function is left unchanged; Figure 9 shows the original analysis window function of the encoding and decoding methods for controlling pre-echo of the present invention. If the frame signal 9 is a fast-changing type signal, the original analysis window function is modified as follows: the window function values after the fast-change point are scaled down by a constant factor equal to the quantized abrupt-change intensity. The modified analysis window function is shown in Figure 10, where the fast-change point is at the 1280th sample point 10 and the quantized abrupt-change intensity is 5. To satisfy the perfect-reconstruction condition, once the analysis window has been modified in this way, the synthesis window used in decoding must be modified as well: the window function values after the fast-change point are scaled up by a constant factor equal to the quantized abrupt-change intensity. The modified synthesis window function is shown in Figure 11, where the fast-change point is at the 1280th sample point 11 and the quantized abrupt-change intensity is 5. The following proves that the perfect-reconstruction condition is still satisfied after the analysis and synthesis windows are modified in this way.
The reference (S. F. Cheung and J. S. Lim, "Incorporation of biorthogonality into lapped transforms for audio compression," Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Detroit, May 1995, pp. 3079-3082) proves that the biorthogonal perfect-reconstruction conditions for a cosine-modulated filter bank are:
$$\sum_{m=0}^{2K-1-2s} p_s(mM+n)\,p_a\big((m+2s)M+n\big) = \delta(s)$$
$$\sum_{m=0}^{2K-1-2s} (-1)^m\,p_s(mM+n)\,p_a\big((m+2s)M+(M-n-1)\big) = 0$$
where p_s(n) and p_a(n) are the synthesis window and the analysis window respectively, M is the frame length, K is an integer greater than zero, s = 0, 1, ..., K-1, and n = 0, 1, ..., M-1.
If the cosine-modulated filter bank is the MDCT, then K = 1 and therefore s = 0. Substituting into the above conditions simplifies them to:
$$p_s^{i}(n)\,p_a^{i}(n) + p_s^{i-1}(M+n)\,p_a^{i-1}(M+n) = 1$$
$$p_s^{i}(n)\,p_a^{i}(M-n-1) - p_s^{i-1}(M+n)\,p_a^{i-1}(2M-n-1) = 0 \qquad (1)$$
where i is the frame index: i-1 denotes the previous frame and i the current frame.
Figure 12 is a schematic diagram showing that the window function modification in the encoding method for controlling pre-echo of the present invention satisfies the perfect-reconstruction condition. Figure 12 shows the signal of four consecutive frames. Frame i has been determined to be a fast-changing frame, the fast-change point 12 is at position M+L, and the abrupt-change intensity is scs, where scs is a real number. When the MDCT is applied to frame i, the original analysis window $p_a(n)$ and the original synthesis window $p_s(n)$ are modified as described above to give the modified analysis window $\tilde p_a(n)$ and the modified synthesis window $\tilde p_s(n)$:
$$\tilde p_a(n) = \begin{cases} p_a(n), & 0 \le n < M+L \\ p_a(n)/scs, & M+L \le n < 2M \end{cases}$$
and
$$\tilde p_s(n) = \begin{cases} p_s(n), & 0 \le n < M+L \\ p_s(n)\cdot scs, & M+L \le n < 2M \end{cases}$$
Therefore, for frame i:
$$\tilde p_s^{i}(n)\,\tilde p_a^{i}(n) + \tilde p_s^{i-1}(M+n)\,\tilde p_a^{i-1}(M+n) = p_s^{i}(n)\,p_a^{i}(n) + p_s^{i-1}(M+n)\,p_a^{i-1}(M+n) = 1 \qquad (2)$$
$$\tilde p_s^{i}(n)\,\tilde p_a^{i}(M-n-1) - \tilde p_s^{i-1}(M+n)\,\tilde p_a^{i-1}(2M-n-1) = p_s^{i}(n)\,p_a^{i}(M-n-1) - p_s^{i-1}(M+n)\,p_a^{i-1}(2M-n-1) = 0 \qquad (3)$$
Equations (2) and (3) show that perfect reconstruction is achieved with the modified windows when the MDCT is applied to frame i. In the same way it can be shown that perfect reconstruction is also achieved with the modified windows when the MDCT is applied to frames i+1 and i+2. It can further be shown that, as long as only a linear transformation is applied to the original window, the perfect-reconstruction property of the transform is not changed.
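The argument can also be checked numerically with a small MDCT round trip. This is an illustration under stated assumptions, not the patent's implementation: it uses a sine window, the standard MDCT cosine kernel with a 2/M inverse scaling, and it assumes that the following block applies the same scaling to the samples it shares with the fast-changing frame (the text only states that frames i+1 and i+2 can be treated in the same way). All function names are illustrative.

```python
import math

def sine_window(M):
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def mdct(block, M):
    """2M windowed samples -> M coefficients (unnormalized cosine kernel)."""
    return [sum(block[n] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                for n in range(2 * M)) for k in range(M)]

def imdct(coefs, M):
    """M coefficients -> 2M time-aliased samples (2/M scaling for overlap-add)."""
    return [(2.0 / M) * sum(coefs[k] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                            for k in range(M)) for n in range(2 * M)]

def scale_range(window, start, stop, factor):
    """Multiply window[start:stop] by factor, leaving the rest unchanged."""
    return [v * factor if start <= n < stop else v for n, v in enumerate(window)]

def round_trip(signal, M, analysis, synthesis):
    """Window, MDCT, IMDCT, window, overlap-add. Block t covers frames t and t+1."""
    w = sine_window(M)
    out = [0.0] * len(signal)
    for t in range(len(signal) // M - 1):
        wa, ws = analysis.get(t, w), synthesis.get(t, w)
        seg = signal[t * M:(t + 2) * M]
        y = imdct(mdct([s * a for s, a in zip(seg, wa)], M), M)
        for n in range(2 * M):
            out[t * M + n] += y[n] * ws[n]
    return out

M, L, scs = 8, 5, 5.0
w = sine_window(M)
# Four frames; a loud attack starts at offset L inside frame 2.
signal = [0.01 * math.sin(0.3 * n) if n < 2 * M + L else math.sin(0.9 * n)
          for n in range(4 * M)]
# Block 1 (frames 1-2) sees the fast-change point at M + L, as in the text.
# Block 2 (frames 2-3) is scaled over the same shared samples (assumption).
analysis = {1: scale_range(w, M + L, 2 * M, 1 / scs), 2: scale_range(w, L, M, 1 / scs)}
synthesis = {1: scale_range(w, M + L, 2 * M, scs), 2: scale_range(w, L, M, scs)}
restored = round_trip(signal, M, analysis, synthesis)
# Frames 1 and 2 are each covered by two blocks, so they should be recovered.
max_error = max(abs(restored[n] - signal[n]) for n in range(M, 3 * M))
```

Because the analysis scaling and the synthesis scaling cancel sample by sample, and both overlapping blocks scale the shared samples identically, the time-domain aliasing terms still cancel and `max_error` stays at the level of floating-point rounding.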
An abrupt change at the fast change point of the original window generates high-frequency content, which lowers coding efficiency. Therefore, in step 23, when the values of the analysis window function after the fast change point are scaled down proportionally, a transition block may be added around the fast change point so that the analysis window values near that point vary gradually. The window function with a transition block is constructed as follows. Let the window function be $w(n)$, where $0 \le n \le 2M-1$, let the quantized mutation intensity be $s$, and let the fast change point be at position $L$. Without a transition block, the modified window function $w'(n)$ is:

$$w'(n) = \begin{cases} w(n), & 0 \le n < L \\ w(n)/s, & L \le n \le 2M-1 \end{cases}$$
If a transition block is added, covering the $I$ sample points before the fast change point and the $I-1$ sample points after it, the modified window function $w'(n)$ is:

$$w'(n) = \begin{cases} w(n), & 0 \le n < L-I \\ w(n)/g(n), & L-I \le n \le L+I-1 \\ w(n)/s, & L+I \le n \le 2M-1 \end{cases}$$

where $g(n)$ is the linear function

$$g(n) = 1 + \frac{(s-1)(n-L+I)}{2I}.$$

FIG. 13 is a schematic diagram of the modified analysis window function with a transition block in the pre-echo control encoding method of the present invention. Corresponding to the modification of the analysis window at the encoder, the transition block must also be applied at the decoder. FIG. 14 is a schematic diagram of the modified synthesis window function with a transition block in the pre-echo control decoding method of the present invention. Transition block 13 in FIG. 13 and transition block 14 in FIG. 14 are each 64 sample points long. As shown above, a linear transformation of the original window does not alter the perfect-reconstruction property of the transform, so the analysis window function or the synthesis window may be given any linear transformation chosen according to the signal type. The windows with transition sections and the arbitrary linear window transformations described here can be applied to the analysis or synthesis windows of the encoding and decoding methods in all of the above embodiments.

In step 27, the quantization and entropy coding comprise two steps, nonlinear quantization and entropy coding. The quantization may use scalar quantization or vector quantization. The scalar quantization may use the nonlinear scalar quantizer of MPEG AAC, and the vector quantization may use the vector quantization of MPEG TwinVQ. The quantization may also use the audio coding method based on quantization with a minimized global noise-to-mask ratio criterion and entropy coding (patent application No. 03146213.8). After quantization, entropy coding further removes the statistical redundancy of the quantized coefficients and the side information, finally yielding the compressed audio code stream.
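To make the term "nonlinear scalar quantization" concrete, the following Python sketch shows a power-law quantizer of the kind associated with MPEG AAC, in which magnitudes are compressed with a 3/4 power before rounding and expanded with a 4/3 power on decoding. It is a simplified illustration, not the quantizer of the present invention: the `step` parameter is a hypothetical stand-in for the step size that an encoder would derive from a scale factor, and the rounding offset 0.4054 is the constant commonly cited in AAC encoder descriptions, assumed here.

```python
ROUNDING_OFFSET = 0.4054  # assumed rounding constant (see lead-in)

def quantize(coefficient, step):
    """Nonlinear scalar quantization: 3/4 power law, then rounding."""
    magnitude = (abs(coefficient) / step) ** 0.75
    index = int(magnitude + ROUNDING_OFFSET)
    return -index if coefficient < 0 else index

def dequantize(index, step):
    """Inverse quantization: 4/3 power law, then rescaling by the step."""
    magnitude = abs(index) ** (4.0 / 3.0) * step
    return -magnitude if index < 0 else magnitude
```

Because of the power law, the quantization error grows with the coefficient magnitude, so large coefficients are represented more coarsely than small ones.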
FIG. 15 is a structural block diagram of Embodiment 2 of the pre-echo control encoding apparatus of the present invention. Building on Embodiment 1 of the pre-echo control encoding apparatus, this embodiment adds a subband analysis module 307, which is connected to the signal type analysis module 301 and performs subband analysis on the input audio signal. With subband analysis, the window function can be modified differently in each frequency band according to that band's mutation intensity and signal properties. For example, the low band generally produces no pre-echo, so its window function can be left unmodified, which gives more flexible control over the window modification in different frequency bands.
For the above encoding apparatus, the present invention discloses Embodiment 2 of the pre-echo control encoding method. FIG. 16 is a flowchart of Embodiment 2 of the pre-echo control encoding method of the present invention. The steps are as follows:
Step 40: the subband analysis module 307 performs subband analysis on the input audio signal frame; the analysis segments the frame by frequency and divides it into multiple subband audio signals;
Step 41: the signal type analysis module 301 determines, for each subband audio signal frame, whether its signal type is fast-changing; if so, step 42 is performed; otherwise step 45 is performed;
Step 42: the signal type analysis module 301 calculates, for each subband audio signal frame, the fast change point position parameter and the mutation intensity parameter, and quantizes each mutation intensity parameter to obtain the quantized mutation intensity of each subband audio signal frame;
Step 43: the window function modification module 302 proportionally scales down, for each subband, the values of the analysis window function after the fast change point; the scaling factor equals the quantized mutation intensity of that subband audio signal frame, giving the modified analysis window function;
Step 44: the window function modification module 302 windows each subband audio signal frame with its modified analysis window function to obtain the windowed subband time-domain signals, and step 46 is performed;
Step 45: the window function modification module 302 windows the subband audio signal frame with the original analysis window function to obtain the windowed time-domain signal, and step 46 is performed;
Step 46: the time-frequency mapping module 304 performs time-frequency mapping on the windowed subband time-domain signals to obtain frequency-domain coefficients;
Step 47: the quantization and entropy coding module 305 merges the frequency-domain coefficients of the subbands;
Step 48: the quantization and entropy coding module 305 quantizes and entropy-codes the frequency-domain coefficients according to the masking threshold parameters of the scale factor bands, which the psychoacoustic module 303 obtains by psychoacoustic processing of the audio signal frame, giving the encoded audio code stream;
Step 49: the code stream multiplexing module 306 multiplexes the encoded audio code stream with the result of the signal type analysis to obtain the compressed audio code stream.
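Steps 41 to 45 amount to choosing, independently for each subband, between the original analysis window and a scaled one. A minimal Python sketch of that per-subband choice follows. The function names and the shape of `transient_info` are hypothetical, and the fast-change detection of steps 41 and 42 is taken as an input rather than implemented.

```python
def scaled_analysis_window(window, change_point, strength):
    """Step 43: divide the window values from the fast change point onward."""
    return [v / strength if n >= change_point else v
            for n, v in enumerate(window)]

def window_subbands(subband_frames, window, transient_info):
    """Window every subband frame with its own analysis window.

    transient_info[b] is None for a slowly changing subband, or a tuple
    (change_point, quantized_strength) for a fast-changing one.
    """
    windowed = []
    for frame, info in zip(subband_frames, transient_info):
        if info is None:
            w = window                                    # step 45
        else:
            change_point, strength = info
            w = scaled_analysis_window(window, change_point, strength)
        windowed.append([x * wn for x, wn in zip(frame, w)])  # steps 44/45
    return windowed
```

Passing `None` for the low band reproduces the case mentioned above, in which the low band is left unmodified because it generally produces no pre-echo.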
In step 41 of the above encoding method, while the signal type analysis module 301 determines the signal types of the subband audio signal frames, the psychoacoustic module 303 performs psychoacoustic processing on the input audio signal frame to obtain the masking threshold parameters of the scale factor bands. The psychoacoustic processing computes the masking curve of the current frame from the characteristics of human hearing. From the masking curve, the masking threshold of a particular time-frequency region can be calculated and used to guide the quantization of the current audio frame. The psychoacoustic model here may be the first or second psychoacoustic model used by MPEG AAC.
The advantage of Embodiment 2 of the encoding method of the present invention is that the window function can be modified differently in each frequency band according to that band's mutation intensity and signal properties. For example, the low band generally produces no pre-echo, so its window function can be left unmodified, giving more flexible control over the window modification in different frequency bands.
FIG. 17 is a structural block diagram of Embodiment 1 of the pre-echo control decoding apparatus of the present invention. The apparatus consists of the following functional modules: a code stream demultiplexing module 401 for demultiplexing the compressed audio code stream; an inverse quantization and entropy decoding module 402, connected to the code stream demultiplexing module 401, for decoding and inverse-quantizing the demultiplexed audio code stream and outputting the inverse-quantized frequency-domain coefficients; a frequency-time mapping module 403, connected to the inverse quantization and entropy decoding module 402, for transforming the inverse-quantized frequency-domain coefficients into a time-domain signal, the frequency-time mapping module 403 consisting of a filter bank that is the inverse-transform filter bank corresponding to the one in the encoding apparatus; and a window function modification module 404, connected to the frequency-time mapping module 403, for modifying the synthesis window function and windowing the time-domain signal.
FIG. 18 is a flowchart of Embodiment 1 of the pre-echo control decoding method of the present invention. The steps are as follows:
Step 31: the code stream demultiplexing module demultiplexes the input compressed audio code stream to obtain the demultiplexed audio code stream and side information;
Step 32: the inverse quantization and entropy decoding module performs inverse quantization and entropy decoding on the demultiplexed audio code stream to obtain the inverse-quantized frequency-domain coefficients;
Step 33: the frequency-time mapping module performs frequency-time mapping on the inverse-quantized frequency-domain coefficients to obtain a time-domain signal;
Step 34: the window function modification module determines from the demultiplexed side information whether the signal type of the audio signal frame is fast-changing; if so, step 35 is performed; otherwise step 37 is performed;
Step 35: the window function modification module proportionally amplifies the values of the synthesis window function after the fast change point; the amplification factor equals the quantized mutation intensity, giving the modified synthesis window function;
Step 36: the window function modification module windows the time-domain signal with the modified synthesis window function to obtain the reconstructed audio signal;
Step 37: the window function modification module windows the time-domain signal with the original synthesis window function to obtain the reconstructed audio signal.
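The window selection of steps 34 to 37 can be sketched in a few lines of Python. This is a minimal illustration only: the dictionary layout of `side_info` and the function names are hypothetical, and the sketch windows a single frame without the later overlap-add.

```python
def amplified_synthesis_window(window, change_point, strength):
    """Step 35: multiply the window values from the fast change point onward."""
    return [v * strength if n >= change_point else v
            for n, v in enumerate(window)]

def window_decoded_frame(time_signal, window, side_info):
    """Steps 34 to 37 for one frame.

    side_info is a hypothetical dict with the keys "fast" (bool),
    "change_point" (int) and "strength" (quantized mutation intensity).
    """
    if side_info["fast"]:                                   # step 34
        window = amplified_synthesis_window(
            window, side_info["change_point"], side_info["strength"])
    return [x * w for x, w in zip(time_signal, window)]    # step 36 or 37
```

The amplification undoes the scaling that the encoder applied to the analysis window at the same samples, which is why the fast change point position and the quantized mutation intensity must travel in the side information.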
In step 33, the frequency-time mapping and window-modification processing of the frequency-domain coefficients correspond to the time-frequency mapping and window-modification processing in the encoding method. In both cases the matching inverse mapping and window function are selected according to the coding control information in the compressed audio code stream. The frequency-time mapping may be implemented with the inverse discrete cosine transform (IDCT), the inverse discrete Fourier transform (IDFT), the inverse modified discrete cosine transform (IMDCT), or similar methods.
The frequency-time mapping process is described below using the inverse modified discrete cosine transform (IMDCT) as an example. Because frequency-time mapping and windowing are inseparable, the frequency-time mapping and the window modification are considered together here.
First, the IMDCT is applied to the inverse-quantized spectrum to obtain the transformed time-domain signal $x_{i,n}$. The IMDCT is given by:

$$x_{i,n} = \frac{2}{N}\sum_{k=0}^{N/2-1} X_{i,k}\,\cos\!\left(\frac{2\pi}{N}\,(n+n_0)\left(k+\frac{1}{2}\right)\right)$$

where $X_{i,k}$ is the k-th inverse-quantized spectral coefficient of frame i; n is the sample index, with $0 \le n < N$; N is the window length; $n_0 = (N/2+1)/2$; i is the frame index; and k is the spectral index.
Second, the time-domain signal obtained from the IMDCT is windowed in the time domain. To satisfy the perfect-reconstruction condition, the window function $w(n)$ must satisfy the following two conditions: $w(2M-1-n) = w(n)$ and $w^2(n) + w^2(n+M) = 1$.

Finally, the windowed time-domain signals are overlap-added to obtain the time-domain audio signal. Specifically, the first N/2 samples of the signal obtained after windowing are overlapped and added with the last N/2 samples of the previous frame's signal to give N/2 output time-domain audio samples, that is, $timeSam_{i,n} = preSam_{i,n} + preSam_{i-1,\,n+N/2}$, where i is the frame index, n is the sample index with $0 \le n < N/2$, and N is the window length.
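The two window conditions and the overlap-add rule can be written down directly. The Python sketch below is illustrative only: the sine window is an assumed example of a window that meets both conditions, and the function names are hypothetical.

```python
import math

def sine_window(M):
    """Example window of length 2M (an assumption, not mandated by the text)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def satisfies_pr_conditions(w, tol=1e-12):
    """Check w(2M-1-n) = w(n) and w(n)^2 + w(n+M)^2 = 1."""
    M = len(w) // 2
    symmetric = all(abs(w[2 * M - 1 - n] - w[n]) <= tol for n in range(2 * M))
    power = all(abs(w[n] ** 2 + w[n + M] ** 2 - 1.0) <= tol for n in range(M))
    return symmetric and power

def overlap_add(current, previous):
    """timeSam[i][n] = preSam[i][n] + preSam[i-1][n + N/2], 0 <= n < N/2."""
    half = len(current) // 2
    return [current[n] + previous[n + half] for n in range(half)]
```

A rectangular window of all ones fails the second condition, which is one way to see why it cannot be used with this overlap-add scheme.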
As shown in FIGS. 9 and 11, the modification of the synthesis window function has been described above. The original synthesis window function is modified by proportionally amplifying the window values after the fast change point, with an amplification factor equal to the quantized mutation intensity; the modified synthesis window function is shown in FIG. 11. As proved earlier, the synthesis window that corresponds to the analysis window used at the encoder still satisfies the perfect-reconstruction condition after this modification.
An abrupt change at the fast change point of the original window generates high-frequency content, which lowers coding efficiency. Therefore, in step 35, corresponding to the modification made at the encoder, when the values of the synthesis window function after the fast change point are amplified proportionally, a transition block may be added around the fast change point so that the synthesis window values near that point vary gradually. The window function with a transition block is constructed as follows. Let the window function be $w(n)$, where $0 \le n \le 2M-1$, let the quantized mutation intensity be $s$, and let the fast change point be at position $L$. Without a transition block, the modified window function $w'(n)$ is:

$$w'(n) = \begin{cases} w(n), & 0 \le n < L \\ w(n)\cdot s, & L \le n \le 2M-1 \end{cases}$$
If a transition block is added, covering the $I$ sample points before the fast change point and the $I-1$ sample points after it, the modified window function $w'(n)$ is:

$$w'(n) = \begin{cases} w(n), & 0 \le n < L-I \\ w(n)\cdot g(n), & L-I \le n \le L+I-1 \\ w(n)\cdot s, & L+I \le n \le 2M-1 \end{cases}$$

where $g(n)$ is the linear function

$$g(n) = 1 + \frac{(s-1)(n-L+I)}{2I}.$$

FIG. 14 is a schematic diagram of the modified synthesis window function with a transition block in the pre-echo control decoding method of the present invention; transition block 14 in FIG. 14 is 64 sample points long. As shown above, a linear transformation of the original window does not alter the perfect-reconstruction property of the transform, so the analysis window function or the synthesis window may be given any linear transformation chosen according to the signal type. The windows with transition sections and the arbitrary linear window transformations described here can be applied to the analysis or synthesis windows of the encoding and decoding methods in all of the above embodiments.
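The transition block can be sketched for both windows at once. The Python code below is an illustration under stated assumptions rather than the implementation of the present invention: it assumes that g(n) rises linearly from 1 at n = L - I to s at n = L + I, that the analysis window is divided by the gain while the synthesis window is multiplied by the same gain, and that I = 32 gives the 64-sample transition block mentioned for FIGS. 13 and 14. The function names are hypothetical.

```python
import math

def sine_window(M):
    """Example original window of length 2M (an assumption)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def window_gain(n, L, I, s):
    """Gain at sample n: 1 before the block, a linear ramp inside, s after."""
    if n < L - I:
        return 1.0
    if n <= L + I - 1:
        return 1.0 + (s - 1.0) * (n - L + I) / (2.0 * I)   # g(n)
    return s

def modified_windows(w, L, I, s):
    """Return (analysis, synthesis) windows with a transition block."""
    gains = [window_gain(n, L, I, s) for n in range(len(w))]
    analysis = [wn / g for wn, g in zip(w, gains)]    # scaled down
    synthesis = [wn * g for wn, g in zip(w, gains)]   # scaled up
    return analysis, synthesis
```

Because the two windows use reciprocal gains at every sample, their product equals the product of the original windows, which is the property the perfect-reconstruction argument relies on.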
FIG. 19 is a structural block diagram of Embodiment 2 of the pre-echo control decoding apparatus of the present invention. Building on Embodiment 1 of the pre-echo control decoding apparatus, this embodiment adds a subband synthesis module 405, which is connected to the window function modification module 404 and performs subband synthesis of the reconstructed subband time-domain signals.
For the above decoding apparatus, the present invention discloses Embodiment 2 of the pre-echo control decoding method. FIG. 20 is a flowchart of Embodiment 2 of the pre-echo control decoding method of the present invention. The steps are as follows:
Step 51: the code stream demultiplexing module demultiplexes the input compressed audio code stream to obtain the demultiplexed audio code stream and side information;
Step 52: the inverse quantization and entropy decoding module splits the demultiplexed audio code stream by frequency into multiple paths;
Step 53: the inverse quantization and entropy decoding module performs inverse quantization and entropy decoding on each path of the demultiplexed audio code stream to obtain the inverse-quantized frequency-domain coefficients;
Step 53: the frequency-time mapping module performs frequency-time mapping on each path of inverse-quantized frequency-domain coefficients to obtain multiple time-domain signals;
Step 54: the window function modification module determines from the demultiplexed side information, for each path, whether the signal type of the audio signal frame is fast-changing; if so, step 55 is performed; otherwise step 57 is performed;
Step 55: the window function modification module proportionally amplifies, for each path, the values of the synthesis window function after the fast change point; the amplification factor equals the quantized mutation intensity, giving the modified synthesis window function;
Step 56: the window function modification module windows each of the time-domain signals with its modified synthesis window function to obtain multiple reconstructed audio signals, and step 58 is performed;
Step 57: the window function modification module windows each of the time-domain signals with the original synthesis window function to obtain multiple reconstructed audio signals, and step 58 is performed;
Step 58: the subband synthesis module synthesizes the multiple reconstructed audio signals.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art may still make equivalent substitutions for some of the technical features; provided they do not depart from the spirit of the technical solutions of the present invention, such substitutions shall all fall within the scope of the technical solutions claimed by the present invention.

Claims

1. An encoding apparatus for pre-echo control, comprising:
a signal type analysis module for determining the signal type of an input audio signal frame and outputting a fast change point position and a quantized mutation intensity parameter;
a window function modification module, connected to the signal type analysis module, for modifying an analysis window function, windowing the input audio signal frame, and outputting a windowed time-domain audio signal;
a time-frequency mapping module, connected to the window function modification module, for converting the windowed time-domain audio signal into frequency-domain coefficients;
a psychoacoustic analysis module for performing psychoacoustic processing on the input audio signal frame and outputting masking threshold parameters of scale factor bands;
a quantization and entropy coding module, connected to the time-frequency mapping module and to the psychoacoustic analysis module, for quantizing and entropy-coding the frequency-domain coefficients output by the time-frequency mapping module according to the masking threshold parameters output by the psychoacoustic analysis module, and outputting an encoded code stream; and
a code stream multiplexing module, connected to the quantization and entropy coding module and to the signal type analysis module, for multiplexing the encoded code stream output by the quantization and entropy coding module with the result output by the signal type analysis module to form an audio encoded code stream.
2. The encoding apparatus for pre-echo control according to claim 1, wherein the signal type analysis module comprises:
a signal type analyzer, connected to the window function modification module and the code stream multiplexing module, for determining whether the input audio frame signal is a slowly changing signal or a fast-changing signal;
a fast change point position calculator, connected to the signal type analyzer, the window function modification module and the code stream multiplexing module, for calculating the position of the fast change point;
a mutation intensity calculator, connected to the signal type analyzer and the code stream multiplexing module, for calculating the mutation intensity of the signal; and
a mutation intensity quantizer, connected to the mutation intensity calculator, the window function modification module and the code stream multiplexing module, for quantizing the calculated mutation intensity of the signal.
3. The encoding apparatus for pre-echo control according to claim 1, wherein the time-frequency mapping module consists of a filter bank.
4. The encoding apparatus for pre-echo control according to claim 3, wherein the filter bank is a discrete Fourier transform filter bank or a discrete cosine transform filter bank, and the window length of the analysis window function equals the frame length.
5. The encoding apparatus for pre-echo control according to claim 4, wherein the analysis window function is a Hanning window, a Hamming window or a Blackman window.
6. The encoding apparatus for pre-echo control according to claim 3, wherein the filter bank is a modified discrete cosine transform filter bank, and the window length of the analysis window function is twice the frame length.
7. The encoding apparatus for pre-echo control according to claim 6, wherein the analysis window function is a window function that satisfies the conditions of the modified discrete cosine transform.
8. The encoding apparatus for pre-echo control according to claim 1, wherein a subband analysis module is further connected to the signal type analysis module, for performing subband analysis on the input audio signal frame.
9. An encoding method for pre-echo control, comprising the following steps:
Step 1: a signal type analysis module determines whether the signal type of an input audio signal frame is fast-changing; if so, the signal type analysis module calculates the fast change point position parameter and the mutation intensity parameter of the audio signal frame, quantizes the mutation intensity parameter to obtain a quantized mutation intensity, and step 2 is performed; otherwise, a window function modification module windows the audio signal frame with the original analysis window function to obtain a windowed time-domain signal, and step 4 is performed;
Step 2: the window function modification module applies a linear transformation to the analysis window function to obtain a modified analysis window function;
Step 3: the window function modification module windows the audio signal frame with the modified analysis window function to obtain a windowed time-domain signal;
Step 4: a time-frequency mapping module performs time-frequency mapping on the windowed time-domain signal to obtain frequency-domain coefficients;
Step 5: a quantization and entropy coding module quantizes and entropy-codes the frequency-domain coefficients according to masking threshold parameters of scale factor bands, which a psychoacoustic module obtains by psychoacoustic processing of the audio signal frame, to obtain an encoded audio code stream;
Step 6: a code stream multiplexing module multiplexes the encoded audio code stream with the result of the signal type analysis to obtain a compressed audio code stream.
10、 根据权利要求 9所述的控制前回声的编码方法, 其中所述步骤 1中 信号类型分析模块判断输入音频信号帧的信号类型是否为快变类型信号具体 为, 信号类型分析模块对所述音频信号帧进行基于自适应阈值和波形预测的 前、 后向掩蔽效应分析, 以判断所述音频信号帧的信号类型。  The method for encoding a pre-echo control according to claim 9, wherein the signal type analysis module in the step 1 determines whether the signal type of the input audio signal frame is a fast-change type signal, and the signal type analysis module The audio signal frame performs front and back masking effect analysis based on adaptive threshold and waveform prediction to determine the signal type of the audio signal frame.
11、 才 据权利要求 9所述的控制前回声的编码方法,其中所述步骤 2中, 在所述对分析窗函数的所述快变点位置以后的函数值进行等比例的缩小中, 可在快变点附近加过渡块使快变点附近的分析窗函数值緩慢变化。  11. The method for encoding a pre-echo control according to claim 9, wherein in the step 2, the function value after the fast change point position of the analysis window function is scaled down in an equal manner. Adding a transition block near the fast change point causes the value of the analysis window function near the fast change point to slowly change.
12、 才艮据权利要求 9所述的控制前回声的编码方法, 其中所述步骤 2中 所述线性变换为对所述分析窗函数的快变点位置以后的函数值进行等比例的 缩小, 缩小的值等于所述突变强度的量化值。 12. The method of encoding a pre-echo control according to claim 9, wherein in step 2 The linear transformation is to scale down the function value after the fast change point position of the analysis window function, and the reduced value is equal to the quantized value of the mutation intensity.
13、 根据权利要求 9所述的控制前回声的编码方法,其中所述步骤 4中, 所述时频映射处理为离散傅立叶变换或离散余弦变换, 其中, 所述分析窗函 数的窗长与所述音频信号帧的长度相等。  The method for encoding a pre-echo control according to claim 9, wherein in the step 4, the time-frequency mapping process is a discrete Fourier transform or a discrete cosine transform, wherein a window length of the analysis window function is The audio signal frames are of equal length.
14、 根据权利要求 13所述的控制前回声的编码方法, 其中所述分析窗 函数为汉宁窗、 汉明窗或布莱克曼窗。  14. The encoding pre-echo encoding method according to claim 13, wherein the analysis window function is a Hanning window, a Hamming window or a Blackman window.
15. The encoding method for pre-echo control according to claim 9, wherein in step 4 the time-frequency mapping is a modified discrete cosine transform, and the window length of the analysis window function is twice the length of the audio signal frame.
16. The encoding method for pre-echo control according to claim 15, wherein the analysis window function is a window function that satisfies the conditions of the modified discrete cosine transform.
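The claim does not spell out the conditions. One commonly used condition for MDCT windows is the Princen-Bradley (power-complementary) condition, w[n]^2 + w[n+N]^2 = 1 for a window of length 2N. The sketch below assumes that this is the condition meant and checks it for two candidate windows; the function names are hypothetical.

```python
import math


def sine_window(length):
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def hann_window(length):
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * (n + 0.5) / length)
            for n in range(length)]


def satisfies_princen_bradley(window, tol=1e-9):
    """Check w[n]^2 + w[n + N]^2 == 1 for a window of length 2N."""
    if len(window) % 2:
        return False
    half = len(window) // 2
    return all(abs(window[n] ** 2 + window[n + half] ** 2 - 1.0) <= tol
               for n in range(half))


frame_len = 8                      # audio signal frame length N
window_len = 2 * frame_len         # window length is twice the frame length
sine_ok = satisfies_princen_bradley(sine_window(window_len))
hann_ok = satisfies_princen_bradley(hann_window(window_len))
```

Under this assumption the sine window qualifies and the plain Hann window does not, which is consistent with claim 14 listing the Hanning window only for the DFT and DCT case.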
17. The encoding method for pre-echo control according to claim 9, wherein before step 1 a subband analysis module first performs subband analysis on the audio signal frame to obtain multiple subband audio signals, and after step 4 the frequency-domain coefficients of the multiple subbands are combined.
18. The encoding method for pre-echo control according to claim 17, wherein the subband analysis module performs the subband analysis on the audio signal frame by frequency segmentation.
19. The encoding method for pre-echo control according to claim 9, wherein in step 5 the psychoacoustic module performs psychoacoustic processing on the audio signal frame using the first or second psychoacoustic model used by Moving Picture Experts Group (MPEG) Advanced Audio Coding (AAC).
20. The encoding method for pre-echo control according to claim 9, wherein in step 5 the quantization and entropy coding module quantizes the frequency-domain coefficients using scalar quantization or vector quantization.
21. The encoding method for pre-echo control according to claim 20, wherein the scalar quantization is the non-linear scalar quantization used by Moving Picture Experts Group Advanced Audio Coding.
22. The encoding method for pre-echo control according to claim 20, wherein the vector quantization is the vector quantization used by Moving Picture Experts Group twin vector quantization (TwinVQ).
23. The encoding method for pre-echo control according to claim 9, wherein the analysis window function in step 2 is a fixed-length window function whose length is an integer greater than 1, preferably 2 to the power N, where N is a natural number.
24. A decoding apparatus for pre-echo control, comprising:
a bitstream demultiplexing module, configured to demultiplex a compressed audio bitstream;
an inverse quantization and entropy decoding module, connected to the bitstream demultiplexing module, configured to decode and inverse-quantize the demultiplexed audio bitstream and to output inverse-quantized frequency-domain coefficients;
a frequency-time mapping module, connected to the inverse quantization and entropy decoding module, configured to transform the inverse-quantized frequency-domain coefficients into a time-domain signal;
a window function correction module, connected to the frequency-time mapping module, configured to correct the synthesis window function and to window the time-domain signal.
25. The decoding apparatus for pre-echo control according to claim 24, wherein the window function correction module is further connected to a subband synthesis module, configured to perform subband synthesis on the multiple reconstructed subband time-domain signals.
26. The decoding apparatus for pre-echo control according to claim 24, wherein the frequency-time mapping module consists of a filter bank.
27. The decoding apparatus for pre-echo control according to claim 26, wherein the filter bank is an inverse transform filter bank corresponding to the encoding apparatus.
28. A decoding method for pre-echo control, comprising the following steps:
Step 1: the bitstream demultiplexing module demultiplexes the input compressed audio bitstream to obtain the demultiplexed audio bitstream and side information;
Step 2: the inverse quantization and entropy decoding module performs inverse quantization and entropy decoding on the demultiplexed audio bitstream to obtain inverse-quantized frequency-domain coefficients;
Step 3: the frequency-time mapping module performs frequency-time mapping on the inverse-quantized frequency-domain coefficients to obtain a time-domain signal;
Step 4: the window function correction module determines, from the demultiplexed side information, whether the signal type of the audio signal frame is the fast-changing type; if so, step 5 is performed; otherwise, step 6 is performed;
Step 5: the window function correction module applies a linear transformation to the synthesis window function to obtain a corrected synthesis window function, and then windows the time-domain signal with the corrected synthesis window function to obtain the reconstructed audio signal;
Step 6: the window function correction module windows the time-domain signal with the original synthesis window function to obtain the reconstructed audio signal.
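Steps 4 to 6 can be sketched as a single branch on the side information. The layout of the side information is not given in the claims, so the dictionary keys `is_fast_change`, `position`, and `gain` below are hypothetical, and overlap-add across frames is omitted for brevity.

```python
def window_time_signal(time_signal, synthesis_window, side_info):
    """Window one frame, correcting the synthesis window for fast-change frames."""
    if len(time_signal) != len(synthesis_window):
        raise ValueError("signal and window must have the same length")
    window = synthesis_window                     # step 6: original window
    if side_info.get("is_fast_change"):           # step 4: check the signal type
        pos = side_info["position"]
        gain = side_info["gain"]
        # step 5: linear transformation, amplify values after the fast-change point
        window = [w if n < pos else w * gain
                  for n, w in enumerate(synthesis_window)]
    return [x * w for x, w in zip(time_signal, window)]


flat_window = [1.0, 1.0, 1.0, 1.0]                # toy window for illustration
signal = [1.0, 2.0, 3.0, 4.0]
slow = window_time_signal(signal, flat_window, {"is_fast_change": False})
fast = window_time_signal(
    signal, flat_window, {"is_fast_change": True, "position": 2, "gain": 4.0})
```

Only the window is changed for fast-changing frames; the frequency-time mapping of step 3 is the same in both branches.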
29. The decoding method for pre-echo control according to claim 28, wherein in step 5, when the function values of the synthesis window function after the fast-change point position are proportionally amplified, a transition block may be added near the fast-change point so that the synthesis window function near the fast-change point changes gradually.
30. The decoding method for pre-echo control according to claim 28, wherein the linear transformation in step 5 is a proportional amplification of the function values of the synthesis window function after the fast-change point position, the value of the amplification being equal to the quantized value of the transient intensity.
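If the amplification at the decoder uses the same quantized transient intensity as the reduction at the encoder, the two corrections cancel in the combined analysis-synthesis weighting. The sketch below checks this numerically under that assumption, with a sine window standing in for both windows and with quantization of the coefficients ignored; all names and values are hypothetical.

```python
import math


def sine_window(length):
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def scale_tail(window, transient_pos, factor):
    """Multiply window values at or after transient_pos by factor."""
    return [w if n < transient_pos else w * factor
            for n, w in enumerate(window)]


base = sine_window(16)
quantized_intensity = 4.0          # hypothetical value carried as side information
transient_pos = 10                 # hypothetical fast-change point position

enc_window = scale_tail(base, transient_pos, 1.0 / quantized_intensity)
dec_window = scale_tail(base, transient_pos, quantized_intensity)

# Combined weighting seen by each sample, with and without the correction.
combined = [a * s for a, s in zip(enc_window, dec_window)]
reference = [w * w for w in base]
round_trip_error = max(abs(c - r) for c, r in zip(combined, reference))
```

The signal weighting is therefore unchanged, while quantization noise introduced between the two windows is multiplied by the decoder factor only after the fast-change point, so the samples preceding it, where pre-echo would be audible, receive relatively less noise.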
31. The decoding method for pre-echo control according to claim 28, wherein before step 2 the inverse quantization and entropy decoding module segments the demultiplexed audio bitstream by frequency, and after step 5 subband synthesis is performed on the reconstructed audio signal.
32. The decoding method according to claim 28, wherein the synthesis window function in steps 4 and 5 is a corresponding window function selected according to the coding control information in the compressed audio bitstream.
33. The decoding method according to claim 28, wherein the frequency-time mapping in step 3 is a corresponding inverse mapping selected according to the coding control information in the compressed audio bitstream.
PCT/CN2005/001435 2005-09-08 2005-09-08 Encoder and decoder for pre-echo control and method thereof WO2007028280A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN200580051158.0A CN101228574A (en) 2005-09-08 2005-09-08 Encoding and decoding device for controlling pre-echo and method thereof
PCT/CN2005/001435 WO2007028280A1 (en) 2005-09-08 2005-09-08 Encoder and decoder for pre-echo control and method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2005/001435 WO2007028280A1 (en) 2005-09-08 2005-09-08 Encoder and decoder for pre-echo control and method thereof

Publications (1)

Publication Number Publication Date
WO2007028280A1 (en) 2007-03-15

Family

ID=37835353

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2005/001435 WO2007028280A1 (en) 2005-09-08 2005-09-08 Encoder and decoder for pre-echo control and method thereof

Country Status (2)

Country Link
CN (1) CN101228574A (en)
WO (1) WO2007028280A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008141579A1 (en) * 2007-05-17 2008-11-27 Spreadtrum Communications (Shanghai) Co., Ltd. An encoding and decoding method for audio transient signal
WO2009092309A1 (en) * 2008-01-16 2009-07-30 Huawei Technologies Co., Ltd. A control method and apparatus for quantizing noise leakage
CN109783767A (en) * 2018-12-21 2019-05-21 电子科技大学 A kind of adaptive selection method that Short Time Fourier Transform window is long
CN114002733A (en) * 2021-10-27 2022-02-01 武汉科技大学 Automatic picking method for first arrival time of micro-seismic wave signal and micro-seismic monitoring device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2481650C2 (en) * 2008-09-17 2013-05-10 Франс Телеком Attenuation of anticipated echo signals in digital sound signal
CN103327201B (en) * 2012-03-20 2016-04-20 联芯科技有限公司 Residual echo removing method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5117228A (en) * 1989-10-18 1992-05-26 Victor Company Of Japan, Ltd. System for coding and decoding an orthogonally transformed audio signal
CN1153369A (en) * 1995-10-05 1997-07-02 索尼公司 Coding method and apparatus for using multi channel audio signals
JP2001265392A (en) * 2000-03-17 2001-09-28 Victor Co Of Japan Ltd Voice coding device and its method
JP2003216188A (en) * 2002-01-25 2003-07-30 Matsushita Electric Ind Co Ltd Audio signal encoding method, encoder and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008141579A1 (en) * 2007-05-17 2008-11-27 Spreadtrum Communications (Shanghai) Co., Ltd. An encoding and decoding method for audio transient signal
WO2009092309A1 (en) * 2008-01-16 2009-07-30 Huawei Technologies Co., Ltd. A control method and apparatus for quantizing noise leakage
CN109783767A (en) * 2018-12-21 2019-05-21 电子科技大学 A kind of adaptive selection method that Short Time Fourier Transform window is long
CN109783767B (en) * 2018-12-21 2023-03-31 电子科技大学 Self-adaptive selection method for short-time Fourier transform window length
CN114002733A (en) * 2021-10-27 2022-02-01 武汉科技大学 Automatic picking method for first arrival time of micro-seismic wave signal and micro-seismic monitoring device
CN114002733B (en) * 2021-10-27 2024-01-23 武汉科技大学 Automatic pickup method for first arrival time of microseismic signal and microseismic monitoring device

Also Published As

Publication number Publication date
CN101228574A (en) 2008-07-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 200580051158.0

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC - FORM EPO 1205A DATED 27-05-2008

122 Ep: pct application non-entry in european phase

Ref document number: 05783980

Country of ref document: EP

Kind code of ref document: A1