WO2005027094A1 - Method and device for multiple multi-resolution vector quantization for audio coding and decoding - Google Patents

Method and device for multiple multi-resolution vector quantization for audio coding and decoding

Info

Publication number
WO2005027094A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
resolution
time
quantization
frequency
Prior art date
Application number
PCT/CN2003/000790
Other languages
English (en)
French (fr)
Inventor
Xingde Pan
Weimin Ren
Original Assignee
Beijing E-World Technology Co.,Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing E-World Technology Co.,Ltd. filed Critical Beijing E-World Technology Co.,Ltd.
Priority to PCT/CN2003/000790 priority Critical patent/WO2005027094A1/zh
Priority to JP2005508847A priority patent/JP2007506986A/ja
Priority to AU2003264322A priority patent/AU2003264322A1/en
Priority to EP03818611A priority patent/EP1667109A4/en
Priority to US10/572,769 priority patent/US20070067166A1/en
Priority to CNA038270625A priority patent/CN1839426A/zh
Publication of WO2005027094A1 publication Critical patent/WO2005027094A1/zh

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032 Quantisation or dequantisation of spectral components
    • G10L19/038 Vector quantisation, e.g. TwinVQ audio
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
    • G10L19/0216 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation using wavelet decomposition

Definitions

  • the present invention relates to the field of signal processing, and in particular, to a coding method and a device for implementing multi-resolution analysis and vector quantization on audio signals.
  • Background Art
  • an audio coding method includes steps of psychoacoustic model calculation, time-frequency domain mapping, quantization, and encoding.
  • the time-frequency domain mapping refers to mapping an audio input signal from the time domain to the frequency domain or the time-frequency domain.
  • Time-frequency domain mapping, also called transformation and filtering, is a basic operation of audio signal coding that can improve coding efficiency. Through this operation, most of the information contained in the time-domain signal can be transformed or concentrated into a subset of the frequency-domain or time-frequency-domain coefficients.
  • a basic operation of a perceptual audio encoder is to map the input audio signal from the time domain to the frequency domain or the time-frequency domain. The basic idea is: decompose the signal into components on each frequency band; once the input signal is expressed in the frequency domain, the psychoacoustic model can be used to remove perceptually irrelevant information; then the components in each frequency band are grouped; finally, bits are reasonably allocated to express each group of frequency parameters.
  • the audio signal exhibits a strong quasi-periodic nature, this process can greatly reduce the data volume and improve the coding efficiency.
  • the commonly used time-frequency domain mapping methods are: the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the quadrature mirror filter (QMF), the pseudo-QMF (PQMF), the cosine modulation filter (CMF), the modified discrete cosine transform (MDCT), and the discrete wavelet (packet) transform (DW(P)T), among others. The above methods either use a single transform/filter configuration to compress and express an input signal frame, or use a filter bank or transform with a small time-domain analysis interval to express drastically changing signals and thereby eliminate the effect of pre-echo on the decoded signal.
  • the vector quantization technology can be used to improve the coding efficiency.
  • the current audio coding method that uses vector quantization technology is the Transform-domain Weighted Interleave Vector Quantization (TWINVQ) coding method. After an MDCT transform of the signal, the method interleaves the signal spectrum parameters to construct the vectors to be quantized, and then uses efficient vector quantization to significantly improve the encoded audio quality at lower bit rates.
  • TWINVQ encoding method is a perceptually lossy encoding method.
  • the TWINVQ encoding method needs further improvement.
  • because the TWINVQ encoding method uses coefficient interleaving, the statistical consistency between vectors can be ensured, but the concentration of signal energy in local time-frequency regions cannot be exploited effectively, which also limits further improvement of the coding efficiency.
  • because the MDCT transform is essentially a filter bank of equal bandwidths, the signal cannot be decomposed according to the aggregation of signal energy in the time-frequency plane, which limits the efficiency of the TWINVQ coding method.
  • the time-frequency plane needs to be divided effectively so that the distance between components within a class is as small as possible and the distance between classes is as large as possible; this is the problem of multi-resolution filtering of the signal.
  • based on an effective time-frequency plane division, the vectors need to be reorganized, selected, and quantized so that the coding gain is maximized; this is the problem of multi-resolution vector quantization of a signal.
  • the technical problem to be solved by the present invention is to provide a multi-resolution vector quantization audio coding and decoding method and device, which can adjust the time-frequency resolution for different types of input signals and effectively use the local agglomeration of the signal in the time-frequency domain to perform vector quantization, thereby improving coding efficiency.
  • the multi-resolution vector quantized audio encoding method of the present invention includes: adaptively filtering an input audio signal to obtain time-frequency filter coefficients and outputting a filtered signal; performing vector division on the time-frequency plane of the filtered signal to obtain vector combinations; selecting vectors for vector quantization; performing vector quantization on the selected vectors and calculating the quantization residual; the quantized codebook information is transmitted to the audio decoder as side information of the encoder, and the quantization residual is quantized and encoded.
  • the multi-resolution vector quantization audio decoding method of the present invention includes: demultiplexing the code stream to obtain the side information of multi-resolution vector quantization, including the energy of the selected points and the position information of the vector quantization; using inverse vector quantization on the normalized vectors according to this information, calculating the normalization factor, and reconstructing the quantized vectors of the original time-frequency plane; adding the reconstructed vectors to the residuals of the corresponding time-frequency coefficients according to the position information; and performing multi-resolution inverse filtering and frequency-to-time mapping to obtain the reconstructed audio signal.
  • the multi-resolution vector quantized audio encoder of the present invention includes a time-frequency mapper, a multi-resolution filter, a multi-resolution vector quantizer, a psychoacoustic calculation module, and a quantization encoder;
  • the time-frequency mapper receives an audio input signal, performs time-to-frequency domain mapping, and outputs the result to the multi-resolution filter;
  • the multi-resolution filter is configured to perform adaptive filtering on the mapped signal and output the filtered signal to the psychoacoustic calculation module and the multi-resolution vector quantizer;
  • the multi-resolution vector quantizer is configured to perform vector quantization on the filtered signal and calculate a quantization residual, pass the quantized codebook information to the audio decoder as side information, and output the quantization residual to the quantization encoder;
  • the psychoacoustic calculation module is configured to calculate a masking threshold of the psychoacoustic model according to the input audio signal, and output to the quantization encoder, for controlling the noise allowed by the quantization;
  • the multi-resolution vector quantization audio decoder of the present invention includes a decoding and inverse quantizer, a multi-resolution inverse vector quantizer, a multi-resolution inverse filter, and a frequency-time mapper. The decoding and inverse quantizer is used to demultiplex the code stream and perform entropy decoding and inverse quantization, obtaining side information and encoded data, which are output to the multi-resolution inverse vector quantizer. The multi-resolution inverse vector quantizer is used to perform the inverse vector quantization process, reconstruct the quantized vectors, add the reconstructed vectors to the residual coefficients on the time-frequency plane, and output the result to the multi-resolution inverse filter. The multi-resolution inverse filter inverse-filters the sum of the vectors and residual coefficients reconstructed by the multi-resolution inverse vector quantizer and outputs the result to the frequency-time mapper. The frequency-time mapper completes the mapping of the signal from the frequency domain to the time domain to obtain the final reconstructed audio signal.
  • the audio encoding and decoding method and device of the present invention, based on Multiresolution Vector Quantization (MRVQ) technology, can adaptively filter audio signals. Through multi-resolution filtering, the concentration of signal energy in local time-frequency regions is used effectively, and the time and frequency resolution can be adjusted adaptively according to the type of signal. By reorganizing the filter coefficients, different organization strategies can be chosen according to the aggregation characteristics of the signal, making effective use of the results of the multi-resolution time-frequency analysis. Using vector quantization on these regions not only improves coding efficiency, but also makes it convenient to control and optimize the quantization accuracy.
  • FIG. 1 is a flowchart of a multi-resolution vector quantization audio coding method according to the present invention
  • FIG. 2 is a flowchart of multi-resolution filtering of the audio signal
  • FIG. 3 is a schematic diagram of a source encoding/decoding system based on a cosine modulation filter
  • FIG. 4 is a schematic diagram of three aggregation modes of energy after multi-resolution filtering
  • FIG. 5 is a flowchart of the multi-resolution vector quantization process
  • FIG. 6 is a schematic diagram of dividing a vector in three ways
  • FIG. 7 is a flowchart of one embodiment of multi-resolution vector quantization
  • Figure 8 is a schematic diagram of the area energy / maximum value
  • FIG. 9 is a flowchart of another embodiment of multi-resolution vector quantization.
  • FIG. 10 is a schematic structural diagram of a multi-resolution vector quantization audio encoder according to the present invention.
  • FIG. 11 is a schematic structural diagram of a multi-resolution filter in an audio encoder
  • FIG. 12 is a schematic structural diagram of a multi-resolution vector quantizer in an audio encoder
  • FIG. 13 is a flowchart of a multi-resolution vector quantization audio decoding method of the present invention.
  • FIG. 14 is a flowchart of multi-resolution inverse filtering
  • FIG. 15 is a schematic structural diagram of a multi-resolution vector quantization audio decoder according to the present invention
  • FIG. 16 is a schematic structural diagram of a multi-resolution inverse vector quantizer in an audio decoder
  • FIG. 17 is a structural diagram of a multi-resolution inverse filter in an audio decoder.
  • the flowchart shown in Figure 1 gives the overall technical solution of the audio coding method of the present invention.
  • the input audio signal is first subjected to multi-resolution filtering; the filter coefficients are then reorganized and the vectors are divided on the time-frequency plane, from which the vectors to be quantized are further selected and determined. After the vectors are determined, each vector is quantized to obtain the corresponding vector quantization codebook and quantization residual.
  • the vector quantization codebook is sent to the decoder as side information, and the quantization residual is quantized and encoded.
  • the flowchart of multi-resolution filtering on the audio signal is shown in Figure 2.
  • the input audio signal is decomposed into frames, and the transient measurement calculation is performed on the signal frame.
  • by comparing the value of the transient measurement with a threshold value, it is determined whether the current signal frame is a slowly-changing signal or a fast-changing signal.
  • the filter structure is selected according to the type of the signal frame. If it is a slowly changing signal, cosine modulation filtering of equal bandwidth is performed to obtain the filter coefficients of the time-frequency plane, and the filtered signal is output.
  • if it is a fast-changing signal, cosine modulation filtering of equal bandwidth is performed to obtain the filter coefficients of the time-frequency plane; a wavelet transform is then used to perform multi-resolution analysis on the filter coefficients, adjusting their time-frequency resolution, and finally the filtered signal is output.
  • a series of fast-changing signal types can be further defined, that is, there are multiple thresholds to subdivide the fast-changing signals, and different types of fast-changing signals use different wavelet transforms for multi-resolution analysis.
  • the wavelet base can be fixed or adaptive.
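As an illustrative sketch (not part of the original disclosure), the frame classification above could be implemented as follows. The sub-frame energy-ratio measure, the sub-frame count of 8, and the threshold of 4.0 are all assumptions for illustration; the text only requires some transient measurement compared against one or more thresholds.

```python
import numpy as np

def transient_measure(frame, n_sub=8):
    """Ratio of the largest sub-frame energy to the mean sub-frame energy.

    A large ratio indicates that the frame's energy is concentrated in a
    short interval, i.e. a transient (fast-changing) frame.
    """
    sub = np.array_split(np.asarray(frame, dtype=float), n_sub)
    energies = np.array([np.sum(s * s) for s in sub])
    mean = np.mean(energies)
    return float(np.max(energies) / mean) if mean > 0 else 0.0

def classify_frame(frame, threshold=4.0):
    """Compare the transient measure with a threshold to pick the filter
    structure: 'slow' -> equal-bandwidth cosine modulation filtering only,
    'fast' -> cosine modulation filtering followed by wavelet analysis."""
    return 'fast' if transient_measure(frame) > threshold else 'slow'
```

A steady sinusoid spreads its energy evenly over the sub-frames (ratio near 1), while an isolated click concentrates it in one sub-frame (ratio near the sub-frame count), so a single threshold separates the two cases.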
  • the filtering of slowly changing signals and fast changing signals is based on the technology of a cosine modulation filter bank.
  • the cosine modulation filter bank includes two types of filtering: traditional cosine modulation filtering technology and modified discrete cosine transform MDCT technology.
  • the source coding / decoding system based on cosine modulation filtering is shown in Figure 3.
  • the input signal is decomposed into M subbands by the analysis filter bank, and the subband coefficients are quantized and entropy coded.
  • at the decoder, the subband coefficients are obtained and filtered by a synthesis filter bank to restore the audio signal.
  • the cosine modulation filter banks represented by the formulas (F-1) and (F-2) are orthogonal filter banks.
  • a symmetrical window is further specified
  • the other form of filtering is the modified discrete cosine transform MDCT, also known as TDAC (Time Domain Aliasing Cancellation).
  • the cosine modulation filter bank has an impulse response of:
  • the cosine modulation filter bank is a bi-orthogonal modulation filter bank.
  • the analysis window and synthesis window of the cosine modulation filter bank can adopt any window form that satisfies the complete reconstruction condition of the filter bank, such as the SINE and KBD windows commonly used in audio coding.
  • cosine modulation filter bank filtering can use fast Fourier transform to improve calculation efficiency, refer to the literature "A New Algorithm for the Implementation of Filter Banks based on 'Time Domain Aliasing Cancellation'" (P. Duhamel, Y. Mahieux and JP Petit, Proc. ICASSP, May 1991, pages 2209-2212).
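As a sketch of the MDCT form of the cosine modulation filter bank and its TDAC property (an illustration, not the patent's implementation): the direct matrix evaluation below is chosen for clarity, whereas a real codec would use the FFT-based fast algorithm cited above; the SINE window is one of the complete-reconstruction windows mentioned in the text.

```python
import numpy as np

def mdct(x, w):
    """MDCT of one 2M-sample block (windowed by w) -> M coefficients."""
    M = len(x) // 2
    n = np.arange(2 * M)
    k = np.arange(M)[:, None]
    C = np.cos(np.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
    return C @ (w * x)

def imdct(X, w):
    """Inverse MDCT -> 2M windowed samples; consecutive 50%-overlapped
    frames must be overlap-added so the time-domain aliasing cancels."""
    M = len(X)
    n = np.arange(2 * M)[:, None]
    k = np.arange(M)
    C = np.cos(np.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
    return (2.0 / M) * w * (C @ X)

M = 32
# SINE window, which satisfies the Princen-Bradley condition w[n]^2 + w[n+M]^2 = 1
w = np.sin(np.pi / (2 * M) * (np.arange(2 * M) + 0.5))
```

Each frame alone contains time-domain aliasing; only the overlap-add of the windowed outputs of two adjacent frames reconstructs the middle M samples exactly, which is the "aliasing cancellation" the TDAC name refers to.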
  • wavelet transform technology is also a well-known technology in the field of signal processing.
  • the signal after multi-resolution analysis and filtering has the property of reallocating and accumulating signal energy on the time-frequency plane, as shown in FIG. 4.
  • for signals that are stable in the time domain, such as sinusoidal signals, the energy in the time-frequency plane is concentrated in a frequency band along the time direction, as shown in Figure 4a; for fast-varying signals in the time domain, especially fast-changing signals with obvious pre-echo phenomena in audio coding, such as castanets, the energy is mainly distributed along the frequency direction, that is, most of the energy is concentrated at a few time points, as shown in Figure 4b; and for noise signals, whose frequency content is distributed over a wide range, the energy accumulates in multiple modes, along the time direction, along the frequency direction, and regionally, as shown in Figure 4c.
  • in the multi-resolution representation, the frequency resolution of the low-frequency portion is high and that of the high-frequency portion is low. Because the components that cause the pre-echo phenomenon are mainly in the middle and high frequencies, the pre-echo can be effectively suppressed if the coding quality of these components is improved.
  • An important starting point of multi-resolution vector quantization is to optimize the errors introduced by quantizing these important filter coefficients; it is therefore particularly important to use efficient coding strategies for these coefficients.
  • from the above analysis it can be seen that the energy distribution of the signal after multi-resolution filtering shows strong regularity, so the important filter coefficients can be effectively reorganized and classified.
  • vector quantization can effectively use this feature to combine coefficients.
  • the regions on the time-frequency plane are organized into one-dimensional vectors, which together form a vector matrix.
  • vector quantization is performed on all or part of the matrix elements of the vector matrix.
  • the quantized information is transmitted to the decoder as side information of the encoder, and the quantization residual and the unquantized coefficients together form a residual signal for quantization and coding.
  • FIG. 5 describes in detail the process of performing multi-resolution vector quantization on the audio signal after multi-resolution filtering.
  • the process of multi-resolution vector quantization includes three sub-processes of vector division, vector selection, and vector quantization.
  • the vectors can be combined and extracted in different ways for the time-frequency plane, as shown in Figs. 6-a, 6-b, and 6-c.
  • in Figure 6-a, the plane is divided into 8*16 8-dimensional vectors along the frequency direction, referred to as the type I vector organization.
  • Figure 6-b is the result of dividing the vector according to the time direction.
  • Figure 6-c is the result of organizing the vectors according to the time-frequency region.
  • There are 16*8 8-dimensional vectors in total, referred to as the type III vector organization. In each of these division methods, 128 8-dimensional vectors are obtained.
  • the vector set obtained by the type I organization can be denoted {v_f}, the vector set obtained by the type II organization can be denoted {v_t}, and the vector set obtained by the type III organization can be denoted {v_t-f}.
  • the first method is to select all vectors on the entire time-frequency plane for quantization.
  • All vectors refer to the vectors covering all the time-frequency grid points obtained according to a certain division.
  • all vectors obtained by the type I vector organization, all vectors obtained by the type II vector organization, or all vectors obtained by the type III vector organization may be used; that is, all vectors of exactly one organization are selected.
  • the quantization gain refers to the ratio of the energy before quantization to the quantization error energy.
  • the vectors of the vector organization having the larger gain value are selected.
  • the second method is to select the most important vector for quantization.
  • the most important vector may include a vector in the frequency direction, a vector in the time direction, or a vector in the time-frequency region.
  • the side information also needs to include the serial numbers of these vectors.
  • the specific method of selecting vectors is described below. After the vectors to be quantized are determined, vector quantization is performed. Whether all vectors are selected for quantization or only the important vectors are selected, the basic unit is the quantization of a single vector.
  • for a single D-dimensional vector, considering the trade-off between dynamic range and codebook size, the vector needs to be normalized before quantization to obtain a normalization factor.
  • the normalization factor reflects the energy dynamic range of the different vectors, and its value varies from vector to vector.
  • the vector is then quantized, including quantization of the codebook index number and quantization of the normalization factor. Considering the limitation of the code rate and the coding gain, the number of bits occupied by the quantization of the normalization factor should be as small as possible.
  • the curve and surface fitting, multi-resolution decomposition, and prediction methods can be used to calculate the multi-resolution time-frequency coefficient envelope to obtain the normalized factor.
  • FIG. 7 and FIG. 9 respectively show flowcharts of two specific embodiments of the multi-resolution vector quantization process.
  • the embodiment shown in FIG. 7 selects vectors according to the energy and the variance of the internal components of each vector, uses a Taylor expansion to describe the multi-resolution time-frequency coefficient envelope, obtains a normalization factor, and then quantizes, achieving multi-resolution vector quantization.
  • the embodiment shown in FIG. 9 selects a vector according to the coding gain, and calculates a multi-resolution time-frequency coefficient envelope using a spline curve fitting to obtain a normalization factor, and then quantizes to achieve multi-resolution vector quantization.
  • vector organization is performed according to the frequency direction, the time direction, and the time-frequency region. If the number of frequency coefficients is N=1024, the time-frequency multi-resolution filtering generates 64*16 grid points.
  • the vector dimension is 8
  • a vector in the form of an 8 * 16 matrix can be obtained by dividing by frequency
  • a vector in the form of a 64 * 2 matrix can be obtained by dividing by time
  • a vector in the form of a 16 * 8 matrix can be obtained according to the time-frequency region.
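The three divisions above can be sketched as NumPy reshapes (an illustration only; the 4x2 region shape for type III is an assumption chosen to be consistent with the 16*8 count stated in the text):

```python
import numpy as np

def divide_vectors(plane, dim=8, region=(4, 2)):
    """Divide a (freq x time) coefficient plane into dim-dimensional
    vectors in the three ways of Fig. 6: along frequency (type I), along
    time (type II), and by time-frequency region (type III)."""
    F, T = plane.shape
    rf, rt = region
    # Type I: each time column is split into F/dim vectors of length dim.
    v_i = plane.reshape(F // dim, dim, T).transpose(0, 2, 1).reshape(-1, dim)
    # Type II: each frequency row is split into T/dim vectors of length dim.
    v_ii = plane.reshape(F, T // dim, dim).reshape(-1, dim)
    # Type III: the plane is tiled into rf x rt rectangular regions.
    v_iii = (plane.reshape(F // rf, rf, T // rt, rt)
                  .transpose(0, 2, 1, 3).reshape(-1, rf * rt))
    return {'I': v_i, 'II': v_ii, 'III': v_iii}
```

For a 64x16 plane this yields 8*16, 64*2, and 16*8 vectors of dimension 8 respectively, i.e. 128 vectors in every organization, matching the counts in the text.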
  • the basis for selecting a vector is the energy of the vector and the variance of each component within the vector.
  • the vector constituent elements need to take absolute values to exclude the influence of the numerical signs.
  • the ratio of each vector's energy to the total energy determines the vectors to be selected and their number M; a typical value of M is an integer in the range 3 to 50. The first M vectors are then selected for vector quantization. If vectors covering the same region appear in the type I, type II, and type III vector organizations, they are ordered by variance. Through the above steps, the M vectors to be quantized are selected.
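A minimal sketch of this selection rule, assuming (as an illustration) that energy is the primary key and the variance of the absolute component values is the tie-breaking key:

```python
import numpy as np

def select_vectors(vectors, M):
    """Return the indices of the first M vectors to quantize, sorted by
    energy (descending); vectors are tie-broken by the variance of the
    absolute values of their components, per the selection basis above."""
    a = np.abs(np.asarray(vectors, dtype=float))  # exclude sign influence
    energy = (a ** 2).sum(axis=1)
    var = a.var(axis=1)
    # np.lexsort sorts by the last key first, so energy is the primary key.
    order = np.lexsort((-var, -energy))
    return order[:M]
```

The caller would apply this to the 128 vectors of each organization and keep the top M (typically 3 to 50) for vector quantization.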
  • the quantization search process for each order difference is completed.
  • the vector needs to be normalized twice.
  • the global maximum absolute value is used in the first normalization, and the signal envelope is estimated through a finite number of points in the second normalization; the vector at the corresponding position is then normalized a second time with the estimated value. After the two normalizations, the dynamic range of the vector values is effectively controlled.
  • the signal envelope estimation method is implemented by Taylor expansion, which will be described in detail later.
  • Vector quantization is performed according to the following steps: first, the parameters in the Taylor approximation formula are determined, so that the Taylor formula can represent the approximate energy value of any vector in the entire time-frequency plane, and the maximum energy or maximum absolute value is calculated; then the selected vectors are normalized for the first time; the energy approximation of each vector to be quantized is calculated by the Taylor formula, and the second normalization is performed; finally, the normalized vectors are quantized according to minimum distortion, and the quantization residuals are calculated.
  • the above steps are described in detail below.
  • the coefficient on each time-frequency grid point corresponds to a certain energy value.
  • define the coefficient energy of a time-frequency grid point as the square of the coefficient or its absolute value; define the energy of a vector as the sum of the coefficient energies at all time-frequency grid points that make up the vector, or as the largest absolute value of these coefficients; define the energy of a time-frequency plane region as the sum of the coefficient energies, or the largest absolute coefficient value, at all the time-frequency grid points constituting the region. Therefore, to obtain the energy of a vector, the energy sum or the value with the largest absolute value must be calculated over all time-frequency grid point coefficients contained in the vector. For the entire time-frequency plane, the division manners of Figures 6-a, 6-b, and/or 6-c can be adopted, and the divided regions are numbered (1, 2, ..., N).
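The energy definitions above translate directly into code; this sketch simply implements both variants named in the text:

```python
import numpy as np

def grid_energy(c, mode='square'):
    """Energy of one time-frequency coefficient: its square, or its
    absolute value, per the definition above."""
    return c * c if mode == 'square' else abs(c)

def region_energy(coeffs, mode='square'):
    """Energy of a vector or region: the sum of the grid-point energies,
    or the largest absolute coefficient value."""
    c = np.asarray(coeffs, dtype=float)
    if mode == 'max_abs':
        return float(np.max(np.abs(c)))
    return float(np.sum(c * c))
```

Either variant can be applied uniformly to vectors and to the numbered regions (1, 2, ..., N) of the divided time-frequency plane.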
  • f(x_0 + Δ) ≈ f(x_0) + f^(1)(x_0)Δ + (1/2!) f^(2)(x_0)Δ^2 + (1/3!) f^(3)(ξ)Δ^3    (1)
  • the first-, second-, and third-order differences of this sequence can be calculated by the regression method; that is, DY, D²Y, and D³Y can be obtained from Y.
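One simple reading of DY, D²Y, and D³Y is as forward differences computed recursively from Y (an illustrative assumption; the text does not fix the exact difference scheme):

```python
import numpy as np

def difference_orders(y, max_order=3):
    """First-, second-, and third-order forward differences DY, D2Y, D3Y,
    each obtained by differencing the previous order with np.diff."""
    d = np.asarray(y, dtype=float)
    out = {}
    for order in range(1, max_order + 1):
        d = np.diff(d)          # order-k difference of the sequence Y
        out[order] = d
    return out
```

For a sequence sampled from a smooth envelope, these differences play the role of the derivative terms f^(1), f^(2), f^(3) in the Taylor formula (1).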
  • the dots indicate the regions to be quantized, selected from all N regions, where N is the number of regions in the time-frequency plane division.
  • the process of obtaining the normalization factor is as follows: a global gain factor Global-Gain is determined according to the total energy of the signal, and it is quantized and encoded with a logarithmic model. The gain factor Global-Gain is then used to normalize the vector, after which the local normalization factor Local_Gain at the current vector position is calculated according to the Taylor formula (1) and the current vector is normalized again. The overall normalization factor Gain of the current vector is therefore the product of the two normalization factors:
  • Local-Gain does not need to be quantized at the encoder.
  • at the decoder, the local normalization factor Local-Gain can be obtained by the same process according to the Taylor formula (1). Multiplying the gain by the reconstructed normalized vector yields the reconstructed value of the current vector. Therefore, the side information that needs to be encoded at the encoder end consists of the function values at the dots selected in FIG. 8 and their first- and second-order difference values.
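The two-stage normalization can be sketched as follows. Using the RMS of the vector as Global-Gain is an assumption for illustration (the text only says the global gain is derived from the total energy and log-quantized), and the local gain is taken as a given envelope estimate from formula (1):

```python
import numpy as np

def two_stage_normalize(vec, local_gain):
    """Normalize a vector with a global gain (RMS here, as an assumed
    stand-in for the energy-derived Global-Gain) and a local gain from
    the envelope estimate; return the normalized vector and the overall
    gain Gain = Global_Gain * Local_Gain."""
    v = np.asarray(vec, dtype=float)
    global_gain = np.sqrt(np.mean(v * v))
    u = v / global_gain          # first normalization
    u = u / local_gain           # second normalization
    return u, global_gain * local_gain

def reconstruct(normalized_vec, gain):
    """Decoder side: multiply the gain by the reconstructed normalized
    vector to recover the current vector."""
    return np.asarray(normalized_vec, dtype=float) * gain
```

Because the decoder recomputes Local-Gain from the same envelope side information, only the global gain and the envelope points need to be transmitted, which is why Local-Gain itself is never quantized.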
  • the present invention uses vector quantization to encode them.
  • the vector quantization process is described as follows:
  • the function values f(x) of the preselected M regions constitute an M-dimensional vector y.
  • the first-order and second-order differences corresponding to this vector are known, and are denoted dy and d²y, respectively.
  • the three vectors are quantized separately.
  • codebooks corresponding to the three vectors have been obtained by using a codebook training algorithm, and the quantization process is a process of searching for the best matching vector.
  • the vector y corresponds to the zero-order approximation of the Taylor formula, and the distortion measure in the codebook search uses the Euclidean distance.
  • the quantization of the first-order difference dy corresponds to the first-order approximation of Taylor's formula:
  • the quantization of the first order difference first searches for a small number of codewords with the least distortion in the corresponding codebook according to the Euclidean distance.
  • for each candidate codeword, the quantization distortion is calculated for each region in the small neighborhood using formula (3), and the total distortion sum is used as the distortion metric, that is:
  • the above method can be easily extended to the case of two-dimensional time-frequency surfaces.
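The zero-order codebook search with the Euclidean distortion measure can be sketched as a full search (an illustration; the trained codebook here is a hypothetical array, and real systems restrict the search as described for the first-order differences):

```python
import numpy as np

def vq_search(x, codebook):
    """Full-search vector quantization: return the index of the codeword
    with the smallest Euclidean distortion, plus the quantization
    residual x - codeword, which goes on to the next coding stage."""
    x = np.asarray(x, dtype=float)
    d = ((codebook - x) ** 2).sum(axis=1)   # squared Euclidean distances
    i = int(np.argmin(d))
    return i, x - codebook[i]
```

The winning index is transmitted as side information and the residual is passed to the subsequent quantization and entropy coding, matching the flow described for the selection-point vectors.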
  • FIG. 9 shows another specific embodiment of the multi-resolution vector quantization process.
  • vector organization is performed according to the frequency direction, the time direction, and the time-frequency region. If not all vectors are quantized, the coding gain of each vector is calculated, and the first M vectors with the largest coding gain are selected for vector quantization.
  • the method for determining the value of M is: after the vectors are sorted by energy from large to small, M is the number of vectors whose cumulative energy percentage exceeds an empirical threshold (for example, 50%-90%). For more effective quantization, the vector needs to be normalized twice: the first time using the global maximum absolute value, the second time using a spline fit to calculate the normalized value within the vector. After the two normalizations, the dynamic range of the vector values is effectively controlled.
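The cumulative-energy rule for M can be sketched as follows (the 80% fraction in the example is one point inside the 50%-90% empirical range mentioned above):

```python
import numpy as np

def choose_M(energies, fraction=0.8):
    """After sorting vector energies in descending order, M is the
    smallest count whose cumulative energy reaches the given fraction
    of the total energy."""
    e = np.sort(np.asarray(energies, dtype=float))[::-1]
    cum = np.cumsum(e)
    # First position where the cumulative energy reaches the threshold.
    return int(np.searchsorted(cum, fraction * cum[-1]) + 1)
```

With energies [5, 3, 1, 1] and an 80% threshold, the two strongest vectors already carry 8 of the 10 energy units, so M = 2.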
  • the entire time-frequency plane is re-divided and numbered (1, 2, ..., N).
  • the m-th order B-spline function on the interval [x_i, x_{i+m+1}] is defined as:
  • N_{i,m}(x) = ((x - x_i) / (x_{i+m} - x_i)) N_{i,m-1}(x) + ((x_{i+m+1} - x) / (x_{i+m+1} - x_{i+1})) N_{i+1,m-1}(x)    (6)
  • any spline can be expressed as:
  • the dots represent the regions to be encoded selected from all N regions, where N is obtained by dividing the entire time-frequency plane.
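The B-spline recursion can be sketched in the Cox-de Boor form; this is an assumption about the exact normalization in the garbled formula (6), but it is the standard recursion for B-spline bases:

```python
def bspline_basis(i, m, knots, x):
    """Evaluate the order-m B-spline basis N_{i,m}(x) on the given knot
    sequence by the recursion of formula (6) (Cox-de Boor form)."""
    if m == 0:
        # Order 0: the indicator function of the knot interval [x_i, x_{i+1}).
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = 0.0
    den = knots[i + m] - knots[i]
    if den > 0:
        left = (x - knots[i]) / den * bspline_basis(i, m - 1, knots, x)
    right = 0.0
    den = knots[i + m + 1] - knots[i + 1]
    if den > 0:
        right = (knots[i + m + 1] - x) / den * bspline_basis(i + 1, m - 1, knots, x)
    return left + right
```

A linear combination of such bases, with coefficients fitted to the selected envelope points, gives the spline of formula (7) from which Local_Gain is recomputed identically at the encoder and the decoder.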
  • Vector number The specific vector quantization process is as follows: On the encoder side, the vector to be quantized determines the global gain factor Global-Gain for the total energy of the signal, which is quantized and encoded using a logarithmic model; then the gain factor Global-Gain is used to vector Normalization is performed, and the local normalization factor Local_Gain at the current vector position is calculated according to the fitting formula (7) and the current vector is normalized again, so the overall normalization factor Gain of the current vector is the above two Product of factors:
  • Local-Gain does not need to be quantized at the encoder.
  • Local_Gain can be obtained by the same process according to the fitting formula (7). Multiply the total gain with the reconstructed normalized vector to obtain the reconstructed value of the current vector. Therefore, when the spline curve fitting method is used, the side information that needs to be encoded at the encoder end is the function value at the circle selected in FIG. 8, and the present invention uses vector quantization to encode them.
  • the process of vector quantization is described as follows:
  • the function values f(X) of the M pre-selected regions form an M-dimensional vector y.
  • the vector y can be further decomposed into several sub-vectors to control the vector size and improve the accuracy of the vector quantization; these sub-vectors are called selection-point vectors.
  • each vector y is quantized.
  • the corresponding vector codebook can be obtained by using the codebook training algorithm.
  • the quantization process is a process of searching for the best matching vector, and the searched codeword index is transmitted to the decoder as side information.
  • the quantization error continues to the next quantization encoding process.
  • the audio encoder shown in FIG. 10 includes a time-frequency mapper, a multi-resolution filter, a multi-resolution vector quantizer, a psychoacoustic calculation module, and a quantization encoder.
  • the input audio signal to be encoded is split into two paths; one passes through the time-frequency mapper into the multi-resolution filter for multi-resolution analysis, the result of which serves as the input to the vector quantization and adjusts the computation of the psychoacoustic calculation module;
  • the other path enters the psychoacoustic calculation module, which estimates the psychoacoustic masking threshold of the current signal, used to control the perceptually irrelevant components in the quantization encoder;
  • the multi-resolution vector quantizer uses the output of the multi-resolution filter to divide the time-frequency coefficients into vectors and performs vector quantization;
  • the quantization residual is quantized and entropy coded by the quantization encoder.
  • FIG. 11 is a schematic structural diagram of a multi-resolution filter in the audio encoder shown in FIG. 10.
  • the multi-resolution filter includes a transient-measure calculation module, several equal-bandwidth cosine modulation filters, several multi-resolution analysis modules, and a time-frequency filter coefficient organization module; the number of multi-resolution analysis modules is one less than the number of equal-bandwidth cosine modulation filters.
  • it works as follows: after analysis by the transient-measure calculation module, the input audio signal is classified as slowly varying or fast varying; fast-varying signals can be further divided into type-I and type-II fast-varying signals.
  • a slowly varying signal is fed into an equal-bandwidth cosine modulation filter to obtain the required time-frequency filter coefficients. Each type of fast-varying signal is first filtered by the equal-bandwidth cosine modulation filter and then passed to a multi-resolution analysis module, which applies a wavelet transform to the filter coefficients and adjusts their time-frequency resolution; finally the time-frequency filter coefficient organization module outputs the filtered signal.
  • the structure of the multi-resolution vector quantizer is shown in FIG. 12, and includes a vector organization module, a vector selection module, a global normalization module, a local normalization module, and a quantization module.
  • the time-frequency plane coefficients output by the multi-resolution filter pass through the vector organization module and are organized into vectors according to different partitioning strategies.
  • the vector selection module selects the vectors to be quantized according to factors such as their energy and outputs them to the global normalization module.
  • in the global normalization module, a first, global normalization is applied to all vectors using the global normalization factor; the local normalization module then computes the local normalization factor of each vector, performs the second, local normalization, and outputs the result to the quantization module.
  • in the quantization module, the twice-normalized vectors are quantized and the quantization residual is computed as the output of the multi-resolution vector quantizer.
  • the present invention also provides a multi-resolution vector quantization audio decoding method.
  • the received bit stream is first demultiplexed, entropy decoded, and inversely quantized to obtain the quantized global normalization factor and the quantization indices of the selection points.
  • from the codebook, the energy and the difference values of each order are computed for every selection point, and the position of each quantized vector on the time-frequency plane is read from the bit stream; the secondary normalization factor at the corresponding position is then obtained from the Taylor formula or the spline-curve fitting formula. A normalized vector is obtained from the vector quantization index and multiplied with the two normalization factors to reconstruct the quantized vector on the time-frequency plane. The reconstructed vector is added to the corresponding decoded, inversely quantized time-frequency coefficients, and multi-resolution inverse filtering and frequency-to-time mapping complete the decoding, yielding the reconstructed audio signal.
  • Figure 14 illustrates the process of multi-resolution inverse filtering in the decoding method.
  • the time-frequency coefficients of the reconstructed vectors are first organized on the time-frequency plane; then, depending on the decoded signal type, the following filtering is performed: for a slowly varying signal, equal-bandwidth cosine modulation filtering yields the pulse-code-modulated (PCM) output in the time domain; for a fast-varying signal, multi-resolution synthesis is performed first, followed by equal-bandwidth cosine modulation filtering to obtain the PCM output.
  • fast-varying signals can be further subdivided into multiple types, and different types use different multi-resolution synthesis methods.
  • the corresponding audio decoder is shown in FIG. 15, and specifically includes a decoding and inverse quantizer, a multi-resolution inverse vector quantizer, a multi-resolution inverse filter, and a frequency-time mapper.
  • the decoding and inverse quantizer demultiplexes the received code stream, performs entropy decoding and inverse quantization, obtains side information of multi-resolution vector quantization, and outputs it to the multi-resolution inverse vector quantizer.
  • the multi-resolution inverse vector quantizer reconstructs the quantized vector according to the inverse quantization result and the side information, and restores the value of the time-frequency plane.
  • the multi-resolution inverse filter performs inverse filtering on the vector reconstructed by the multi-resolution inverse vector quantizer.
  • the frequency-time mapper completes the frequency-to-time mapping to obtain the final reconstructed audio signal.
  • the structure of the above multi-resolution inverse vector quantizer is shown in FIG. 16 and includes a demultiplexing module, an inverse quantization module, a normalized vector calculation module, a vector reconstruction module, and an addition module.
  • the demultiplexing module demultiplexes the received code stream to obtain a normalization factor and a quantized index of a selected point.
  • in the inverse quantization module, the energy envelope is obtained from the quantization indices and the vector quantization position information from the demultiplexing result; the guide-point and selection-point vectors are recovered by inverse quantization using the normalization factor and quantization indices, and the secondary normalization factor is computed and output to the normalized vector calculation module.
  • in the normalized vector calculation module, the inverse of the secondary normalization is applied to the selection-point vectors to obtain normalized vectors, which are output to the vector reconstruction module; there, the inverse of the first normalization is applied according to the energy envelope to obtain the reconstructed vectors. In the addition module, each reconstructed vector is added to the inverse-quantized residual at the corresponding time-frequency position, yielding the inverse-quantized time-frequency coefficients that serve as input to the multi-resolution inverse filter.
  • the structure of the multi-resolution inverse filter is shown in FIG. 17 and includes a time-frequency coefficient organization module, multiple multi-resolution synthesis modules, and multiple equal-bandwidth cosine modulation filters; the number of multi-resolution synthesis modules is one less than the number of equal-bandwidth cosine modulation filters.
  • after the reconstructed vectors are organized by the time-frequency coefficient organization module, the signal is divided into slowly varying and fast-varying parts.
  • fast-varying signals can be further subdivided into multiple types, such as I, II, ..., K.
  • a slowly varying signal is output to an equal-bandwidth cosine modulation filter and filtered to obtain the time-domain PCM output.
  • the different fast-varying signal types are output to their respective multi-resolution synthesis modules for synthesis, and then to an equal-bandwidth cosine modulation filter to obtain the time-domain PCM output.


Description

Audio Encoding and Decoding Method and Apparatus Based on Multi-Resolution Vector Quantization

Technical Field

The present invention relates to the field of signal processing, and in particular to encoding and decoding methods and apparatus that perform multi-resolution analysis and vector quantization of audio signals.

Background Art
In general, an audio coding method comprises steps such as psychoacoustic model calculation, time-frequency mapping, quantization, and coding, where time-frequency mapping refers to mapping the audio input signal from the time domain into the frequency domain or the time-frequency domain.

Time-frequency mapping, also called transformation and filtering, is a basic operation of audio signal coding that improves coding efficiency. Through this operation, most of the information contained in the time-domain signal is converted into, or concentrated in, a subset of the frequency-domain or time-frequency-domain coefficients. A basic operation of a perceptual audio encoder is to map the input audio signal from the time domain into the frequency domain or the time-frequency domain. The basic idea is: decompose the signal into components in each frequency band; once the input signal is expressed in the frequency domain, a psychoacoustic model can be used to remove perceptually irrelevant information; the components in each band are then grouped; finally, bits are allocated sensibly to express each group of frequency parameters. If the audio signal exhibits strong quasi-periodicity, this process greatly reduces the amount of data and improves coding efficiency. Commonly used time-frequency mapping methods include the discrete Fourier transform (DFT), the discrete cosine transform (DCT), quadrature mirror filters (QMF), pseudo-quadrature mirror filters (PQMF), cosine modulated filters (CMF), the modified discrete cosine transform (MDCT), and the discrete wavelet (packet) transform (DW(P)T). These methods, however, either use a single transform/filter configuration to compress an entire input signal frame, or use a filter bank or transform with a short time-domain analysis interval to express rapidly changing signals and suppress the effect of pre-echo on the decoded signal. When an input frame contains components with different transient characteristics, a single transform configuration cannot satisfy the basic requirement of optimal compression for the different signal sub-frames; simply using a filter bank or transform with a short time-domain analysis interval to process fast-varying signals yields coefficients with low frequency resolution, so that the frequency resolution in the low-frequency region is much coarser than the critical bandwidth of the human ear, which severely degrades coding efficiency.

In the audio coding process, once the time-domain signal has been mapped into the time-frequency domain, vector quantization can be applied to improve coding efficiency. The audio coding method that currently applies vector quantization is Transform-domain Weighted Interleave Vector Quantization (TWINVQ). After an MDCT of the signal, this method constructs the vectors to be quantized by interleaved selection of the spectral parameters, and then uses highly efficient vector quantization to markedly improve coded audio quality at lower bit rates. However, because the relationship between quantization noise and auditory masking cannot be controlled effectively, TWINVQ is in essence a perceptually lossy coding method, and it requires further improvement when higher subjective audio quality is sought. Moreover, because TWINVQ organizes its vectors by interleaving coefficients, statistical consistency between vectors is guaranteed, but the concentration of signal energy in local time-frequency regions cannot be exploited effectively, which also limits further gains in coding efficiency. In addition, since the MDCT is essentially an equal-bandwidth filter bank, the signal cannot be decomposed according to the clustering of its energy on the time-frequency plane, which limits the efficiency of TWINVQ coding.

Therefore, how to exploit effectively both the local time-frequency clustering of the signal and the high efficiency of vector quantization is a central problem in improving coding efficiency. It involves two aspects. First, the time-frequency plane must be partitioned effectively so that the between-class distance of signal components is as large as possible and the within-class distance as small as possible; this is the problem of multi-resolution filtering of the signal. Second, on the basis of an effective partition of the time-frequency plane, vectors must be reorganized, selected, and quantized so that the coding gain is maximized; this is the problem of multi-resolution vector quantization of the signal.
Summary of the Invention

The technical problem addressed by the present invention is to provide an audio encoding and decoding method and apparatus based on multi-resolution vector quantization that can adjust the time-frequency resolution according to the type of the input signal and exploit the local time-frequency clustering of the signal for vector quantization, thereby improving coding efficiency.

The multi-resolution vector quantization audio encoding method of the present invention comprises: adaptively filtering the input audio signal to obtain time-frequency filter coefficients and output a filtered signal; partitioning the filtered signal into vectors on the time-frequency plane to obtain vector combinations; selecting the vectors to be vector quantized; vector quantizing the selected vectors and computing the quantization residual; transmitting the quantized codebook information to the audio decoder as side information of the encoder, and quantization-encoding the quantization residual.

The multi-resolution vector quantization audio decoding method of the present invention comprises: demultiplexing the bit stream to obtain the side information of the multi-resolution vector quantization, and obtaining the energies of the selection points and the position information of the vector quantization; obtaining normalized vectors by inverse vector quantization from this information, computing the normalization factors, and reconstructing the quantized vectors of the original time-frequency plane; adding the reconstructed vectors, according to the position information, to the residuals of the corresponding time-frequency coefficients; and performing multi-resolution inverse filtering and frequency-to-time mapping to obtain the reconstructed audio signal.

The multi-resolution vector quantization audio encoder of the present invention comprises a time-frequency mapper, a multi-resolution filter, a multi-resolution vector quantizer, a psychoacoustic calculation module, and a quantization encoder. The time-frequency mapper receives the audio input signal, performs the time-to-frequency mapping, and outputs to the multi-resolution filter. The multi-resolution filter adaptively filters the signal and outputs the filtered signal to the psychoacoustic calculation module and the multi-resolution vector quantizer. The multi-resolution vector quantizer vector quantizes the filtered signal and computes the quantization residual, passes the quantized signal to the audio decoder as side information, and outputs the quantization residual to the quantization encoder. The psychoacoustic calculation module computes the masking threshold of the psychoacoustic model from the input audio signal and outputs it to the quantization encoder to control the noise allowed by quantization. The quantization encoder quantizes and entropy codes the residual output by the multi-resolution vector quantizer under the allowed-noise limit output by the psychoacoustic calculation module, yielding the encoded bit stream. The multi-resolution vector quantization audio decoder of the present invention comprises a decoding and inverse quantizer, a multi-resolution inverse vector quantizer, a multi-resolution inverse filter, and a frequency-time mapper. The decoding and inverse quantizer demultiplexes, entropy decodes, and inversely quantizes the bit stream to obtain the side information and the encoded data, and outputs them to the multi-resolution inverse vector quantizer. The multi-resolution inverse vector quantizer performs the inverse vector quantization process, reconstructs the quantized vectors, adds the reconstructed vectors to the residual coefficients on the time-frequency plane, and outputs the result to the multi-resolution inverse filter. The multi-resolution inverse filter inversely filters the sum of the vectors reconstructed by the multi-resolution vector quantizer and the residual coefficients, and outputs to the frequency-time mapper. The frequency-time mapper completes the mapping of the signal from frequency to time, yielding the final reconstructed audio signal.

The audio encoding and decoding method and apparatus of the present invention, based on Multiresolution Vector Quantization (MRVQ), can filter the audio signal adaptively. Multi-resolution filtering makes more effective use of the concentration of signal energy in local time-frequency regions and allows the time and frequency resolutions to be adjusted adaptively to the signal type. By reorganizing the filter coefficients, different organization strategies can be chosen according to the clustering characteristics of the signal, making effective use of the results of the multi-resolution time-frequency analysis. Quantizing these regions with vector quantization both improves coding efficiency and makes it easy to control and optimize the quantization accuracy.
Brief Description of the Drawings

FIG. 1 is a flowchart of the multi-resolution vector quantization audio encoding method of the present invention;

FIG. 2 is a flowchart of the multi-resolution filtering in the encoding method of the present invention;

FIG. 3 is a schematic diagram of a source encoding/decoding system based on cosine modulation filtering;

FIG. 4 is a schematic diagram of three energy aggregation patterns after multi-resolution filtering;

FIG. 5 is a flowchart of the multi-resolution vector quantization process;

FIG. 6 is a schematic diagram of partitioning vectors in three ways;

FIG. 7 is a flowchart of one embodiment of multi-resolution vector quantization;

FIG. 8 is a schematic diagram of region energies/maxima;

FIG. 9 is a flowchart of another embodiment of multi-resolution vector quantization;

FIG. 10 is a schematic structural diagram of the multi-resolution vector quantization audio encoder of the present invention;

FIG. 11 is a schematic structural diagram of the multi-resolution filter in the audio encoder;

FIG. 12 is a schematic structural diagram of the multi-resolution vector quantizer in the audio encoder;

FIG. 13 is a flowchart of the multi-resolution vector quantization audio decoding method of the present invention;

FIG. 14 is a flowchart of the multi-resolution inverse filtering;

FIG. 15 is a schematic structural diagram of the multi-resolution vector quantization audio decoder of the present invention;

FIG. 16 is a schematic structural diagram of the multi-resolution inverse vector quantizer in the audio decoder;

FIG. 17 is a schematic structural diagram of the multi-resolution inverse filter in the audio decoder.
Detailed Description of the Embodiments

The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
The flowchart in FIG. 1 gives the overall technical scheme of the audio encoding method of the present invention. The input audio signal first undergoes multi-resolution filtering; the filter coefficients are then reorganized and partitioned into vectors on the time-frequency plane; the vectors that need to be quantized are then selected; once the vectors are determined, each vector is quantized to obtain the corresponding vector quantization codebook and quantization residual. The vector quantization codebook is sent to the decoder as side information, while the quantization residual undergoes quantization encoding.
The flowchart of the multi-resolution filtering of the audio signal is shown in FIG. 2. The input audio signal is decomposed into frames, a transient measure is computed for each signal frame, and the value of the transient measure is compared against a threshold to decide whether the current frame is a slowly varying or a fast-varying signal. The filtering structure is chosen according to the frame type. For a slowly varying signal, equal-bandwidth cosine modulation filtering is performed to obtain the filter coefficients on the time-frequency plane, and the filtered signal is output. For a fast-varying signal, equal-bandwidth cosine modulation filtering is performed to obtain the time-frequency filter coefficients, a wavelet transform is then applied to the filter coefficients for multi-resolution analysis and to adjust their time-frequency resolution, and finally the filtered signal is output. For fast-varying signals, a series of fast-varying signal types can further be defined, i.e., several thresholds subdivide the fast-varying signals, and different wavelet transforms are used for the multi-resolution analysis of the different types; the wavelet basis may be fixed or adaptive.
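The frame-classification step above can be sketched in a few lines. The patent does not fix a particular transient measure here, so the measure below (ratio of the largest sub-frame energy to the mean sub-frame energy) and the threshold values are illustrative assumptions only:

```python
def transient_measure(frame, n_sub=8):
    # Hypothetical transient measure: ratio of the largest sub-frame
    # energy to the mean sub-frame energy. A stationary frame gives a
    # ratio near 1; a localized attack gives a large ratio.
    step = len(frame) // n_sub
    e = [sum(x * x for x in frame[i * step:(i + 1) * step]) for i in range(n_sub)]
    mean = sum(e) / n_sub
    return max(e) / mean if mean > 0 else 0.0

def classify_frame(frame, thresholds=(4.0, 16.0)):
    # A ladder of thresholds subdivides fast-varying frames into types
    # I, II, ... as the description allows (threshold values are made up).
    m = transient_measure(frame)
    if m < thresholds[0]:
        return "slow"
    return "fast-I" if m < thresholds[1] else "fast-II"

assert classify_frame([1.0] * 256) == "slow"          # stationary frame
assert classify_frame([0.0] * 255 + [1.0]) == "fast-I"  # isolated attack
```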
As stated above, the filtering of both slowly varying and fast-varying signals is based on cosine modulated filter bank techniques. Cosine modulated filter banks include two filtering forms: the traditional cosine modulation filtering technique and the modified discrete cosine transform (MDCT) technique. A source encoding/decoding system based on cosine modulation filtering is shown in FIG. 3. At the encoder, the input signal is decomposed into M subbands by the analysis filter bank, and the subband coefficients are quantized and entropy coded. At the decoder, after entropy decoding and inverse quantization, the subband coefficients are obtained and filtered by the synthesis filter bank to recover the audio signal.
The impulse responses of the traditional cosine modulation filtering technique are:

  h_k(n) = 2 p_a(n) cos( (π/M)(k + 0.5)(n − D/2) + Θ_k ),  n = 0, 1, ..., N_a − 1    (F-1)

  f_k(n) = 2 p_s(n) cos( (π/M)(k + 0.5)(n − D/2) − Θ_k ),  n = 0, 1, ..., N_s − 1    (F-2)

where Θ_k = (−1)^k · π/4, 0 ≤ k ≤ M − 1, 0 ≤ n ≤ 2KM − 1, and K is an integer greater than zero. Here, let the impulse response length of the analysis window (analysis prototype filter) p_a(n) of the M-subband cosine modulated filter bank be N_a, and the impulse response length of the synthesis window (or synthesis prototype filter) p_s(n) be N_s. The delay D of the overall system can then be confined to the range [M − 1, N_s + N_a − M + 1], and the system delay is D = 2sM + d (0 ≤ d ≤ 2M − 1).
When the analysis and synthesis windows are equal, i.e.

  p_a(n) = p_s(n), and N_a = N_s    (F-3)

the cosine modulated filter bank given by formulas (F-1) and (F-2) is an orthogonal filter bank, and the matrices H and F ([H]_{n,k} = h_k(n), [F]_{n,k} = f_k(n)) are orthogonal transform matrices. To obtain a linear-phase filter bank, the symmetric window condition

  p_a(2KM − 1 − n) = p_a(n)    (F-4)

is further imposed. The conditions the window function must satisfy to guarantee perfect reconstruction of orthogonal and biorthogonal systems are given in P. P. Vaidyanathan, "Multirate Systems and Filter Banks", Prentice Hall, Englewood Cliffs, NJ, 1993.
The other filtering form is the modified discrete cosine transform (MDCT), also known as the TDAC (Time Domain Aliasing Cancellation) cosine modulated filter bank, with impulse responses:

  h_k(n) = p_a(n) · sqrt(2/M) · cos( (2n + M + 1)(2k + 1)π / (4M) )    (F-5)

  f_k(n) = p_s(n) · sqrt(2/M) · cos( (2n + M + 1)(2k + 1)π / (4M) )    (F-6)

where 0 ≤ k ≤ M − 1, 0 ≤ n ≤ 2KM − 1, and K is an integer greater than zero; p_a(n) and p_s(n) are respectively the analysis window (or analysis prototype filter) and the synthesis window (or synthesis prototype filter).
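As a minimal numerical sketch of the MDCT-form coefficients in formulas (F-5)/(F-6) for the common window length 2M (the K = 1 case), with a sine prototype window; the check at the end is the Princen-Bradley condition p(n)² + p(n+M)² = 1 that a perfect-reconstruction MDCT window of this length must satisfy:

```python
import math

def sine_window(M):
    # Sine prototype window of length 2M; a standard choice that
    # satisfies p(n)^2 + p(n+M)^2 = 1.
    return [math.sin(math.pi / (2 * M) * (n + 0.5)) for n in range(2 * M)]

def mdct_filters(p, M):
    # h_k(n) = p(n) * sqrt(2/M) * cos((2n+M+1)(2k+1)pi/(4M)),
    # k = 0..M-1, n = 0..2M-1.
    return [[p[n] * math.sqrt(2.0 / M)
             * math.cos((2 * n + M + 1) * (2 * k + 1) * math.pi / (4 * M))
             for n in range(2 * M)]
            for k in range(M)]

M = 8
p = sine_window(M)
h = mdct_filters(p, M)
assert len(h) == M and len(h[0]) == 2 * M
for n in range(M):  # perfect-reconstruction window condition
    assert abs(p[n] ** 2 + p[n + M] ** 2 - 1.0) < 1e-12
```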
Likewise, when the analysis and synthesis windows are equal, i.e.

  p_a(n) = p_s(n)    (F-7)

the cosine modulated filter bank given by formulas (F-5) and (F-6) is an orthogonal filter bank, and the matrices H and F ([H]_{n,k} = h_k(n), [F]_{n,k} = f_k(n)) are orthogonal transform matrices. To obtain a linear-phase filter bank, the symmetric window condition

  p_a(2KM − 1 − n) = p_a(n)    (F-8)

is further imposed. To satisfy perfect reconstruction, the analysis and synthesis windows must then satisfy

  Σ_{m=0}^{2K−1−2s} p_a(mM + n) · p_a((m + 2s)M + n) = δ(s)    (F-9)

where δ(s) is the Kronecker delta, s = 0, ..., K − 1, and n = 0, ..., M − 1.
2
放宽公式(F- 7) 的约束条件, 即取消分析窗和综合窗相等的限制, 则余弦调制滤波器 组为双正交调制滤波器组。
时域分析已经证明, 根据公式( F- 5 )和( F- 6 )获得的双正交调制滤波器組依然满足完 全重构性能, 只要
2 ps {mM + ή) pa ((m + 2s)M + 5{s) (F-10)
2K-l-2s
∑ (— 1 Ps M + ") pa ((m + 2s)M + (M— "— 1)) = 0 (F-ll ) 其中 = 0,··', — 1, Μ = 0,···,Μ- 1。
According to the above analysis, the analysis and synthesis windows of cosine modulated filter banks (including the MDCT) may use any window form that satisfies the perfect reconstruction conditions of the filter bank, such as the SINE and KBD windows commonly used in audio coding.

In addition, cosine modulated filter bank filtering can use the fast Fourier transform to improve computational efficiency; see "A New Algorithm for the Implementation of Filter Banks based on 'Time Domain Aliasing Cancellation'" (P. Duhamel, Y. Mahieux and J. P. Petit, Proc. ICASSP, May 1991, pp. 2209-2212).

Likewise, the wavelet transform is a well-known technique in the signal processing field; see "Wavelet Transform Theory and Its Applications in Signal Processing" (Chen Fengshi, National Defense Industry Press, 1998) for a detailed discussion.
A signal that has passed through multi-resolution analysis filtering has the property of redistributing and aggregating signal energy on the time-frequency plane, as shown in FIG. 4. For a signal that is stationary in the time domain, such as a sinusoid, its energy on the time-frequency plane aggregates in one frequency band along the time direction, as shown in FIG. 4a. For a fast-varying time-domain signal, in particular one with pronounced pre-echo in audio coding, such as a castanets signal, its energy is distributed mainly along the frequency direction, i.e., most of the energy concentrates at a few time instants, as shown in FIG. 4b. For a time-domain noise signal, the spectrum spreads over a wide range, so the energy aggregates in several patterns: along the time direction, along the frequency direction, and by region, as shown in FIG. 4c.
In the time-frequency multi-resolution distribution, the frequency resolution is high in the low-frequency part and lower in the mid- and high-frequency parts. Since the components that cause pre-echo lie mainly in the mid and high frequencies, improving their coding quality effectively suppresses pre-echo; an important starting point of multi-resolution vector quantization is to optimize, for precisely these important filter coefficients, the error introduced by quantization. An efficient coding strategy for these coefficients is therefore particularly important. Based on the time-frequency distribution of the filter coefficients obtained after multi-resolution filtering, the important coefficients can be effectively regrouped and classified. As the foregoing analysis shows, the energy distribution of the signal after multi-resolution filtering exhibits strong regularity, and introducing vector quantization can exploit this characteristic effectively when combining coefficients. By organizing the vectors in a specific way, the regions of the time-frequency plane are organized into a matrix of one-dimensional vectors. Vector quantization is then applied to all or part of the elements of this vector matrix; the quantized information is transmitted to the decoder as encoder side information, while the quantization residuals together with the unquantized coefficients form a residual system that undergoes quantization encoding.
FIG. 5 describes in detail the multi-resolution vector quantization of the audio signal after multi-resolution filtering. The process comprises three sub-processes: vector partitioning, vector selection, and vector quantization.
The time-frequency plane can be partitioned into vectors in three ways: along the time direction, along the frequency direction, and by time-frequency region. Strongly tonal signals suit organization along the time direction; signals with fast-varying time-domain behavior suit organization along the frequency direction; and more complex audio signals suit organization by time-frequency region. Suppose the length of the signal's frequency coefficients is N; after multi-resolution filtering, the resolution along the time direction on the time-frequency plane is L and along the frequency direction is K, with K·L = N. When partitioning into vectors, the vector dimension D is fixed first, giving N/D vectors after partitioning. When partitioning along the time direction, the frequency resolution K is kept unchanged and time is divided; when partitioning along the frequency direction, the time resolution L is kept unchanged and frequency is divided; when partitioning by time-frequency region, the numbers of divisions along time and frequency are arbitrary, provided the final number of vectors is N/D. FIG. 6 shows an embodiment of partitioning vectors by time, frequency, and time-frequency region. Suppose the frequency coefficient length is N = 1024; after multi-resolution filtering, the time-frequency plane is divided in the form K·L = 64·16, where K = 64 is the frequency resolution and L = 16 the time resolution. With vector dimension D = 8, vectors can be combined and extracted from this plane in different ways, as shown in FIGS. 6a, 6b and 6c. In FIG. 6a the vectors are partitioned along the frequency direction into 8·16 eight-dimensional vectors, called type-I vector organization for short. FIG. 6b shows the result of partitioning along the time direction, 64·2 eight-dimensional vectors in all, called type-II vector organization. FIG. 6c shows the result of organizing vectors by time-frequency region, 16·8 eight-dimensional vectors in all, called type-III vector organization. Each partitioning method thus yields 128 eight-dimensional vectors. The set of vectors obtained by type-I organization is denoted {v_f}, by type-II organization {v_t}, and by type-III organization {v_tf}.
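The three partitioning strategies on the 64x16 example grid can be sketched directly; the 4x2 tile shape used for the type-III organization is one possible choice (the text leaves the tiling free as long as each tile holds D points):

```python
K, L, D = 64, 16, 8          # frequency bins, time slots, vector dimension
plane = [[float(k * L + t) for t in range(L)] for k in range(K)]  # coeff[k][t]

# Type I: partition along the frequency direction (time resolution kept at L).
type1 = [[plane[g * D + i][t] for i in range(D)]
         for g in range(K // D) for t in range(L)]

# Type II: partition along the time direction (frequency resolution kept at K).
type2 = [[plane[k][g * D + i] for i in range(D)]
         for k in range(K) for g in range(L // D)]

# Type III: partition into 4x2 time-frequency tiles of D = 8 points each.
type3 = [[plane[fb * 4 + i][tb * 2 + j] for i in range(4) for j in range(2)]
         for fb in range(K // 4) for tb in range(L // 2)]

# Every strategy covers the 1024 grid points with 128 eight-dimensional vectors.
assert len(type1) == len(type2) == len(type3) == K * L // D == 128
assert all(len(v) == D for v in type1 + type2 + type3)
```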
After the vector partitioning, the vectors that need to be quantized are determined; two selection schemes can be used.

The first scheme selects all vectors on the entire time-frequency plane for quantization, where "all vectors" means the vectors covering all time-frequency grid points obtained under one particular partitioning: all vectors of the type-I organization, or all of the type-II organization, or all of the type-III organization; it suffices to select the complete set of one of them. Which set is chosen is decided by the quantization gain, defined as the ratio of the energy before quantization to the energy of the quantization error; among the above organizations, the vectors of the organization with the larger gain are selected.
The second scheme selects only the most important vectors for quantization; these may include vectors along the frequency direction, along the time direction, or by time-frequency region. When only some of the vectors are vector quantized, the side information must include, besides the quantization indices of the vectors, their sequence numbers. The concrete vector selection methods are introduced below. Once the vectors to be quantized are determined, vector quantization is performed. Whether all vectors or only the important ones are quantized, the basic unit is the quantization of a single vector. For a single D-dimensional vector, given the trade-off between dynamic range and codebook size, the vector must be normalized before quantization to obtain a normalization factor, a varying quantity that reflects the dynamic energy range of the different vectors. The normalized vector is then quantized, which comprises the quantization of the codebook index and of the normalization factor; given the constraints of bit rate and coding gain, the fewer bits spent on quantizing the normalization factor, subject to sufficient accuracy, the better. In the present invention, methods such as curve and surface fitting, multi-resolution decomposition, and prediction can be used to compute the multi-resolution time-frequency coefficient envelope and obtain the normalization factor.
FIGS. 7 and 9 give flowcharts of two specific embodiments of the multi-resolution vector quantization process. The embodiment of FIG. 7 selects vectors according to their energy and the variance of their components, describes the multi-resolution time-frequency coefficient envelope with a Taylor expansion to obtain the normalization factor, and then quantizes. The embodiment of FIG. 9 selects vectors according to the coding gain, computes the multi-resolution time-frequency coefficient envelope by spline-curve fitting to obtain the normalization factor, and then quantizes. The two embodiments are introduced in turn below.
In FIG. 7, vectors are first organized along the frequency direction, the time direction, and by time-frequency region. With frequency coefficient length N = 1024, the time-frequency multi-resolution filtering produces a 64·16 grid; with vector dimension 8, partitioning by frequency yields vectors in an 8·16 matrix, partitioning by time yields a 64·2 matrix, and partitioning by time-frequency region yields a 16·8 matrix.
If not all vectors are quantized, vectors must be selected by importance. In this embodiment the selection criteria are the energy of a vector and the variance of its components; when computing the variance, the absolute values of the vector components are taken, to eliminate the influence of sign. Let the set V = {v_f} ∪ {v_t} ∪ {v_tf}. The selection proceeds as follows. First, compute the energy E_{v_i} = |v_i|² of each vector in V, and at the same time compute dE_{v_i}, the variance of the components of the i-th vector. Then sort the elements of V by energy in descending order, and re-sort the sorted elements by variance in ascending order. The number M of vectors to select is determined from the ratio of the total signal energy to the total energy of the currently selected vectors; a typical value is an integer in [3, 50]. The first M vectors are then selected for vector quantization; if vectors of the same region appear simultaneously in the type-I, type-II, and type-III organizations, the choice among them follows the variance ordering. Through these steps, the M vectors to be quantized are selected.
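The two-stage ranking just described can be sketched as below. Note that with two full stable sorts the final order is governed by the variance, with the energy order surviving only as a tie-breaker; that is a literal reading of the text, shown here as one possible interpretation:

```python
def select_vectors(vectors, M):
    # Rank by energy (descending), then re-rank the result by the
    # variance of the absolute components (ascending); keep the first M
    # indices. Python's sort is stable, so energy breaks variance ties.
    def energy(v):
        return sum(x * x for x in v)
    def variance(v):
        a = [abs(x) for x in v]
        mean = sum(a) / len(a)
        return sum((x - mean) ** 2 for x in a) / len(a)
    by_energy = sorted(range(len(vectors)), key=lambda i: -energy(vectors[i]))
    reordered = sorted(by_energy, key=lambda i: variance(vectors[i]))
    return reordered[:M]

vecs = [[1, 1, 1, 1], [9, 0, 0, 0], [3, 3, 3, 3]]
# Vectors 0 and 2 have zero variance; 2 has more energy, so it ranks first.
assert select_vectors(vecs, 2) == [2, 0]
```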
After the M vectors are selected, the Taylor approximation formula is used, with different distortion measures, to complete the quantization search for the differences of each order. For more effective quantization, the vectors are normalized twice: the first normalization uses the global maximum absolute value; for the second, the signal envelope is estimated from a finite number of points and the estimate normalizes the vector at the corresponding position. After the two normalizations, the dynamic range of the vectors is effectively controlled. The envelope estimation by Taylor expansion is described in detail below.
Vector quantization proceeds in the following steps: first determine the parameters of the Taylor approximation formula, so that the Taylor formula expresses the approximate energy of any vector on the whole time-frequency plane, and compute the maximum energy or maximum absolute value; next, apply the first normalization to the selected vectors; then compute the approximate energy of each vector to be quantized with the Taylor formula and apply the second normalization; finally quantize the normalized vectors by minimum distortion and compute the quantization residual. These steps are described in detail below. On the time-frequency plane, the coefficient at each time-frequency grid point corresponds to a definite energy value. The coefficient energy of a grid point is defined as the square of the coefficient or its absolute value; the energy of a vector is defined as the sum of the coefficient energies over all grid points composing it, or as the largest absolute value among those coefficients; the energy of a time-frequency region is defined analogously over the grid points composing that region. To obtain the energy of a vector, the energy sum or the maximum absolute value is therefore computed over all the grid-point coefficients the vector contains. The whole time-frequency plane can thus be partitioned as in FIGS. 6a, 6b and/or 6c and the regions numbered (1, 2, ..., N). If partitioning along the frequency direction is used, each region corresponds to one frequency-direction vector. Computing the energy or maximum absolute value of each region constructs a univariate function Y = f(X), where X is the region index, an integer in [1, N], and Y is the energy or maximum absolute value of region X; the points (X_i, Y_i), with i an integer in [1, N], are also called guide points. By the Taylor formula,

  f(x_0 + Δ) = f(x_0) + f^(1)(x_0)·Δ + (1/2)·f^(2)(x_0)·Δ² + (1/6)·f^(3)(ξ)·Δ³    (1)

The M values of the function Y = f(X) form a discrete sequence {y_1, y_2, y_3, y_4, ..., y_M}, whose first-, second-, and third-order differences can all be obtained by regression; that is, DY, D²Y and D³Y are obtained from Y.
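The difference sequences DY, D²Y, D³Y of the guide-point values can be sketched as below; the patent obtains them by regression, so plain successive differencing is a minimal stand-in, not the method itself:

```python
def differences(y, order):
    # Successive first differences approximate DY, D^2Y, D^3Y of the
    # guide-point sequence (a stand-in for the regression in the text).
    for _ in range(order):
        y = [b - a for a, b in zip(y, y[1:])]
    return y

Y = [1.0, 4.0, 9.0, 16.0, 25.0]      # f(x) = x^2 sampled at x = 1..5
assert differences(Y, 1) == [3.0, 5.0, 7.0, 9.0]   # DY
assert differences(Y, 2) == [2.0, 2.0, 2.0]        # D^2 Y, constant for x^2
assert differences(Y, 3) == [0.0, 0.0]             # D^3 Y vanishes
```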
FIG. 8 is a schematic diagram of the function Y = f(X) approximated by a Taylor expansion; the dots represent the regions selected for quantization encoding out of all N regions, where N is the number of vectors obtained by partitioning the whole time-frequency plane. The normalization factor is obtained as follows. A global gain factor Global_Gain is determined from the total signal energy and quantization-encoded with a logarithmic model. The vectors are then normalized by Global_Gain, the local normalization factor Local_Gain at the current vector position is computed from Taylor formula (1), and the current vector is normalized again. The overall normalization factor Gain of the current vector is thus the product of the two normalization factors:

  Gain = Global_Gain · Local_Gain    (2)
Local_Gain does not need to be quantized at the encoder. At the decoder, the local normalization factor Local_Gain is obtained by the same procedure from Taylor formula (1). Multiplying Global_Gain with the reconstructed normalized vector yields the reconstructed value of the current vector. The side information to be encoded at the encoder is therefore the function values at the dots selected in FIG. 8, together with their first- and second-order difference values; the present invention encodes them with vector quantization.
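The two-stage normalization and the decoder-side reconstruction around formula (2) can be sketched as follows. The logarithmic-model quantizer shape (uniform steps in log2) and the Local_Gain value are illustrative assumptions; in the method itself, Local_Gain would come from the Taylor-fitted envelope:

```python
import math

def quantize_log_gain(g, step=0.5):
    # Assumed shape of the "logarithmic model": uniform quantization of
    # log2(g). Returns the index (side information) and the decoded gain.
    idx = round(math.log2(g) / step)
    return idx, 2.0 ** (idx * step)

vector = [12.0, -6.0, 3.0]
global_gain = max(abs(x) for x in vector)      # first (global) factor
idx, q_global = quantize_log_gain(global_gain)
local_gain = 0.5                               # placeholder for the envelope fit
normalized = [x / (q_global * local_gain) for x in vector]

# Decoder side: Gain = Global_Gain * Local_Gain, formula (2).
recon = [x * q_global * local_gain for x in normalized]
assert all(abs(a - b) < 1e-9 for a, b in zip(recon, vector))
```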
The vector quantization process is described as follows. The function values f(X) of the M pre-selected regions form an M-dimensional vector y, whose first- and second-order differences, denoted dy and d²y, are known; these three vectors are quantized separately. At the encoder, codebooks corresponding to the three vectors have been obtained with a codebook training algorithm, and the quantization process is a search for the best matching vector. The vector y corresponds to the zero-order term of the Taylor expansion, and the distortion measure in its codebook search is the Euclidean distance. The quantization of the first-order difference dy corresponds to the first-order approximation of the Taylor formula:

  f(x_0 + Δ) = f(x_0) + f^(1)(x_0)·Δ    (3)

The quantization of the first-order difference therefore first searches, by Euclidean distance, for a small number of codewords with least distortion in the corresponding codebook; then, for every region in a small neighborhood of the current vector x_0, the quantization distortion is computed with formula (3), and the total distortion sum serves as the distortion measure:

  D = Σ_i ( f(x + Δ_i) − f̂(x + Δ_i) )²    (4)

where f(x + Δ_i) is the true value before quantization, f̂(x + Δ_i) is the approximation obtained with the Taylor formula, and the sum runs over a neighborhood of size M. The second-order difference d²y can be quantized by a similar procedure. This process finally yields three quantized codeword indices, transmitted to the decoder as side information, while the quantization residual undergoes quantization encoding.
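The basic unit of all of the above, the full search for the codeword of minimum Euclidean distortion and the residual handed to the next coding stage, can be sketched as:

```python
def quantize_vector(v, codebook):
    # Full search for the codeword with minimum Euclidean distortion.
    # The winning index is side information; the residual continues to
    # the next quantization encoding step.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(codebook)), key=lambda i: dist(v, codebook[i]))
    residual = [x - y for x, y in zip(v, codebook[best])]
    return best, residual

codebook = [[0.0, 0.0], [1.0, 1.0], [1.0, -1.0]]
idx, res = quantize_vector([0.9, 1.2], codebook)
assert idx == 1                       # [1.0, 1.0] is the nearest codeword
assert all(abs(r) < 0.5 for r in res) # small residual remains
```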
The above method extends readily to the case of two-dimensional time-frequency surfaces.
FIG. 9 shows another specific embodiment of the multi-resolution vector quantization process. Vectors are first organized along the frequency direction, the time direction, and by region. If not all vectors are quantized, the coding gain of each vector is computed and the M vectors with the largest coding gain are selected for vector quantization. M is determined by sorting the vectors by energy in descending order and taking the number of vectors whose cumulative share of the total energy exceeds an empirical threshold (e.g., 50%-90%). For more effective quantization, the vectors are again normalized twice: first by the global maximum absolute value, and second by the intra-vector normalization values computed with spline fitting. After the two normalizations, the dynamic range of the vectors is effectively controlled.
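The rule for choosing M from the cumulative energy share can be sketched directly; the 70% threshold below is one value inside the 50%-90% range the text suggests:

```python
def choose_M(energies, threshold=0.7):
    # M = number of highest-energy vectors whose cumulative share of the
    # total energy first exceeds the threshold (an empirical value the
    # text places between 50% and 90%).
    total = sum(energies)
    acc, M = 0.0, 0
    for e in sorted(energies, reverse=True):
        acc += e
        M += 1
        if acc / total > threshold:
            break
    return M

assert choose_M([50.0, 30.0, 10.0, 5.0, 5.0], threshold=0.7) == 2
assert choose_M([50.0, 30.0, 10.0, 5.0, 5.0], threshold=0.9) == 4
```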
As in the embodiment of FIG. 7, the whole time-frequency plane is first re-partitioned and numbered (1, 2, ..., N), the energy or maximum absolute value of each region is computed, and the univariate function Y = f(X) is constructed, where X is the region number, an integer in [1, N], and Y is the energy or maximum absolute value of region X. From the B-spline curve fitting formulas:

The constant (order-0) B-spline function on the i-th subinterval is

  N_{i,0}(x) = 1 if x_i ≤ x < x_{i+1}, and 0 otherwise    (5)
The m-th order B-spline function on the interval [x_i, x_{i+m+1}] is defined by the recursion

  N_{i,m}(x) = ((x − x_i)/(x_{i+m} − x_i)) · N_{i,m−1}(x) + ((x_{i+m+1} − x)/(x_{i+m+1} − x_{i+1})) · N_{i+1,m−1}(x)    (6)

Using the B-spline basis functions as a basis, any spline can then be expressed as

  f(x) = Σ_i k_i · N_{i,m}(x)    (7)

so that formulas (5), (6) and (7) give the function value of the spline at a given point X; the points used for the interpolation are also called guide points.
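Formulas (5)-(7) can be evaluated directly with the Cox-de Boor recursion; the partition-of-unity check at the end is a standard property of the B-spline basis on the interior of a uniform knot vector, used here only to sanity-check the implementation:

```python
def bspline_basis(i, m, x, knots):
    # Cox-de Boor recursion for N_{i,m}(x), formulas (5) and (6).
    if m == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + m] != knots[i]:
        left = (x - knots[i]) / (knots[i + m] - knots[i]) \
               * bspline_basis(i, m - 1, x, knots)
    if knots[i + m + 1] != knots[i + 1]:
        right = (knots[i + m + 1] - x) / (knots[i + m + 1] - knots[i + 1]) \
                * bspline_basis(i + 1, m - 1, x, knots)
    return left + right

def spline_value(coeffs, m, x, knots):
    # f(x) = sum_i k_i * N_{i,m}(x), formula (7).
    return sum(k * bspline_basis(i, m, x, knots) for i, k in enumerate(coeffs))

knots = list(range(10))  # uniform knot vector 0..9, 7 quadratic basis functions
# With all coefficients equal to 1 the spline is 1 on the interior interval,
# since the quadratic B-splines there form a partition of unity.
assert abs(spline_value([1.0] * 7, 2, 4.5, knots) - 1.0) < 1e-12
```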
FIG. 8 can likewise serve as a schematic of the function Y = f(X) obtained by spline-curve fitting; the dots represent the regions to be encoded selected from all N regions, where N is the number of vectors obtained by partitioning the whole time-frequency plane. The vector quantization process is as follows. At the encoder, for the vectors to be quantized, a global gain factor Global_Gain is determined from the total signal energy and quantization-encoded with a logarithmic model. The vectors are then normalized by Global_Gain, the local normalization factor Local_Gain at the current vector position is computed from fitting formula (7), and the current vector is normalized again, so the overall normalization factor Gain of the current vector is the product of the two factors:

  Gain = Global_Gain · Local_Gain    (8)

Local_Gain does not need to be quantized at the encoder. Likewise, at the decoder Local_Gain can be obtained by the same procedure from fitting formula (7). Multiplying the total gain with the reconstructed normalized vector yields the reconstructed value of the current vector. When the spline-curve fitting method is used, the side information to be encoded at the encoder is therefore the function values at the dots selected in FIG. 8; the present invention encodes them with vector quantization.
The vector quantization process is described as follows. The function values f(X) of the M pre-selected regions form an M-dimensional vector y, which can be further decomposed into several sub-vectors to control the vector size and improve the vector quantization accuracy; these vectors are called selection-point vectors. The vectors y are then quantized separately. At the encoder, the corresponding vector codebooks are obtained with a codebook training algorithm. The quantization process is a search for the best matching vector, and the codeword indices found are transmitted to the decoder as side information. The quantization error continues to the next quantization encoding step.
The above method extends readily to the case of two-dimensional time-frequency surfaces. The audio encoder shown in FIG. 10 includes a time-frequency mapper, a multi-resolution filter, a multi-resolution vector quantizer, a psychoacoustic calculation module, and a quantization encoder. The input audio signal to be encoded is split into two paths. One path passes through the time-frequency mapper into the multi-resolution filter for multi-resolution analysis; the analysis result serves as the input to the vector quantization and adjusts the computation of the psychoacoustic calculation module. The other path enters the psychoacoustic calculation module, which estimates the psychoacoustic masking threshold of the current signal, used to control the perceptually irrelevant components in the quantization encoder. The multi-resolution vector quantizer, from the output of the multi-resolution filter, partitions the coefficients of the time-frequency plane into vectors and performs vector quantization; the quantization residual is quantized and entropy coded by the quantization encoder.
FIG. 11 is a schematic structural diagram of the multi-resolution filter in the audio encoder of FIG. 10. The multi-resolution filter includes a transient-measure calculation module, several equal-bandwidth cosine modulation filters, several multi-resolution analysis modules, and a time-frequency filter coefficient organization module; the number of multi-resolution analysis modules is one less than the number of equal-bandwidth cosine modulation filters. It works as follows: after analysis by the transient-measure calculation module, the input audio signal is divided into slowly varying and fast-varying signals, and fast-varying signals may be further subdivided into type-I and type-II fast-varying signals. A slowly varying signal is input to an equal-bandwidth cosine modulation filter for filtering, obtaining the required time-frequency filter coefficients. Each type of fast-varying signal is first filtered by the equal-bandwidth cosine modulation filter and then enters a multi-resolution analysis module, which applies a wavelet transform to the filter coefficients and adjusts their time-frequency resolution; finally the time-frequency filter coefficient organization module outputs the filtered signal.
The structure of the multi-resolution vector quantizer is shown in FIG. 12 and includes a vector organization module, a vector selection module, a global normalization module, a local normalization module, and a quantization module. The time-frequency plane coefficients output by the multi-resolution filter pass through the vector organization module and are organized into vectors according to the different partitioning strategies; the vector selection module then selects the vectors to be quantized according to factors such as energy and outputs them to the global normalization module. In the global normalization module, a first, global normalization is applied to all vectors using the global normalization factor; the local normalization module then computes the local normalization factor of each vector, performs the second, local normalization, and outputs to the quantization module. In the quantization module, the twice-normalized vectors are quantized and the quantization residual is computed as the output of the multi-resolution vector quantizer.
The present invention also provides a multi-resolution vector quantization audio decoding method, shown in FIG. 13. The received bit stream is first demultiplexed, entropy decoded, and inversely quantized to obtain the quantized global normalization factor and the quantization indices of the selection points. From the indices, the energy and the difference values of each order of every selection point are computed from the codebook; the position information of the vector quantization on the time-frequency plane is obtained from the bit stream; and the secondary normalization factor at the corresponding position is then obtained from the Taylor formula or the spline-curve fitting formula. A normalized vector is then obtained from the vector quantization index and multiplied with the two normalization factors, reconstructing the quantized vector on the time-frequency plane. The reconstructed vector is added to the coefficients at the corresponding positions of the decoded, inversely quantized time-frequency plane, and multi-resolution inverse filtering and frequency-to-time mapping complete the decoding to obtain the reconstructed audio signal.
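The per-vector reconstruction step of this decoding path can be sketched in one function: scale the decoded normalized codeword by both normalization factors and add it to the residual coefficients at the vector's time-frequency position (the concrete values below are illustrative):

```python
def reconstruct(codeword, global_gain, local_gain, residual):
    # Quantized vector = normalized codeword * Global_Gain * Local_Gain,
    # then added to the inverse-quantized residual at its position.
    return [c * global_gain * local_gain + r
            for c, r in zip(codeword, residual)]

out = reconstruct([0.5, -0.25], 8.0, 0.5, [0.1, -0.1])
assert abs(out[0] - 2.1) < 1e-9 and abs(out[1] + 1.1) < 1e-9
```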
FIG. 14 describes the multi-resolution inverse filtering in the decoding method. The time-frequency coefficients of the reconstructed vectors are first organized on the time-frequency plane; then, according to the decoded signal type, the following filtering is performed: for a slowly varying signal, equal-bandwidth cosine modulation filtering yields the time-domain pulse-code-modulated (PCM) output; for a fast-varying signal, multi-resolution synthesis is performed first, followed by equal-bandwidth cosine modulation filtering to obtain the time-domain PCM output. Fast-varying signals may be further subdivided into several types, and different types use different multi-resolution synthesis methods.
The corresponding audio decoder is shown in FIG. 15 and specifically includes a decoding and inverse quantizer, a multi-resolution inverse vector quantizer, a multi-resolution inverse filter, and a frequency-time mapper. The decoding and inverse quantizer demultiplexes the received bit stream, performs entropy decoding and inverse quantization, obtains the side information of the multi-resolution vector quantization, and outputs it to the multi-resolution inverse vector quantizer. The multi-resolution inverse vector quantizer reconstructs the quantized vectors from the inverse quantization result and the side information and restores the values of the time-frequency plane; the multi-resolution inverse filter inversely filters the vectors reconstructed by the multi-resolution inverse vector quantizer, and the frequency-time mapper completes the frequency-to-time mapping to obtain the final reconstructed audio signal.
The structure of the above multi-resolution inverse vector quantizer is shown in FIG. 16 and includes a demultiplexing module, an inverse quantization module, a normalized vector calculation module, a vector reconstruction module, and an addition module. The demultiplexing module first demultiplexes the received bit stream to obtain the normalization factor and the quantization indices of the selection points. In the inverse quantization module, the energy envelope is obtained from the quantization indices and the vector quantization position information from the demultiplexing result; from the normalization factor and quantization indices, the guide-point and selection-point vectors are obtained by inverse quantization, and the secondary normalization factor is computed and output to the normalized vector calculation module. In the normalized vector calculation module, the inverse of the secondary normalization is applied to the selection-point vectors to obtain normalized vectors, which are output to the vector reconstruction module, where the inverse of the first normalization is applied according to the energy envelope to obtain the reconstructed vectors. The reconstructed vectors and the inverse-quantized residuals at the corresponding time-frequency positions are added in the addition module, yielding the inverse-quantized time-frequency coefficients that serve as input to the multi-resolution inverse filter.
The structure of the multi-resolution inverse filter is shown in FIG. 17 and includes a time-frequency coefficient organization module, several multi-resolution synthesis modules, and several equal-bandwidth cosine modulation filters, where the number of multi-resolution synthesis modules is one less than the number of equal-bandwidth cosine modulation filters. After the reconstructed vectors pass through the time-frequency coefficient organization module, the signal is divided into slowly varying and fast-varying parts; fast-varying signals can be further subdivided into several types, such as I, II, ..., K. A slowly varying signal is output to an equal-bandwidth cosine modulation filter for filtering, obtaining the time-domain PCM output. The different fast-varying signal types are output to different multi-resolution synthesis modules for synthesis and then to an equal-bandwidth cosine modulation filter for filtering, obtaining the time-domain PCM output.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art will understand that modifications or equivalent substitutions may be made to the technical solution of the present invention without departing from its spirit and scope, and all such modifications shall fall within the scope of the claims of the present invention.

Claims

1. A multi-resolution vector quantization audio encoding method, characterized by comprising: adaptively filtering an input audio signal to obtain time-frequency filter coefficients and output a filtered signal; partitioning the filtered signal into vectors on the time-frequency plane to obtain vector combinations; selecting vectors for vector quantization; vector quantizing the selected vectors and computing a quantization residual; transmitting the quantized codebook information to an audio decoder as side information of the encoder, and quantization-encoding the quantization residual.
2. The multi-resolution vector quantization audio encoding method according to claim 1, characterized in that the step of adaptively filtering the audio signal further comprises: decomposing the input audio signal into frames and computing a transient measure of each signal frame; comparing the value of the transient measure with a threshold to decide whether the current frame is a slowly varying or a fast-varying signal; if it is a slowly varying signal, performing equal-bandwidth cosine modulation filtering to obtain the filter coefficients of the time-frequency plane and outputting the filtered signal; if it is a fast-varying signal, performing equal-bandwidth cosine modulation filtering to obtain the filter coefficients of the time-frequency plane, then applying a wavelet transform to the filter coefficients for multi-resolution analysis and adjusting their time-frequency resolution, and finally outputting the filtered signal.
3. The multi-resolution vector quantization audio encoding method according to claim 2, characterized in that the cosine modulation filtering may use traditional cosine modulation filtering or modified discrete cosine transform filtering.

4. The multi-resolution vector quantization audio encoding method according to claim 3, characterized in that the cosine modulation filtering further includes performing a fast Fourier transform.

5. The multi-resolution vector quantization audio encoding method according to claim 1, characterized in that, for a fast-varying signal, the method further comprises: subdividing the fast-varying signal into multiple fast-varying signal types, and performing filtering and multi-resolution analysis separately for the different fast-varying signal types.

6. The multi-resolution vector quantization audio encoding method according to claim 5, characterized in that, for the different types of fast-varying signals, the wavelet basis of the wavelet transform used for the multi-resolution analysis is fixed or adaptive.
7. The multi-resolution vector quantization audio encoding method according to claim 1, characterized in that partitioning the filtered signal into vectors on the time-frequency plane includes partitioning along the time direction, along the frequency direction, and by time-frequency region; partitioning along the time direction further includes keeping the resolution in the frequency direction unchanged and dividing time so that the number of vectors after partitioning is N/D, yielding a type-I vector organization, where N is the length of the frequency coefficients of the audio signal and D is the vector dimension;

partitioning along the frequency direction further includes keeping the resolution in the time direction unchanged and dividing frequency so that the number of vectors after partitioning is N/D, yielding a type-II vector organization, where N is the length of the frequency coefficients of the audio signal and D is the vector dimension;

partitioning by time-frequency region further includes dividing both the time and the frequency of the time-frequency plane so that the number of vectors after partitioning is N/D, yielding a type-III vector organization, where N is the length of the frequency coefficients of the audio signal and D is the vector dimension.
8. The multi-resolution vector quantization audio encoding method according to claim 1, characterized in that the step of selecting vectors for vector quantization further comprises: judging whether all vectors of the time-frequency plane need to be quantized; if yes, computing the quantization gains of the type-I, type-II, and type-III vector organizations respectively, and selecting the vectors of the organization with the larger quantization gain as the vectors to be quantized; if no, selecting M vectors to be quantized and encoding the sequence numbers of the selected vectors.
9. The multi-resolution vector quantization audio encoding method according to claim 8, characterized in that the step of selecting M vectors to be quantized may further comprise: forming the vectors of the type-I, type-II, and type-III vector organizations into one vector set; computing the energy of each vector in the set, i.e., the square of its coefficients, and at the same time the variance of the components of each vector; sorting the vectors of the set by energy in descending order; re-sorting the sorted vectors by variance in ascending order; determining the number M of vectors to select from the ratio of the total signal energy to the total energy of the currently selected vectors, and selecting the first M vectors as the vectors for vector quantization; and, if vectors of the same region appear simultaneously in the type-I, type-II, and type-III vector organizations, choosing among them according to the variance ordering.
10. The multi-resolution vector quantization audio encoding method according to claim 8, characterized in that the step of selecting M vectors to be quantized may further comprise: forming the vectors of the type-I, type-II, and type-III vector organizations into one vector set; computing the energy and the coding gain of each vector in the set; and selecting the M vectors with the largest coding gain such that the percentage of the energy of the selected M vectors with respect to the total energy exceeds 50%.

11. The multi-resolution vector quantization audio encoding method according to claim 9 or 10, characterized in that the value of M may be any integer between 3 and 50.
12. The multi-resolution vector quantization audio encoding method according to claim 1, characterized in that the step of vector quantizing the selected vectors further comprises: computing the energy value or maximum absolute value of each region of the time-frequency plane; determining a global normalization factor; normalizing the selected vectors; computing the local normalization factor of each vector and performing a second normalization; and quantizing the normalized vectors and computing the quantization residual.
13. The multi-resolution vector quantization audio encoding method according to claim 12, characterized in that the step of vector quantizing the selected vectors further comprises: computing the energy value or maximum absolute value of each region of the time-frequency plane; constructing a univariate function Y = f(X), where X is the region index and Y is the energy or maximum absolute value of region X; determining a global gain factor from the total signal energy and quantization-encoding it with a logarithmic model; normalizing the selected vectors with the global gain factor; computing the local normalization factor at the current vector position from the Taylor formula and normalizing the current vector again; obtaining the overall normalization factor of the current vector as the product of the two normalization factors; forming the function values of the M selected regions into an M-dimensional vector; computing the first- and second-order differences corresponding to this vector; and obtaining the codebooks corresponding to the above three vectors with a codebook training algorithm and quantizing the three vectors; wherein the quantization of the vector corresponds to the zero-order approximation of the Taylor formula, with the Euclidean distance as the distortion measure in the codebook search; the quantization of the first-order difference vector corresponds to the first-order approximation of the Taylor formula: a small number of codewords with the least distortion are searched in the corresponding codebook by Euclidean distance, the quantization distortion is then computed for every region in a small neighborhood of the current vector, and the total distortion sum serves as the distortion measure; the quantization of the second-order difference vector is similar to that of the first-order difference vector.
14. The multi-resolution vector quantization audio encoding method according to claim 12, characterized in that the step of vector quantizing the selected vectors further comprises: computing the energy value or maximum absolute value of each region of the time-frequency plane; constructing a univariate function Y = f(X), where X is the region index and Y is the energy or maximum absolute value of region X; determining a global gain factor from the total signal energy and quantization-encoding it with a logarithmic model; normalizing the selected vectors with the global gain factor; computing the local normalization factor at the current vector position from the spline-curve fitting formula and normalizing the current vector again; forming the function values of the M selected regions into an M-dimensional vector, which may be further decomposed into several sub-vectors, called selection-point vectors; and quantizing these vectors separately.
15. A multi-resolution vector quantization audio decoding method, characterized by comprising the steps of: demultiplexing the bit stream to obtain the side information of the multi-resolution vector quantization, and obtaining the energies of the selection points and the position information of the vector quantization; obtaining normalized vectors by inverse vector quantization from this information, computing the normalization factors, and reconstructing the quantized vectors of the original time-frequency plane; adding the reconstructed vectors, according to the position information, to the residuals of the corresponding time-frequency coefficients; and performing multi-resolution inverse filtering and frequency-to-time mapping to obtain the reconstructed audio signal.
16. The multi-resolution vector quantization audio decoding method according to claim 15, characterized in that the step of reconstructing the quantized vectors of the original time-frequency plane further comprises: computing, from the side information, the energy and the difference values of each order of every selection point from the codebook; obtaining from the bit stream the position information of the vector quantization on the time-frequency plane and the global normalization factor; obtaining the secondary normalization factor at the corresponding position from the formula used to compute the secondary normalization factor in the encoding process; and obtaining a normalized vector from the vector quantization index and multiplying it with the two normalization factors to reconstruct the quantized vector on the time-frequency plane.

17. The multi-resolution vector quantization audio decoding method according to claim 15, characterized in that the step of multi-resolution inverse filtering further comprises: organizing the time-frequency coefficients of the reconstructed vectors on the time-frequency plane, and performing the following filtering according to the decoded signal type: for a slowly varying signal, equal-bandwidth cosine modulation filtering to obtain the time-domain pulse-code-modulated output; for a fast-varying signal, multi-resolution synthesis followed by equal-bandwidth cosine modulation filtering to obtain the time-domain pulse-code-modulated output.

18. The multi-resolution vector quantization audio decoding method according to claim 17, characterized in that the fast-varying signal may be further divided into multiple fast-varying signal types, with multi-resolution synthesis and filtering performed separately for the different fast-varying signal types.
19. A multi-resolution vector quantization audio encoder, characterized by comprising a time-frequency mapper, a multi-resolution filter, a multi-resolution vector quantizer, a psychoacoustic calculation module, and a quantization encoder;

the time-frequency mapper receives the audio input signal, performs the time-to-frequency mapping, and outputs to the multi-resolution filter;

the multi-resolution filter adaptively filters the signal and outputs the filtered signal to the psychoacoustic calculation module and the multi-resolution vector quantizer;

the multi-resolution vector quantizer vector quantizes the filtered signal and computes the quantization residual, passes the quantized signal to the audio decoder as side information, and outputs the quantization residual to the quantization encoder;

the psychoacoustic calculation module computes the masking threshold of the psychoacoustic model from the input audio signal and outputs it to the quantization encoder to control the noise allowed by quantization; and the quantization encoder quantizes and entropy codes the residual output by the multi-resolution vector quantizer under the allowed-noise limit output by the psychoacoustic calculation module, yielding the encoded bit stream.
20. The multi-resolution vector quantization audio encoder according to claim 19, characterized in that the multi-resolution filter comprises a transient-measure calculation module, M equal-bandwidth cosine modulation filters, N multi-resolution analysis modules, and a time-frequency filter coefficient organization module, with M = N + 1;

the transient-measure calculation module computes the transient measure of an audio input signal frame to determine the type of the signal frame;

the equal-bandwidth cosine modulation filters filter the signal to obtain filter coefficients; for a slowly varying signal, the filter coefficients are output to the time-frequency filter coefficient organization module; for a fast-varying signal, the filter coefficients are output to the multi-resolution analysis modules;

the multi-resolution analysis modules apply a wavelet transform to the filter coefficients of fast-varying signals, adjust the time-frequency resolution of the coefficients, and output the transformed coefficients to the time-frequency filter coefficient organization module;

the time-frequency filter coefficient organization module organizes the filtered output coefficients on the time-frequency plane and outputs the filtered signal.
21. The multi-resolution vector quantization audio encoder according to claim 19, characterized in that the multi-resolution vector quantizer comprises a vector organization module, a vector selection module, a global normalization module, a local normalization module, and a quantization module;

the vector organization module organizes the time-frequency plane coefficients output by the multi-resolution filter into vectors according to different partitioning strategies and outputs them to the vector selection module;

the vector selection module selects the vectors to be quantized according to factors such as energy and outputs them to the global normalization module;

the global normalization module applies a global normalization to these vectors;

the local normalization module computes the local normalization factor of each vector, applies a local normalization to the vectors output by the global normalization module, and outputs them to the quantization module;

the quantization module quantizes the twice-normalized vectors and computes the quantization residual.
22. A multi-resolution vector quantization audio decoder, characterized by comprising a decoding and inverse quantizer, a multi-resolution inverse vector quantizer, a multi-resolution inverse filter, and a frequency-time mapper; the decoding and inverse quantizer demultiplexes, entropy decodes, and inversely quantizes the bit stream to obtain the side information and the encoded data, and outputs them to the multi-resolution inverse vector quantizer;

the multi-resolution inverse vector quantizer performs the inverse vector quantization process, reconstructs the quantized vectors, adds the reconstructed vectors to the residual coefficients on the time-frequency plane, and outputs the result to the multi-resolution inverse filter;

the multi-resolution inverse filter inversely filters the vectors reconstructed by the multi-resolution vector quantizer and outputs them to the frequency-time mapper;

the frequency-time mapper completes the mapping of the signal from frequency to time to obtain the final reconstructed audio signal.
23. The multi-resolution vector quantization audio decoder according to claim 22, characterized in that the multi-resolution inverse vector quantizer comprises a demultiplexing module, an inverse quantization module, a normalized vector calculation module, a vector reconstruction module, and an addition module;

the demultiplexing module demultiplexes the received bit stream to obtain the normalization factor and the quantization indices of the selection points;

the inverse quantization module obtains the energy envelope and the vector quantization position information from the information output by the demultiplexing module, obtains the guide-point and selection-point vectors by inverse quantization, computes the secondary normalization factor, and outputs it to the normalized vector calculation module;

the normalized vector calculation module applies the inverse of the secondary normalization to the selection-point vectors to obtain normalized vectors and outputs them to the vector reconstruction module;

the vector reconstruction module applies the inverse of the first normalization to the normalized vectors according to the energy envelope to obtain reconstructed vectors; the addition module adds the reconstructed vectors output by the vector reconstruction module to the inverse-quantized residuals of the corresponding time-frequency plane, yielding the inverse-quantized time-frequency coefficients that serve as input to the multi-resolution inverse filter.
24. The multi-resolution vector quantization audio decoder according to claim 22, characterized in that the multi-resolution inverse filter further comprises a time-frequency coefficient organization module, N multi-resolution synthesis modules, and M equal-bandwidth cosine modulation filters, with M = N + 1;

the time-frequency coefficient organization module organizes the inversely quantized coefficients for filter input; for a slowly varying signal, they are output to the equal-bandwidth cosine modulation filters; for a fast-varying signal, they are output to the multi-resolution synthesis modules; the multi-resolution synthesis modules map the multi-resolution time-frequency coefficients into equal-bandwidth cosine modulation filter coefficients and output them to the equal-bandwidth cosine modulation filters;

the equal-bandwidth cosine modulation filters filter the signal to obtain the time-domain pulse-code-modulated output.
PCT/CN2003/000790 2003-09-17 2003-09-17 Procede et dispositif de quantification de vecteur multi-resolution multiple pour codage et decodage audio WO2005027094A1 (fr)

Priority Applications (6)

Application Number Priority Date Filing Date Title
PCT/CN2003/000790 WO2005027094A1 (fr) 2003-09-17 2003-09-17 Procede et dispositif de quantification de vecteur multi-resolution multiple pour codage et decodage audio
JP2005508847A JP2007506986A (ja) 2003-09-17 2003-09-17 マルチ解像度ベクトル量子化のオーディオcodec方法及びその装置
AU2003264322A AU2003264322A1 (en) 2003-09-17 2003-09-17 Method and device of multi-resolution vector quantilization for audio encoding and decoding
EP03818611A EP1667109A4 (en) 2003-09-17 2003-09-17 METHOD AND DEVICE FOR QUANTIFYING MULTI-RESOLUTION VECTOR FOR AUDIO CODING AND DECODING
US10/572,769 US20070067166A1 (en) 2003-09-17 2003-09-17 Method and device of multi-resolution vector quantilization for audio encoding and decoding
CNA038270625A CN1839426A (zh) 2003-09-17 2003-09-17 多分辨率矢量量化的音频编解码方法及装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2003/000790 WO2005027094A1 (fr) 2003-09-17 2003-09-17 Procede et dispositif de quantification de vecteur multi-resolution multiple pour codage et decodage audio

Publications (1)

Publication Number Publication Date
WO2005027094A1 (fr) 2005-03-24

Family

ID=34280738

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2003/000790 WO2005027094A1 (fr) 2003-09-17 2003-09-17 Procede et dispositif de quantification de vecteur multi-resolution multiple pour codage et decodage audio

Country Status (6)

Country Link
US (1) US20070067166A1 (zh)
EP (1) EP1667109A4 (zh)
JP (1) JP2007506986A (zh)
CN (1) CN1839426A (zh)
AU (1) AU2003264322A1 (zh)
WO (1) WO2005027094A1 (zh)

Families Citing this family (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW594674B (en) * 2003-03-14 2004-06-21 Mediatek Inc Encoder and a encoding method capable of detecting audio signal transient
JP4579930B2 (ja) * 2004-01-30 2010-11-10 フランス・テレコム 次元ベクトルおよび可変解像度量子化
US8345890B2 (en) 2006-01-05 2013-01-01 Audience, Inc. System and method for utilizing inter-microphone level differences for speech enhancement
US8194880B2 (en) 2006-01-30 2012-06-05 Audience, Inc. System and method for utilizing omni-directional microphones for speech enhancement
US8204252B1 (en) 2006-10-10 2012-06-19 Audience, Inc. System and method for providing close microphone adaptive array processing
US8744844B2 (en) 2007-07-06 2014-06-03 Audience, Inc. System and method for adaptive intelligent noise suppression
US9185487B2 (en) 2006-01-30 2015-11-10 Audience, Inc. System and method for providing noise suppression utilizing null processing noise subtraction
US8150065B2 (en) 2006-05-25 2012-04-03 Audience, Inc. System and method for processing an audio signal
US8934641B2 (en) * 2006-05-25 2015-01-13 Audience, Inc. Systems and methods for reconstructing decomposed audio signals
US8849231B1 (en) 2007-08-08 2014-09-30 Audience, Inc. System and method for adaptive power control
US8204253B1 (en) 2008-06-30 2012-06-19 Audience, Inc. Self calibration of audio device
US8949120B1 (en) 2006-05-25 2015-02-03 Audience, Inc. Adaptive noise cancelation
US8259926B1 (en) 2007-02-23 2012-09-04 Audience, Inc. System and method for 2-channel and 3-channel acoustic echo cancellation
CN101308655B (zh) * 2007-05-16 2011-07-06 展讯通信(上海)有限公司 一种音频编解码方法与装置
US8189766B1 (en) 2007-07-26 2012-05-29 Audience, Inc. System and method for blind subband acoustic echo cancellation postfiltering
US8180064B1 (en) 2007-12-21 2012-05-15 Audience, Inc. System and method for providing voice equalization
US8143620B1 (en) 2007-12-21 2012-03-27 Audience, Inc. System and method for adaptive classification of audio sources
US8194882B2 (en) 2008-02-29 2012-06-05 Audience, Inc. System and method for providing single microphone noise suppression fallback
US8355511B2 (en) 2008-03-18 2013-01-15 Audience, Inc. System and method for envelope-based acoustic echo cancellation
US8521530B1 (en) 2008-06-30 2013-08-27 Audience, Inc. System and method for enhancing a monaural audio signal
US20110135007A1 (en) * 2008-06-30 2011-06-09 Adriana Vasilache Entropy-Coded Lattice Vector Quantization
US8774423B1 (en) 2008-06-30 2014-07-08 Audience, Inc. System and method for controlling adaptivity of signal modification using a phantom coefficient
EP2144230A1 (en) 2008-07-11 2010-01-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Low bitrate audio encoding/decoding scheme having cascaded switches
MX2011003824A (es) * 2008-10-08 2011-05-02 Fraunhofer Ges Forschung Esquema de codificacion/decodificacion de audio conmutado de resolucion multiple.
CN101436406B (zh) * 2008-12-22 2011-08-24 西安电子科技大学 音频编解码器
US9838784B2 (en) 2009-12-02 2017-12-05 Knowles Electronics, Llc Directional audio capture
US9008329B1 (en) 2010-01-26 2015-04-14 Audience, Inc. Noise reduction using multi-feature cluster tracker
US8718290B2 (en) 2010-01-26 2014-05-06 Audience, Inc. Adaptive noise reduction using level cues
US9378754B1 (en) 2010-04-28 2016-06-28 Knowles Electronics, Llc Adaptive spatial classifier for multi-microphone systems
US8400876B2 (en) * 2010-09-30 2013-03-19 Mitsubishi Electric Research Laboratories, Inc. Method and system for sensing objects in a scene using transducer arrays and coherent wideband ultrasound pulses
CN104620315B (zh) * 2012-07-12 2018-04-13 诺基亚技术有限公司 一种矢量量化的方法及装置
FR3000328A1 (fr) * 2012-12-21 2014-06-27 France Telecom Attenuation efficace de pre-echos dans un signal audionumerique
EP2804176A1 (en) * 2013-05-13 2014-11-19 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio object separation from mixture signal using object-specific time/frequency resolutions
US9536540B2 (en) 2013-07-19 2017-01-03 Knowles Electronics, Llc Speech signal separation and synthesis based on auditory scene analysis and speech modeling
SG10201609218XA (en) 2013-10-31 2016-12-29 Fraunhofer Ges Forschung Audio Decoder And Method For Providing A Decoded Audio Information Using An Error Concealment Modifying A Time Domain Excitation Signal
PL3285254T3 (pl) 2013-10-31 2019-09-30 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Dekoder audio i sposób dostarczania zdekodowanej informacji audio z wykorzystaniem ukrywania błędów na bazie sygnału wzbudzenia w dziedzinie czasu
WO2015072883A1 (en) * 2013-11-18 2015-05-21 Baker Hughes Incorporated Methods of transient em data compression
KR102626320B1 (ko) 2014-03-28 2024-01-17 삼성전자주식회사 선형예측계수 양자화방법 및 장치와 역양자화 방법 및 장치
CN112927703A (zh) 2014-05-07 2021-06-08 三星电子株式会社 对线性预测系数量化的方法和装置及解量化的方法和装置
US9978388B2 (en) 2014-09-12 2018-05-22 Knowles Electronics, Llc Systems and methods for restoration of speech components
WO2016142002A1 (en) * 2015-03-09 2016-09-15 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder, method for encoding an audio signal and method for decoding an encoded audio signal
US10063892B2 (en) * 2015-12-10 2018-08-28 Adobe Systems Incorporated Residual entropy compression for cloud-based video applications
GB2547877B (en) * 2015-12-21 2019-08-14 Graham Craven Peter Lossless bandsplitting and bandjoining using allpass filters
US9820042B1 (en) 2016-05-02 2017-11-14 Knowles Electronics, Llc Stereo separation and directional suppression with omni-directional microphones
WO2018201112A1 (en) * 2017-04-28 2018-11-01 Goodwin Michael M Audio coder window sizes and time-frequency transformations
US10891960B2 (en) * 2017-09-11 2021-01-12 Qualcomm Incorproated Temporal offset estimation
DE102017216972B4 (de) * 2017-09-25 2019-11-21 Carl Von Ossietzky Universität Oldenburg Verfahren und Vorrichtung zur rechnergestützten Verarbeitung von Audiosignalen
US11423313B1 (en) * 2018-12-12 2022-08-23 Amazon Technologies, Inc. Configurable function approximation based on switching mapping table content
CN115979261B (zh) * 2023-03-17 2023-06-27 中国人民解放军火箭军工程大学 一种多惯导系统的轮转调度方法、系统、设备及介质
CN118296306B (zh) * 2024-05-28 2024-09-06 小舟科技有限公司 基于分形维数增强的脑电信号处理方法、装置及设备

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
IT1180126B (it) * 1984-11-13 1987-09-23 Cselt Centro Studi Lab Telecom Procedimento e dispositivo per la codifica e decodifica del segnale vocale mediante tecniche di quantizzazione vettoriale
IT1184023B (it) * 1985-12-17 1987-10-22 Cselt Centro Studi Lab Telecom Procedimento e dispositivo per la codifica e decodifica del segnale vocale mediante analisi a sottobande e quantizzazione vettorariale con allocazione dinamica dei bit di codifica
IT1195350B (it) * 1986-10-21 1988-10-12 Cselt Centro Studi Lab Telecom Procedimento e dispositivo per la codifica e decodifica del segnale vocale mediante estrazione di para metri e tecniche di quantizzazione vettoriale
JPH07212239A (ja) * 1993-12-27 1995-08-11 Hughes Aircraft Co ラインスペクトル周波数のベクトル量子化方法および装置
TW321810B (zh) * 1995-10-26 1997-12-01 Sony Co Ltd
JP3353266B2 (ja) * 1996-02-22 2002-12-03 日本電信電話株式会社 音響信号変換符号化方法
JP3849210B2 (ja) * 1996-09-24 2006-11-22 ヤマハ株式会社 音声符号化復号方式
US6363338B1 (en) * 1999-04-12 2002-03-26 Dolby Laboratories Licensing Corporation Quantization in perceptual audio coders with compensation for synthesis filter noise spreading
US6298322B1 (en) * 1999-05-06 2001-10-02 Eric Lindemann Encoding and synthesis of tonal audio signals using dominant sinusoids and a vector-quantized residual tonal signal

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
US5473727A (en) * 1992-10-31 1995-12-05 Sony Corporation Voice encoding method and voice decoding method
CN1222997A (zh) * 1996-07-01 1999-07-14 松下电器产业株式会社 音频信号编码方法、解码方法,及音频信号编码装置、解码装置
CN1224523A (zh) * 1997-05-15 1999-07-28 松下电器产业株式会社 音频信号编码装置和译码装置以及音频信号编码和译码方法

Non-Patent Citations (2)

Title
PAN XINGDE ZHU XIAOMING A.H. ET AL.: "EAC audio coding technology", ELECTRONIC AUDIO TECHNOLOGY, February 2003 (2003-02-01), pages 11 - 15 *
See also references of EP1667109A4 *

Cited By (21)

Publication number Priority date Publication date Assignee Title
JP2009511966A (ja) * 2005-10-12 2009-03-19 フラウンホッファー−ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ マルチチャンネル音声信号の時間的および空間的整形
US8644972B2 (en) 2005-10-12 2014-02-04 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Temporal and spatial shaping of multi-channel audio signals
US9361896B2 (en) 2005-10-12 2016-06-07 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Temporal and spatial shaping of multi-channel audio signal
JP2009512895A (ja) * 2005-10-21 2009-03-26 クゥアルコム・インコーポレイテッド スペクトル・ダイナミックスに基づく信号コーディング及びデコーディング
US8027242B2 (en) 2005-10-21 2011-09-27 Qualcomm Incorporated Signal coding and decoding based on spectral dynamics
JP2009514034A (ja) * 2005-10-31 2009-04-02 エルジー エレクトロニクス インコーポレイティド 信号処理方法及びその装置、並びにエンコード、デコード方法及びその装置
US8392176B2 (en) 2006-04-10 2013-03-05 Qualcomm Incorporated Processing of excitation in audio coding and decoding
US8428957B2 (en) 2007-08-24 2013-04-23 Qualcomm Incorporated Spectral noise shaping in audio coding based on spectral dynamics in frequency sub-bands
US9105264B2 (en) 2009-07-31 2015-08-11 Panasonic Intellectual Property Management Co., Ltd. Coding apparatus and decoding apparatus
CN110310659A (zh) * 2013-07-22 2019-10-08 弗劳恩霍夫应用研究促进协会 用重构频带能量信息值解码或编码音频信号的设备及方法
US11735192B2 (en) 2013-07-22 2023-08-22 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder and related methods using two-channel processing within an intelligent gap filling framework
US11769512B2 (en) 2013-07-22 2023-09-26 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for decoding and encoding an audio signal using adaptive spectral tile selection
US11769513B2 (en) 2013-07-22 2023-09-26 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for decoding or encoding an audio signal using energy information values for a reconstruction band
CN110310659B (zh) * 2013-07-22 2023-10-24 弗劳恩霍夫应用研究促进协会 用重构频带能量信息值解码或编码音频信号的设备及方法
US11922956B2 (en) 2013-07-22 2024-03-05 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for encoding or decoding an audio signal with intelligent gap filling in the spectral domain
US11996106B2 (en) 2013-07-22 2024-05-28 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E. V. Apparatus and method for encoding and decoding an encoded audio signal using temporal noise/patch shaping
CN109087654A (zh) * 2014-03-24 2018-12-25 杜比国际公司 对高阶高保真立体声信号应用动态范围压缩的方法和设备
CN109087654B (zh) * 2014-03-24 2023-04-21 杜比国际公司 对高阶高保真立体声信号应用动态范围压缩的方法和设备
US11838738B2 (en) 2014-03-24 2023-12-05 Dolby Laboratories Licensing Corporation Method and device for applying Dynamic Range Compression to a Higher Order Ambisonics signal
CN112071297A (zh) * 2020-09-07 2020-12-11 西北工业大学 一种矢量声的自适应滤波方法
CN112071297B (zh) * 2020-09-07 2023-11-10 西北工业大学 一种矢量声的自适应滤波方法

Also Published As

Publication number Publication date
CN1839426A (zh) 2006-09-27
EP1667109A4 (en) 2007-10-03
US20070067166A1 (en) 2007-03-22
AU2003264322A1 (en) 2005-04-06
EP1667109A1 (en) 2006-06-07
JP2007506986A (ja) 2007-03-22

Similar Documents

Publication Publication Date Title
WO2005027094A1 (fr) Procede et dispositif de quantification de vecteur multi-resolution multiple pour codage et decodage audio
US7275036B2 (en) Apparatus and method for coding a time-discrete audio signal to obtain coded audio data and for decoding coded audio data
CA2608030C (en) Scalable compressed audio bit stream and codec using a hierarchical filterbank and multichannel joint coding
CN100395817C (zh) 编码设备、解码设备和解码方法
US7343287B2 (en) Method and apparatus for scalable encoding and method and apparatus for scalable decoding
KR101343267B1 (ko) 주파수 세그먼트화를 이용한 오디오 코딩 및 디코딩을 위한 방법 및 장치
US6182034B1 (en) System and method for producing a fixed effort quantization step size with a binary search
US6029126A (en) Scalable audio coder and decoder
CN102436819B (zh) 无线音频压缩、解压缩方法及音频编码器和音频解码器
US9037454B2 (en) Efficient coding of overcomplete representations of audio using the modulated complex lapped transform (MCLT)
WO2005096274A1 (fr) Dispositif et procede de codage/decodage audio ameliores
US7512539B2 (en) Method and device for processing time-discrete audio sampled values
CN101223577A (zh) 对低比特率音频信号进行编码/解码的方法和设备
CN1264533A (zh) 多声道低比特率编码解码方法和设备
CN101162584A (zh) 使用带宽扩展技术对音频信号编码和解码的方法和设备
KR20130047643A (ko) 통신 시스템에서 신호 코덱 장치 및 방법
Kumar et al. The optimized wavelet filters for speech compression
JPH10276095A (ja) 符号化器及び復号化器
JP3557164B2 (ja) オーディオ信号符号化方法及びその方法を実行するプログラム記憶媒体
WO2005096508A1 (fr) Equipement de codage et de decodage audio ameliore, procede associe
James et al. A comparative study of speech compression using different transform techniques
CN100538821C (zh) 快变音频信号的编解码方法
Hosny et al. Novel techniques for speech compression using wavelet transform
AU2011205144B2 (en) Scalable compressed audio bit stream and codec using a hierarchical filterbank and multichannel joint coding
Mandridake et al. Joint wavelet transform and vector quantization for speech coding

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 03827062.5

Country of ref document: CN

AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BY BZ CA CH CN CO CR CU CZ DE DM DZ EC EE EG ES FI GB GD GE GH HR HU ID IL IN IS JP KE KG KP KR KZ LK LR LS LT LU LV MA MD MG MK MW MX MZ NI NO NZ OM PG PH PL RO RU SC SD SE SG SK SL SY TJ TM TR TT TZ UA UG US UZ VC VN YU ZM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ UG ZM ZW AM AZ BY KG KZ RU TJ TM AT BE BG CH CY CZ DK EE ES FI FR GB GR HU IE IT LU NL PT RO SE SI SK TR BF BJ CF CI CM GA GN GQ GW ML MR NE SN TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2003818611

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2005508847

Country of ref document: JP

WWP Wipo information: published in national office

Ref document number: 2003818611

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2007067166

Country of ref document: US

Ref document number: 10572769

Country of ref document: US

WWP Wipo information: published in national office

Ref document number: 10572769

Country of ref document: US