WO2009059564A1 - A multi-rate speech audio encoding method - Google Patents

A multi-rate speech audio encoding method

Info

Publication number
WO2009059564A1
Authority
WO
WIPO (PCT)
Prior art keywords
spectrum
signal
index value
grid
bits
Application number
PCT/CN2008/072946
Other languages
French (fr)
Chinese (zh)
Inventor
Zexin Liu
Fuwei Ma
Wei Xiao
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2009059564A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16: Vocoder architecture
    • G10L19/18: Vocoders using multiple modes
    • G10L19/24: Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding

Definitions

  • This application claims priority to Chinese patent application No. 200710165110.3, filed on November 5, 2007 and entitled "A multi-rate speech and audio coding method", the entire contents of which are incorporated herein by reference.
  • The present invention relates to coding techniques, and in particular to a method of multi-rate speech and audio coding.
  • FIG. 1 is a flow chart of a prior art lattice vector quantization method. As shown in Figure 1, in general, lattice vector quantization includes the following steps:
  • Step 101: For the frequency-domain signal, find the corresponding lattice point according to the nearest-neighbor principle.
  • Step 102: Compute the index value of each lattice point according to the spectral energy corresponding to the lattice point and the total number of bits.
  • When Ck is in the base codebook, the index value comprises the codebook index nk and the corresponding codeword index Ik; when Ck is not in the base codebook, a Voronoi extension is applied to Ck, and the index value then comprises, in addition to nk and Ik, the index kv of the extension codebook.
  • Step 103: When the number of bits is insufficient, all elements of the lattice points corresponding to the lower-energy spectral components are forcibly set to 0.
  • Step 104: The index values of the lattice points are written into the code stream in order from low frequency to high frequency.
  • Step 105: At the decoder, the quantized spectrum sequence is decoded from low frequency to high frequency according to the decoded index values.
  • The human auditory system also masks signals by itself: when the sound intensity of a signal is below a certain threshold, the auditory system does not perceive the signal even when no other signal masks it. This is called absolute masking. Experiments show that the absolute masking threshold decreases as frequency increases in the range of 0 to 500 Hz and is almost constant in the range of 500 to 5000 Hz.
  • When lattice vector quantization is applied to the spectrum of the input signal, some lattice-point elements may have to be forced to 0 because the total number of bits is limited.
  • If lattice points corresponding to frequency modules that carry important information are forced to 0, the coding quality is greatly reduced. A criterion is therefore needed to decide which lattice points are set to 0 and which are retained.
  • In the above coding method, the encoder places the lattice points of the transform-domain coding in a buffer in descending order of spectral energy and computes the index value of each lattice point from first to last.
  • When the number of bits is insufficient, all elements of the lattice points corresponding to spectral components with relatively small energy (that is, the lattice points placed toward the end of the buffer) are forcibly set to 0, and their index values are also 0.
  • The encoder writes all index values, zero and non-zero alike, into the code stream in order from low frequency to high frequency. When the number of bits is insufficient, the decoder can therefore recover only a small amount of low-frequency information from the code stream, and part of that information is spectrum that was set to 0. Some lattice points whose elements were not set to 0 then cannot be encoded for lack of bits, so adding bits to raise the code rate does not noticeably improve the quality of the output speech or audio signal.
  • When the index value of a lattice point is computed, the importance of the lattice point is decided by the magnitude of its spectral energy.
  • When the number of bits is insufficient, the elements of the lattice points with smaller spectral energy are set to 0. However, a component with large spectral energy is not necessarily an important one, so this decision criterion may set to 0 the elements of lattice points that correspond to important components and degrade the output signal.
  • According to the masking effect, whether a signal is masked does not depend entirely on the magnitude of its spectral energy; to some extent it also depends on the difference between the masking signal and the masked signal, which the above coding method does not consider.
  • The main purpose of the embodiments of the present invention is to provide a multi-rate speech and audio coding method that improves the quality of the output speech (audio) signal when bits are insufficient in transform-domain coding.
  • The technical solution in the embodiments of the present invention is implemented as follows:
  • A method for multi-rate speech and audio coding, comprising:
  • The index values of the lattice points are written into the code stream in descending order of the first ratio.
  • An embodiment of the present invention provides a method for multi-rate speech and audio coding.
  • The difference signal between the input signal and the synthesized speech obtained by encoding and then locally decoding the input signal is computed first, and then the index value of the lattice point nearest to each spectral vector corresponding to the difference signal is computed.
  • The index values of the lattice points are written into the code stream so that, in line with the action of the perceptual weighting filter, the more important information is finely quantized and written first while the less important information is coarsely quantized. When the number of bits at the decoder is insufficient, the more important information can still be decoded, which improves the quality of the decoded speech.
  • FIG. 2 is a flowchart of a multi-rate speech and audio encoding method according to an embodiment of the present invention
  • FIG. 3 is a flowchart of a multi-rate speech and audio encoding method according to another embodiment of the present invention.
  • FIG. 2 is a flowchart of a multi-rate speech and audio encoding method according to an embodiment of the present invention. Specifically, this embodiment includes the following steps:
  • Step 201 The encoder receives the input signal.
  • Step 202: The encoder performs CELP (Code Excited Linear Prediction) coding on the input signal.
  • The coding method is not limited to CELP; other methods may also be used.
  • During CELP coding, the input signal can be coded in two layers (for example, an L1 layer and an L2 layer).
  • In the present invention, the input signal may be coded without layering or in more than one layer. The number of coding layers determines how many different speech coding rates can be achieved, and can be chosen according to actual needs.
  • Step 203: The encoder locally decodes the CELP-coded signal to obtain a decoded signal.
  • Step 204: Compute the spectrum from the decoded signal obtained in step 203. Specifically, compute the spectrum Freq_R2 of the synthesized speech (audio) signal of the first two layers in the decoded signal. As many layers should be decoded in this step as were coded in step 202.
  • Step 205: Compute the difference signal between the input signal and the synthesized speech signal of the first two layers of the decoded signal obtained in step 203, and compute the MDCT coefficients of the difference signal, that is, the spectrum Freq_err.
  • Step 206: After an operation (for example, rounding) on the MDCT coefficients of step 205, obtain the lattice points nearest to the corresponding spectral vectors. With the RE8 lattice, 35 lattice points are obtained.
  • Step 207: From the difference between the spectrum Freq_R2 of step 204 and the spectrum Freq_err of step 205, compute the ratio Ratio[k], and divide the lattice points of step 206 into N regions in sequence.
  • The formula is as follows:
  • Ratio[k] = … Freq_R2[l] − Freq_err[l] …    (1)
  • Each ratio in the array Ratio[k] corresponds to exactly one lattice point of step 206.
  • With 35 lattice points, an array Ratio[k] of 35 ratios is obtained.
  • The lattice points are divided into N regions according to their order in the array Ratio[k], where N is an integer greater than or equal to 1.
  • Step 208: Sort the lattice points.
  • The lattice points in the N regions are arranged according to the auditory characteristics of the human ear, with the signals that are less likely to be masked placed first.
  • The regions are arranged in reverse order: the lattice points in the last region of the array Ratio[k] are placed first, those in the first region are placed last, and the other regions follow accordingly. Within each region, the lattice points are arranged in descending order of Ratio[k]. The reordered lattice points are placed in a new array in that order.
  • When Ratio[k] is divided into two regions, the first n ratios form the first region and the remaining (35-n) ratios form the second region.
  • n is an integer greater than or equal to 1.
  • According to the values of Ratio[k] in the second region, the lattice points of step 206 are sorted in descending order and placed in the first (35-n) elements of a 35-element array R[k]; similarly, according to the values of Ratio[k] in the first region, the lattice points of step 206 are sorted in descending order and placed in the last n elements of R[k].
  • The number of values assigned to the first region should be preset according to the actual application. Multiple regions are ordered according to the coding characteristics of the first few layers of the encoder or the characteristics of the MDCT coefficients of the difference signal: the region whose lattice points correspond to signals with a low probability of being masked is ranked first, and the region whose lattice points correspond to signals likely to be masked is ranked last.
  • Step 209: Compute the spectrum W_Freq of the perceptual weighting filter corresponding to the difference signal.
  • A represents the linear prediction coefficients, which reflect the spectral envelope of the high frequency band.
  • z represents the frequency domain.
  • The two weighting factors of the filter are generally constants.
  • The spectrum W_Freq is obtained by applying the MDCT to the perceptual weighting filter.
  • Step 211: Compute the index value of each lattice point from the data in the array of step 208 and the total number of available bits. When the number of bits is insufficient, the elements of the last m lattice points in the array of step 208 are set to 0, and the index values of those lattice points are also 0. m is an integer greater than or equal to 1 that can be preset according to the total number of bits.
  • Steps 210 and 211 need not be performed in the order given; step 211 may be performed before step 210.
  • Step 212: Write the index values of the lattice points into the code stream in descending order of Rat[k].
  • The index values of lattice points with a large Rat[k] (the more important signal components) are written into the code stream first, followed by the index values of lattice points with a small Rat[k].
  • The invention is not limited to the above embodiments.
  • The present invention is not limited to the RE8 lattice used in this embodiment; other lattices, such as the Z8 lattice, can also be used.
  • The sorting of the lattice points in step 208 may also be based on other principles.
  • For example, the order may be determined in a global index-order mode, in which the lattice points are not divided into regions and are arranged only by the magnitude of Ratio[k]. Which scheme is used to sort the lattice points can be chosen according to actual needs.
  • Steps 201 to 212 of this embodiment are not limited to the above order.
  • The decision criterion expressed by formula (1) is more consistent with the masking effect: the smaller the difference between Freq_R2 and Freq_err, the closer their frequencies are and the less likely Freq_err is to be masked. Moreover, for the same difference, the larger the ratio of that difference to the locally decoded Freq_R2, the less likely Freq_err is to be masked.
  • With the method of this embodiment, the lattice points corresponding to signals that are less likely to be masked are not forcibly set to 0, so that when the number of bits is insufficient the more important information is finely quantized and written into the code stream first, while the less important information is coarsely quantized.
  • The order of the lattice-point index values in the code stream is determined by the ratio Rat between the spectrum Freq_R2 of the locally decoded CELP-coded synthesized speech (audio) and the spectrum W_Freq of the perceptual weighting filter, for the following reason: the perceptual weighting filter allocates more distortion where the spectral energy of the input signal is large and minimizes distortion where it is small, so a CELP-coded signal is quantized relatively coarsely where Rat is large, and those components are the focus of the lattice vector quantization in this embodiment.
  • With this method, when the number of bits at the decoder is insufficient, the more important information can still be decoded, which improves the quality of the decoded speech.
  • FIG. 3 is a flowchart of a multi-rate speech and audio encoding method according to another embodiment of the present invention. As shown in FIG. 3, the multi-rate speech and audio encoding method in this embodiment includes the following steps:
  • Step 301: Compute the MDCT coefficients of the R3 and R4 layers, that is, the MDCT coefficients of the first three layers and of the first four layers.
  • The 35 frequency modules corresponding to the MDCT coefficients are divided by spectrum range into two regions, 0 to 2 kHz and 2 to 7 kHz.
  • The first 10 frequency modules cover 0 to 2 kHz.
  • The last 25 frequency modules cover 2 to 7 kHz. The number of frequency modules in each region may differ between embodiments.
  • Step 302: Take the lattice points in the 2 to 7 kHz range for processing.
  • Step 303: Determine whether the total number of bits is sufficient. If yes, go to step 305; otherwise, go to step 304.
  • Step 304: Set to 0 the elements of the lattice points with small spectral energy in the 2 to 7 kHz range. That is, sort the lattice points in the 2 to 7 kHz range in descending order of spectral energy and set to 0 the elements of the n lattice points with the smallest spectral energy, where n is an integer greater than or equal to 1 that can be preset according to the application.
  • Step 305: Compute the index values of the lattice points in the 2 to 7 kHz range; the index value of a lattice point whose elements were set to 0 is also 0.
  • Step 306: Take the lattice points in the 0 to 2 kHz range for processing.
  • Step 307: Determine whether the total number of bits is sufficient. If yes, go to step 309; otherwise, go to step 308.
  • Step 308: When the total number of bits is insufficient, set to 0, according to the total number of available bits, the elements of the lattice points with small spectral energy in the 0 to 2 kHz range. That is, sort the lattice points in the 0 to 2 kHz range in descending order of spectral energy and set to 0 the elements of the m lattice points with the smallest spectral energy, where m is an integer greater than or equal to 1.
  • m can be preset according to the application.
  • Step 309: Compute the index values of the lattice points in the 0 to 2 kHz range; the index value of a lattice point whose elements were set to 0 is also 0.
  • Step 310: Write the obtained index values into the code stream in the order of the lattice points. Specifically, the lattice points corresponding to the 2 to 7 kHz spectrum are placed first and those corresponding to the 0 to 2 kHz spectrum are placed last, so that each index value is written into the code stream according to its importance in decoding.
  • Step 311: End the encoding.
  • In this way, the index values of the lattice points corresponding to the 2 to 7 kHz MDCT spectrum are placed at the front of the code stream and those corresponding to the 0 to 2 kHz MDCT spectrum are placed behind them, forming a complete code stream.
  • Step 301 is not limited to computing the MDCT coefficients of the R3 and R4 layers; the appropriate layers can be selected according to actual needs.
  • The frequency ranges into which the 35 frequency modules corresponding to the MDCT coefficients are divided can also be selected according to the actual situation.
  • The above sorting method applies to embedded multi-rate speech coding algorithms that use CELP coding in the lower layers and transform coding in the higher layers.
  • 2 kHz is chosen as the boundary because CELP coding handles low-frequency signals of 0 to 2 kHz well. In addition, the signal processed by the higher layers is the difference between the original input signal and the locally decoded lower-layer signal, so among the signals processed by the higher layers the spectral components above 2 kHz are the more important information.
  • The CELP coding part is the same as in the embodiment shown in FIG. 2 and is not described again here.
  • mode switching is performed, and it is decided whether to select the global index order determining mode or the block index order determining mode.
  • the mode switching of the index sorting is performed, and it is determined whether the static mode is sorted or the dynamic mode is sorted.
  • the signal-to-noise ratio or a non-zero value that can be obtained at the codec side determines the order of the lattice code index in the code stream.
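The band-priority ordering of steps 301 to 310 can be sketched as follows. This is a minimal illustration in Python, not the patent's implementation: the function name `encode_band_priority`, its parameters, the per-module `energies` and `indices` lists, and the choice to keep frequency order inside each band are all assumptions made for the example, and the actual index computation and bit accounting are left out.

```python
def encode_band_priority(energies, indices, n_low=10, zero_high=0, zero_low=0):
    """Order lattice-point index values with the high band first.

    energies, indices: one value per frequency module, low to high frequency
                       (35 modules in the embodiment; hypothetical inputs here).
    n_low:     number of modules in the low band (10 in the embodiment).
    zero_high: how many high-band lattice points to zero when bits run short.
    zero_low:  how many low-band lattice points to zero when bits run short.
    Returns a list of (module_number, index_value) in code-stream order.
    """
    def process(modules, n_zero):
        # Pick the n_zero modules with the smallest spectral energy in this band.
        weakest = set(sorted(modules, key=lambda b: energies[b])[:n_zero])
        # A zeroed lattice point gets index value 0, as in steps 305 and 309.
        return [(b, 0 if b in weakest else indices[b]) for b in modules]

    low = list(range(n_low))
    high = list(range(n_low, len(indices)))
    # Step 310: high-band index values first, low-band index values last.
    return process(high, zero_high) + process(low, zero_low)


# Toy example with 6 modules, the first 3 forming the low band.
stream = encode_band_priority(
    energies=[5, 1, 4, 9, 2, 7],
    indices=[10, 11, 12, 13, 14, 15],
    n_low=3, zero_high=1, zero_low=1,
)
print(stream)
```

A decoder that receives only the first part of such a stream still obtains the high-band index values, which the text identifies as the more important information for the transform-coded layers.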

Abstract

A multi-rate speech audio encoding method includes: calculating a difference signal between the input signal and the synthesized speech obtained by encoding and then decoding the input signal; calculating the index value of the lattice point nearest to each spectral vector corresponding to the difference signal; calculating a first ratio of the spectrum of the perceptual weighting filter corresponding to the difference signal to the spectrum of the synthesized speech; and writing the index values of the lattice points into the code stream in descending order of the first ratio (212).

Description

A multi-rate speech and audio coding method
This application claims priority to Chinese patent application No. 200710165110.3, filed on November 5, 2007 and entitled "A multi-rate speech and audio coding method", the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to coding techniques, and in particular to a method of multi-rate speech and audio coding.
Background
In current multi-rate speech and audio coding, transform-domain coding is mostly used when the input signal is music-like or the code rate is high (for example, in AMR-WB+, G.729.1 and G.VBR). That is, the time-domain signal is transformed into the frequency domain by a transform such as the modified discrete cosine transform (MDCT) or the fast Fourier transform (FFT). The transform-domain parameters are then mostly quantized by lattice vector quantization. FIG. 1 is a flowchart of a prior-art lattice vector quantization method. As shown in FIG. 1, lattice vector quantization generally includes the following steps:
Step 101: For the frequency-domain signal, find the corresponding lattice point according to the nearest-neighbor principle.
That is, for the input spectral vector, find the nearest lattice point Ck in, for example, the 8-dimensional Gosset lattice (called the RE8 lattice), the Z8 lattice or the Z16 lattice.
Step 102: Compute the index value of each lattice point according to the spectral energy corresponding to the lattice point and the total number of bits.
When Ck is in the base codebook, the index value comprises the codebook index nk and the corresponding codeword index Ik. When Ck is not in the base codebook, a Voronoi extension is applied to Ck, and the index value then comprises, in addition to nk and Ik, the index kv of the extension codebook.
Step 103: When the number of bits is insufficient, all elements of the lattice points corresponding to the lower-energy spectral components are forcibly set to 0.
Step 104: The index values of the lattice points are written into the code stream in order from low frequency to high frequency.
Step 105: At the decoder, the quantized spectrum sequence is decoded from low frequency to high frequency according to the decoded index values.
According to the masking effect, a signal with high sound intensity can mask nearby signals with low sound intensity, so that the human auditory system does not perceive the masked signal. The human auditory system also masks signals by itself: when the sound intensity of a signal is below a certain threshold, the auditory system does not perceive the signal even when no other signal masks it. This is called absolute masking. Experiments show that the absolute masking threshold decreases as frequency increases in the range of 0 to 500 Hz and is almost constant in the range of 500 to 5000 Hz.
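As an illustration of the nearest-point search in Step 101, the sketch below finds the nearest RE8 lattice point for an 8-dimensional spectral vector. The source does not say which search algorithm is used; this example assumes the commonly used decomposition RE8 = 2D8 ∪ (2D8 + (1, ..., 1)), where D8 is the set of integer vectors with an even coordinate sum, together with the classic "round, then fix the parity" search for D8. The function names are hypothetical.

```python
import math


def _round_half_away(v):
    # Round to the nearest integer, ties away from zero (deterministic).
    return math.floor(v + 0.5) if v >= 0 else -math.floor(-v + 0.5)


def nearest_d8(x):
    """Nearest point of D8: integer vectors whose coordinate sum is even."""
    y = [_round_half_away(v) for v in x]
    if sum(y) % 2 != 0:
        # Wrong parity: re-round the coordinate with the largest rounding
        # error in the other direction, which is the cheapest fix.
        i = max(range(len(x)), key=lambda j: abs(x[j] - y[j]))
        y[i] += 1 if x[i] > y[i] else -1
    return y


def nearest_re8(x):
    """Nearest point of RE8 = 2*D8 union (2*D8 + (1,...,1)) to the vector x."""
    assert len(x) == 8

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # Candidate from the coset 2*D8.
    c0 = [2 * v for v in nearest_d8([xi / 2 for xi in x])]
    # Candidate from the shifted coset 2*D8 + (1,...,1).
    c1 = [2 * v + 1 for v in nearest_d8([(xi - 1) / 2 for xi in x])]
    return c0 if dist2(x, c0) <= dist2(x, c1) else c1


print(nearest_re8([2.2, 0, 0, 0, 0, 0, 0, 0]))
print(nearest_re8([1.1] * 8))
```

Computing the codebook index nk, the codeword index Ik and the Voronoi extension index kv of Step 102 is a separate stage that this sketch does not attempt.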
When lattice vector quantization is applied to the spectrum of the input signal, some lattice-point elements (for example, the quantized values corresponding to the spectrum of the input signal) may have to be forced to 0 because the total number of bits is limited. If lattice points corresponding to frequency modules that carry important information are forced to 0, the coding quality is greatly reduced. A criterion is therefore needed to decide which lattice points are set to 0 and which are retained.
In the above coding method, the encoder places the lattice points of the transform-domain coding in a buffer in descending order of spectral energy and computes the index value of each lattice point from first to last. When the number of bits is insufficient, all elements of the lattice points corresponding to spectral components with relatively small energy (that is, the lattice points placed toward the end of the buffer) are forcibly set to 0, and their index values are also 0. The encoder writes all index values, zero and non-zero alike, into the code stream in order from low frequency to high frequency. When the number of bits is insufficient, the decoder can therefore recover only a small amount of low-frequency information from the code stream, and part of that information is spectrum that was set to 0. Some lattice points whose elements were not set to 0 then cannot be encoded for lack of bits, so adding bits to raise the code rate does not noticeably improve the quality of the output speech or audio signal. In addition, when the index value of a lattice point is computed, the importance of the lattice point is decided by the magnitude of its spectral energy, and when the number of bits is insufficient the elements of the lattice points with smaller spectral energy are set to 0. However, a component with large spectral energy is not necessarily an important one, so this decision criterion may set to 0 the elements of lattice points that correspond to important components and degrade the output signal. Furthermore, according to the masking effect, whether a signal is masked does not depend entirely on the magnitude of its spectral energy; to some extent it also depends on the difference between the masking signal and the masked signal, which the above coding method does not consider.
Summary of the invention
In view of this, the main purpose of the embodiments of the present invention is to provide a multi-rate speech and audio coding method that improves the quality of the output speech (audio) signal when bits are insufficient in transform-domain coding. To achieve this purpose, the technical solution in the embodiments of the present invention is implemented as follows:
A method for multi-rate speech and audio coding, comprising:
computing a difference signal between the input signal and the synthesized speech obtained by encoding and then locally decoding the input signal;
computing the index value of the lattice point nearest to each spectral vector corresponding to the difference signal;
computing a first ratio of the spectrum of the perceptual weighting filter corresponding to the difference signal to the spectrum of the synthesized speech;
writing the index values of the lattice points into the code stream in descending order of the first ratio.
In summary, the embodiments of the present invention provide a method for multi-rate speech and audio coding. The method first computes the difference signal between the input signal and the synthesized speech obtained by encoding and then locally decoding the input signal, then computes the index value of the lattice point nearest to each spectral vector corresponding to the difference signal and the first ratio of the spectrum of the perceptual weighting filter corresponding to the difference signal to the spectrum of the synthesized speech, and finally writes the index values of the lattice points into the code stream in descending order of the first ratio. In line with the action of the perceptual weighting filter, the more important information is finely quantized and written first while the less important information is coarsely quantized, so that when the number of bits at the decoder is insufficient the more important information can still be decoded, which improves the quality of the decoded speech.
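The last step of the method, writing index values in descending order of the first ratio, can be sketched as below. This is only an illustration under stated assumptions: the first ratios are taken as given inputs, the function name `order_by_first_ratio` is hypothetical, and ties are kept in their original (frequency) order, which the source does not specify.

```python
def order_by_first_ratio(indices, ratios):
    """Return (position, index_value) pairs sorted by descending first ratio.

    indices: index value of each lattice point, in frequency order.
    ratios:  first ratio of each lattice point, same order as indices.
    """
    # sorted() is stable even with reverse=True, so equal ratios keep
    # their original relative order.
    order = sorted(range(len(indices)), key=lambda k: ratios[k], reverse=True)
    return [(k, indices[k]) for k in order]


stream = order_by_first_ratio([11, 22, 33, 44], [0.2, 0.9, 0.5, 0.9])
print(stream)      # full code-stream order
print(stream[:2])  # what survives if only two entries fit in the bit budget
```

Because the entries with the largest first ratio come first, cutting the stream short drops only the entries ranked as least important.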
Brief Description of the Drawings
FIG. 1 is a flowchart of a lattice vector quantization method in the prior art;
FIG. 2 is a flowchart of a multi-rate speech and audio coding method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a multi-rate speech and audio coding method according to another embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
FIG. 2 is a flowchart of the multi-rate speech and audio coding method according to an embodiment of the present invention. Specifically, this embodiment includes the following steps:
Step 201: The encoder receives an input signal.
Step 202: The encoder performs CELP (Code Excited Linear Prediction) encoding on the input signal. The encoding scheme is not limited to CELP; other schemes may also be used.
During CELP encoding, the input signal may be encoded in two layers (for example, layer L1 and layer L2). In the present invention, the input signal may be encoded without layering or in more than one layer. The number of layers determines how many different speech coding rates can be realized. The specific number of layers may be chosen according to actual needs.
Step 203: The encoder locally decodes the CELP-encoded signal to obtain a decoded signal.
Step 204: A spectrum is computed from the decoded signal obtained in step 203. Specifically, the spectrum Freq_R2 of the synthesized speech (audio) signal of the first two layers is derived from the decoded signal. The number of layers decoded in this step should equal the number of layers encoded in step 202.
Step 205: The difference signal between the input signal and the synthesized speech signal of the first two layers of the decoded signal obtained in step 203 is computed, and the MDCT coefficients of this difference signal, that is, the spectrum Freq_err, are derived.
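Step 205 can be illustrated with a minimal Python sketch. It assumes a direct, unwindowed O(N^2) MDCT over a single toy frame; the function names `mdct` and `difference_spectrum`, the frame length, and the test signals are hypothetical and are not taken from the patent. A practical codec would use a windowed, overlapped fast MDCT.

```python
import math

def mdct(frame):
    """Direct MDCT of a frame of 2N samples, returning N coefficients (no window)."""
    two_n = len(frame)
    n = two_n // 2
    coeffs = []
    for k in range(n):
        acc = 0.0
        for t in range(two_n):
            acc += frame[t] * math.cos(math.pi / n * (t + 0.5 + n / 2.0) * (k + 0.5))
        coeffs.append(acc)
    return coeffs

def difference_spectrum(input_frame, synth_frame):
    """Freq_err: MDCT of (input signal minus locally decoded synthesis)."""
    diff = [x - y for x, y in zip(input_frame, synth_frame)]
    return mdct(diff)

# Toy 16-sample frame: the "synthesis" is an attenuated copy of the input.
input_frame = [math.sin(0.3 * t) for t in range(16)]
synth_frame = [0.9 * math.sin(0.3 * t) for t in range(16)]
freq_err = difference_spectrum(input_frame, synth_frame)
```

Because the MDCT is linear, transforming the difference signal gives the same result as subtracting the two transformed signals, which is a convenient sanity check for an implementation.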
Step 206: An operation (for example, rounding) is applied to the MDCT coefficients from step 205 to obtain the lattice points nearest to the corresponding spectral vectors. If the RE8 lattice is used, 35 lattice points are obtained.
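The patent only says that an operation such as rounding yields the nearest lattice points. As a hedged illustration, the sketch below uses the well-known nearest-neighbor search for RE8, viewed as the union of 2*D8 and 2*D8 + (1, ..., 1); this choice of algorithm, the function names, and the synthetic 280-coefficient spectrum are assumptions, not details confirmed by the source.

```python
import math

def nearest_d8(x):
    """Nearest point of D8 (integer vectors whose coordinate sum is even)."""
    rounded = [math.floor(v + 0.5) for v in x]
    if sum(rounded) % 2 != 0:
        # Re-round the coordinate with the largest rounding error the other way.
        j = max(range(len(x)), key=lambda i: abs(x[i] - rounded[i]))
        rounded[j] += 1 if x[j] >= rounded[j] else -1
    return rounded

def nearest_re8(x):
    """Nearest point of RE8 = 2*D8 united with 2*D8 + (1, ..., 1)."""
    a = [2 * v for v in nearest_d8([c / 2.0 for c in x])]
    b = [2 * v + 1 for v in nearest_d8([(c - 1.0) / 2.0 for c in x])]

    def dist(p):
        return sum((c - q) ** 2 for c, q in zip(x, p))

    return a if dist(a) <= dist(b) else b

def quantize_spectrum(freq_err):
    """Split the spectrum into 8-dimensional vectors and quantize each to RE8."""
    return [nearest_re8(freq_err[8 * k: 8 * k + 8]) for k in range(len(freq_err) // 8)]

# Synthetic spectrum of 280 coefficients, giving 35 lattice points.
freq_err = [0.37 * ((7 * j) % 11 - 5) for j in range(280)]
points = quantize_spectrum(freq_err)
```

With 280 coefficients grouped in vectors of 8, the sketch produces the 35 lattice points mentioned in step 206.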
Step 207: The difference between the spectrum Freq_R2 from step 204 and the spectrum Freq_err from step 205 is divided by Freq_R2 to obtain a ratio Ratio[k], and the lattice points from step 206 are divided, in sequential order, into N regions. Taking the RE8 lattice as an example, the formula is:

$$\mathrm{Ratio}[k] = \frac{\sum_{i=0}^{7}\left(\mathrm{Freq\_R2}[l] - \mathrm{Freq\_err}[l]\right)}{\sum_{i=0}^{7}\left(\mathrm{Freq\_R2}[l]\right)} \qquad (1)$$

where l = 8*k+i, k = 0, 1, 2, ..., 34, and i = 0, 1, 2, ..., 7.
Solving formula (1) yields an array Ratio[k] of ratios, each of which corresponds to exactly one lattice point from step 206.
If the RE8 lattice is used, the array Ratio[k] contains 35 ratios. The lattice points are then divided into N regions according to their order in Ratio[k], where N is an integer greater than or equal to 1.
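The per-block ratio can be sketched in Python as follows. The sketch assumes that Ratio[k] is the sum over a block of (Freq_R2[l] - Freq_err[l]) divided by the sum of Freq_R2[l] over the same block, with l = 8*k+i; whether the original formula applies squares or magnitudes to the terms cannot be confirmed from the text. The function name `block_ratios`, the zero-denominator guard, and the toy spectra are hypothetical.

```python
def block_ratios(freq_r2, freq_err, block=8):
    """Ratio[k] for each block of `block` coefficients, with l = block*k + i."""
    ratios = []
    for k in range(len(freq_r2) // block):
        num = 0.0
        den = 0.0
        for i in range(block):
            l = block * k + i
            num += freq_r2[l] - freq_err[l]
            den += freq_r2[l]
        # Guard against an all-zero block (an assumption, not from the patent).
        ratios.append(num / den if den != 0.0 else 0.0)
    return ratios

# Two toy blocks of magnitude spectra.
freq_r2 = [4.0] * 8 + [2.0] * 8
freq_err = [1.0] * 8 + [1.0] * 8
ratios = block_ratios(freq_r2, freq_err)
```

For a real frame with 280 coefficients, the same function would return the 35-element array described above.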
Step 208: The lattice points are sorted.
The lattice points in the N regions are arranged according to the auditory characteristics of the human ear, so that signals less likely to be masked come first. This embodiment uses reverse ordering: the lattice points in the last region of the array Ratio[k] are placed first, those in the first region are placed last, and the other regions are arranged correspondingly. The lattice points within each region are then arranged in descending order of Ratio[k]. The reordered lattice points are placed sequentially in a new array.
A concrete example follows. Ratio[k] is divided into two regions: the first n ratios form the first region and the remaining (35-n) ratios form the second region, where n is an integer greater than or equal to 1. The lattice points of step 206 that belong to the second region are sorted in descending order of Ratio[k] and stored in the first (35-n) elements of a 35-element array R[k]. The first region is handled in the same way: its lattice points from step 206 are sorted in descending order of Ratio[k] and stored in the last n elements of R[k].
When Ratio[k] is divided into two regions in this embodiment, the number of values assigned to the first region should be preset according to the actual application. Ratio[k] may also be divided into more than two regions, which are ordered according to the coding characteristics of the first few layers of the encoder used or the characteristics of the MDCT coefficients of the difference signal: regions containing the lattice points of signals less likely to be masked are placed first, and regions containing the lattice points of signals more likely to be masked are placed last.
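The two-region example can be sketched in Python as follows. The function name `order_lattice_points` and the five-element toy array are hypothetical; the sketch returns block indices k rather than the lattice points themselves, in the order in which they would fill the array R[k].

```python
def order_lattice_points(ratio, n):
    """Order block indices as in R[k]: second region first, first region last,
    each region sorted by Ratio[k] in descending order."""
    first = list(range(0, n))
    second = list(range(n, len(ratio)))

    def by_ratio(ks):
        return sorted(ks, key=lambda k: ratio[k], reverse=True)

    return by_ratio(second) + by_ratio(first)

# Toy example with 5 blocks instead of 35; the first region holds n = 2 ratios.
ratio = [0.2, 0.9, 0.5, 0.1, 0.7]
order = order_lattice_points(ratio, n=2)
```

Here blocks 2, 3, and 4 form the second region and are placed first, so the resulting order is 4, 2, 3, 1, 0.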
Step 209: The spectrum W_Freq of the perceptual weighting filter corresponding to the difference signal is computed. The perceptual weighting filter H(z) satisfies the formula: H(z) = ^^/ /^-^-1). Here A denotes the linear prediction coefficients, which reflect the spectral envelope of the high band; z denotes the frequency domain; and β and γ are weighting factors, which are generally constants.
The spectrum W_Freq is obtained by applying the MDCT to the perceptual weighting filter H(z).
Step 210: The spectrum Freq_R2 from step 204 and the spectrum W_Freq from step 209 are transformed and their ratio Rat[k] is computed using the following formula:
Rat[k] = [Equation image: imgf000008_0001]
where l = 8*k+i, k = 0, 1, 2, ..., 34, and i = 0, 1, 2, ..., 7.
Step 211: The index value of each lattice point is computed. Based on the data in the array from step 208 and the total number of bits available, the index values of the corresponding lattice points are derived. When there are not enough bits, the elements of the last m lattice points in the array from step 208 are set to 0, so the index values of these lattice points are also 0. Here m is a preset integer greater than or equal to 1, whose value may be set in advance according to the total number of bits.
Steps 210 and 211 need not be performed in the order given; step 211 may be performed before step 210.
Step 212: The index values of the lattice points are written into the bitstream in descending order of Rat[k]. That is, the index values of lattice points with a large Rat[k] (the more important signals) are written first, and the index values of lattice points with a small Rat[k] are written afterwards.
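Steps 211 and 212 can be sketched together in Python. The sketch assumes that the step-208 ordering is available as a list of block indices and that the Rat[k] values are already known; the function names, the `bits_sufficient` flag, and the toy data are hypothetical, and the actual lattice index computation is outside the scope of the sketch.

```python
def zero_tail(order, points, m, bits_sufficient):
    """Step 211 sketch: when bits run short, zero the last m lattice points
    of the step-208 ordering (order is a list of block indices)."""
    out = [list(p) for p in points]
    if not bits_sufficient:
        for k in order[len(order) - m:]:
            out[k] = [0] * len(out[k])
    return out

def bitstream_order(rat):
    """Step 212 sketch: block indices sorted by descending Rat[k]."""
    return sorted(range(len(rat)), key=lambda k: rat[k], reverse=True)

# Toy example with three RE8 lattice points.
points = [[2, 0, 0, 0, 0, 0, 0, 2], [1] * 8, [0, 2, 2, 0, 0, 0, 0, 0]]
order = [2, 0, 1]            # ordering produced by step 208 (assumed)
kept = zero_tail(order, points, m=1, bits_sufficient=False)
rat = [0.4, 1.3, 0.8]        # Rat[k] values (assumed)
write_order = bitstream_order(rat)
```

In this toy run, block 1 is last in the step-208 ordering and is zeroed, while the bitstream order follows Rat[k] and lists blocks 1, 2, 0.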
Of course, the present invention is not limited to the above embodiment. Nor is it limited to the RE8 lattice used in this embodiment; other lattices, such as Z8, may also be used. The lattice points in step 208 may also be sorted according to other principles. For example, a global index order determination mode may be used, in which the lattice points are not divided into regions and are arranged solely by the value of Ratio[k]. The sorting scheme may be chosen according to actual needs. Steps 201 to 212 in this embodiment are not limited to the order described above.
As can be seen, the decision criterion expressed by formula (1) in this embodiment better matches the principle of the masking effect. According to this principle, the smaller the difference between Freq_R2 and Freq_err, the closer the two spectra are, and therefore the less likely Freq_err is to be masked. In addition, for a given difference, the larger the ratio of that difference to the locally decoded Freq_R2, the less likely Freq_err is to be masked. The method in this embodiment therefore ensures that the leaders corresponding to signals that are less likely to be masked are not forcibly set to 0. When the number of bits is insufficient, the more important information is quantized more finely and written into the bitstream first, while the less important information is quantized coarsely.
Furthermore, in this embodiment the order of the leader index values in the bitstream is determined by the ratio Rat between the spectrum Freq_R2 of the locally decoded, CELP-coded synthesized speech (audio) and the spectrum W_Freq of the perceptual weighting filter. The reason is as follows. The perceptual weighting filter allocates more distortion where the spectral energy of the input signal is large and keeps distortion as low as possible where the spectral energy is small. For a CELP-coded signal, quantization is therefore relatively coarse where Rat is large, and these are exactly the components on which the lattice vector quantization of this embodiment focuses. Consequently, by placing the index values of leaders with a larger Rat earlier in the bitstream and those with a smaller Rat later, the decoder can still decode the more important information when it does not have enough bits, which improves the quality of the decoded speech.
In another embodiment of the present invention, the order of the lattice codebook in the bitstream may instead be determined by using a fixed spectral value as a dividing point. FIG. 3 is a flowchart of the multi-rate speech and audio coding method according to this embodiment. As shown in FIG. 3, the method includes the following steps:
Step 301: The MDCT coefficients of layers R3 and R4 are computed, that is, the MDCT coefficients of the first three layers and of the first four layers. The computation may follow the previous embodiment and is not repeated here. Taking the RE8 lattice as an example, the 35 spectral blocks corresponding to the MDCT coefficients are divided by frequency range into two regions, 0~2 kHz and 2~7 kHz. For example, the first 10 spectral blocks cover 0~2 kHz and the remaining 25 spectral blocks cover 2~7 kHz. The number of spectral blocks in each region may differ between embodiments.
Step 302: The lattice points in the 2~7 kHz range are taken for processing.
Step 303: It is determined whether the total number of bits is sufficient. If so, step 305 is performed; otherwise, step 304 is performed.
Step 304: The lattice points with lower spectral energy in the 2~7 kHz range are set to 0. That is, the lattice points in the 2~7 kHz range are sorted in descending order of spectral energy, and the last n lattice points, which have the lowest spectral energy, are set to 0. Here n is an integer greater than or equal to 1 that may be preset according to the actual application.
Step 305: The index values of the lattice points in the 2~7 kHz range are computed. A lattice point whose elements have been set to 0 also has its index value set to 0.
Step 306: The lattice points in the 0~2 kHz range are taken for processing.
Step 307: It is determined whether the total number of bits is sufficient. If so, step 309 is performed; otherwise, step 308 is performed.
Step 308: When the total number of bits is insufficient, the elements of the m lattice points with the lowest spectral energy in the 0~2 kHz range are set to 0, according to the total number of bits available. That is, the lattice points in the 0~2 kHz range are sorted in descending order of spectral energy, and the elements of the last m lattice points are set to 0. Here m is an integer greater than or equal to 1 that may be preset according to the actual application.
Step 309: The index values of the lattice points in the 0~2 kHz range are computed. A lattice point whose elements have been set to 0 also has its index value set to 0.
Step 310: The computed index values are written into the bitstream in the order in which the lattice points are arranged. Specifically, the lattice points corresponding to the 2~7 kHz spectrum come first, followed by the lattice points corresponding to the 0~2 kHz spectrum. Each index value is thus written into the bitstream according to its importance for decoding.
Step 311: Encoding ends.
With this method, the index values of the leaders corresponding to the 2~7 kHz MDCT spectrum are placed at the front of the bitstream and the index values of the leaders corresponding to the 0~2 kHz MDCT spectrum follow them, forming a complete bitstream.
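Steps 302 to 310 can be sketched in Python as follows. The sketch assumes that the lattice points are stored in frequency order, that the first `low_blocks` of them cover 0~2 kHz, and that the bit-budget decisions of steps 303 and 307 are supplied as boolean flags; all function and parameter names are hypothetical, and spectral energy is taken to be the sum of squared elements of a lattice point.

```python
def energy(point):
    """Spectral energy of one lattice point (sum of squared elements)."""
    return sum(c * c for c in point)

def zero_weakest(points, band, count):
    """Zero the `count` lowest-energy lattice points among block indices `band`."""
    ranked = sorted(band, key=lambda k: energy(points[k]), reverse=True)
    for k in ranked[len(ranked) - count:]:
        points[k] = [0] * len(points[k])

def band_split_order(points, low_blocks, high_short, low_short, n=1, m=1):
    """Return (bitstream order of block indices, processed lattice points)."""
    out = [list(p) for p in points]
    high = list(range(low_blocks, len(out)))   # 2~7 kHz blocks
    low = list(range(low_blocks))              # 0~2 kHz blocks
    if high_short:                             # steps 303 and 304
        zero_weakest(out, high, n)
    if low_short:                              # steps 307 and 308
        zero_weakest(out, low, m)
    return high + low, out                     # step 310: high band first

# Toy example: one low-band block followed by three high-band blocks.
points = [[2, 2, 0, 0, 0, 0, 0, 0],
          [4, 0, 0, 0, 0, 0, 0, 0],
          [2, 2, 2, 2, 2, 2, 0, 0],
          [1] * 8]
order, out = band_split_order(points, low_blocks=1, high_short=True, low_short=False)
```

In the toy run, the weakest high-band block (index 3) is zeroed and the bitstream order is 1, 2, 3, 0, so the high-band indices precede the low-band one.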
Step 301 of the present invention is not limited to computing the MDCT coefficients of layers R3 and R4; the first few layers for which MDCT coefficients are computed may be chosen according to actual needs. The frequency ranges used to divide the 35 spectral blocks corresponding to the MDCT coefficients into two parts may likewise be chosen according to the actual situation.
The sorting method described above is suited to embedded multi-rate speech coding algorithms in which the lower layers use CELP coding and the higher layers use transform coding. In this method, 2 kHz is chosen as the dividing point because CELP coding handles low-frequency signals in the 0~2 kHz range very well. At the same time, the signal processed by the higher layers is the spectrum of the difference between the original input signal and the locally decoded lower-layer signal, so among the signals the higher layers must process, the spectral components above 2 kHz carry the more important information. Therefore, when the spectrum of the difference signal is encoded, the spectral components above 2 kHz should be encoded first. This ensures that when the decoder does not have enough bits, it decodes the more important information above 2 kHz first rather than the less important low-frequency information.
In this method, the CELP coding part is encoded in the same way as the CELP coding part shown in FIG. 2 and is not described again here.
In addition, the method of steps 301 to 311 may be combined with the method of the first embodiment to better achieve the objectives of the present invention.
The method in this embodiment makes it possible to:
1) switch modes according to the requirements of the actual encoder, choosing between the global index order determination mode and the block index order determination mode;
2) switch the index sorting mode according to the requirements of the actual encoder, choosing between static-mode sorting and dynamic-mode sorting;
3) use as a criterion the difference between the spectrum of the locally decoded lower-layer signal and the spectrum of the difference between the original input signal and the locally decoded lower-layer signal, divided by the spectrum of the locally decoded lower-layer signal;
4) use a signal-to-noise ratio available at both the encoder and the decoder, or some non-zero value, to determine the order of the lattice codebook indices in the bitstream.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A multi-rate speech and audio coding method, characterized by comprising:
computing a difference signal between an input signal and synthesized speech obtained by encoding and then locally decoding the input signal;
computing the index value of the lattice point nearest to each spectral vector corresponding to the difference signal;
computing a first ratio between the spectrum of a perceptual weighting filter corresponding to the difference signal and the spectrum of the synthesized speech; and
writing the index values of the lattice points into a bitstream in descending order of the first ratio.
2. The multi-rate speech and audio coding method according to claim 1, characterized in that computing the index value of the lattice point nearest to each spectral vector corresponding to the difference signal comprises:
computing the spectral signal corresponding to the difference signal;
dividing the spectral signal into spectral vectors according to the characteristics of the lattice; and
computing the lattice point corresponding to each spectral vector and the index value of that lattice point.
3. The multi-rate speech and audio coding method according to claim 2, characterized in that computing the lattice point corresponding to each spectral vector and the index value of that lattice point comprises:
computing the lattice point corresponding to each spectral vector;
computing the index value of the lattice point corresponding to each spectral vector according to the total number of bits available; and
if the total number of bits is insufficient, setting to zero the index values of the lattice points corresponding to signals that are more likely to be masked.
4. The multi-rate speech and audio coding method according to claim 3, characterized in that, if the total number of bits is insufficient, setting to zero the elements of the lattice points corresponding to signals that are more likely to be masked comprises:
computing a second ratio between the spectrum of the difference signal and the spectrum of the synthesized speech;
dividing the lattice points corresponding to the spectral vectors into at least one part according to critical bands and, if there is more than one part, ordering the parts according to the importance of the spectral vectors;
arranging the lattice points in each part in descending order of the second ratio; and
when the total number of bits is insufficient, setting to zero the elements and index values of a predetermined number of lattice points at the end of the rearranged lattice points.
5. The multi-rate speech and audio coding method according to claim 4, characterized in that ordering the parts according to the importance of the spectral vectors comprises:
ordering the parts according to the auditory characteristics of the human ear, the coding characteristics of the first few layers of the encoder used, or the characteristics of the spectral coefficients of the difference signal.
6. The multi-rate speech and audio coding method according to claim 4 or 5, characterized in that the formula for computing the second ratio Ratio[k] is:
Ratio[k] = [Equation image: imgf000013_0001]
where Freq_R2 is the spectrum of the synthesized signal of the first few layers obtained by encoding and then locally decoding the input signal, Freq_err is the spectral coefficients of the difference signal, l = 8*k+i, k = 0, 1, 2, ..., 34, and i = 0, 1, 2, ..., 7.
7. The multi-rate speech and audio coding method according to claim 4, characterized in that, if the total number of bits is insufficient, setting to zero the elements of the lattice points corresponding to signals that are more likely to be masked specifically comprises:
dividing the spectral vectors into at least one part according to critical bands and, if there is more than one part, ordering the parts according to the importance of the spectral vectors;
arranging the lattice points in each part in descending order of spectral energy; and
when the total number of bits is insufficient, setting to zero the elements and index values of a predetermined number of lattice points at the end of each rearranged part.
8. The multi-rate speech and audio coding method according to claim 7, characterized in that, when the total number of bits is insufficient, setting to zero the elements and index values of a predetermined number of lattice points at the end of the rearranged lattice points comprises:
when the total number of bits is found to be insufficient within a given part of the lattice points, setting to zero the elements and index values of a predetermined number of lattice points at the end of that part.
9. The multi-rate speech and audio coding method according to claim 7, characterized in that ordering the parts according to the importance of the spectral vectors comprises:
ordering the parts according to the auditory characteristics of the human ear, the characteristics of the encoder used for the first few layers, or the characteristics of the spectral coefficients of the difference signal.
10. The multi-rate speech and audio coding method according to claim 1, characterized in that the formula for the first ratio Rat[k] is:

$$\mathrm{Rat}[k] = \frac{\sum_{i} \log\left((\mathrm{Freq\_R2}[l])^{2} + (\mathrm{Freq\_R2}[l + \ldots])^{2}\right)}{\sum_{i} \log\left((\mathrm{W\_Freq}[l])^{2} + (\mathrm{W\_Freq}[l + \ldots])^{2}\right)}$$

where Freq_R2 is the spectrum of the synthesized signal of the first two layers obtained by encoding and then decoding the input signal, Freq_err is the spectral coefficients of the difference signal, l = 8*k+i, k = 0, 1, 2, ..., 34, and i = 0, 1, 2, ..., 7.
PCT/CN2008/072946 2007-11-05 2008-11-05 A multi-rate speech audio encoding method WO2009059564A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200710169619.5 2007-11-05
CN 200710169619 CN101430879B (en) 2007-11-05 2007-11-05 Multi-speed audio encoding method

Publications (1)

Publication Number Publication Date
WO2009059564A1 true WO2009059564A1 (en) 2009-05-14

Family

ID=40625388

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2008/072946 WO2009059564A1 (en) 2007-11-05 2008-11-05 A multi-rate speech audio encoding method

Country Status (2)

Country Link
CN (1) CN101430879B (en)
WO (1) WO2009059564A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5226085A (en) * 1990-10-19 1993-07-06 France Telecom Method of transmitting, at low throughput, a speech signal by celp coding, and corresponding system
US6594627B1 (en) * 2000-03-23 2003-07-15 Lucent Technologies Inc. Methods and apparatus for lattice-structured multiple description vector quantization coding
KR20050022419A (en) * 2003-08-30 2005-03-08 엘지전자 주식회사 Apparatus and method for spectrum vector quantizing in vocoder
CN1659785A (en) * 2002-05-31 2005-08-24 沃伊斯亚吉公司 Method and system for multi-rate lattice vector quantization of a signal
CN101000768A (en) * 2006-06-21 2007-07-18 北京工业大学 Embedded speech coding decoding method and code-decode device

Also Published As

Publication number Publication date
CN101430879B (en) 2011-08-10
CN101430879A (en) 2009-05-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08847314

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08847314

Country of ref document: EP

Kind code of ref document: A1