WO2008138267A1 - Post-processing method and apparatus for pitch enhancement - Google Patents

Post-processing method and apparatus for pitch enhancement

Info

Publication number
WO2008138267A1
WO2008138267A1 PCT/CN2008/070931 CN2008070931W WO2008138267A1 WO 2008138267 A1 WO2008138267 A1 WO 2008138267A1 CN 2008070931 W CN2008070931 W CN 2008070931W WO 2008138267 A1 WO2008138267 A1 WO 2008138267A1
Authority
WO
WIPO (PCT)
Prior art keywords
gain
post
decoded signal
filter
pitch
Prior art date
Application number
PCT/CN2008/070931
Other languages
English (en)
French (fr)
Inventor
Li Liu
Wei Li
Junbin Cao
Xiaogang Sun
Qing Zhang
Lijing Xu
Jianfeng Xu
Zhengzhong Du
Chen Hu
Lei Miao
Yi Yang
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2007-05-11
Filing date
2008-05-09
Publication date
2008-11-20
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2008138267A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/26: Pre-filtering or post-filtering

Definitions

  • The present invention relates to the field of audio decoding, and in particular to an adaptive post-processing technique for pitch enhancement in an audio decoding process.
  • Background of the invention: in audio decoding, post-processing is applied to the decoded speech in order to improve its perceived quality.
  • The purpose of the post-processing is to enhance the information in the synthesized sound signal that is relevant to perceptual quality, i.e., to reduce or remove the interference that degrades the perceived quality.
  • The techniques used in post-processing are generally divided into formant post-processing and pitch post-processing. In pitch post-processing, the frequency response of the filter needs to be related to the harmonics.
  • Taking the AMR-WB+ (Adaptive Multi-Rate Wideband plus) codec as an example, the post-processing it uses is a band-selectable pitch enhancement post-processing algorithm.
  • As shown in FIG. 1, the decoded synthesized sound signal is divided into two sub-bands. The low band is first passed through an adaptive pitch enhancement filter, which attenuates the noise between the pitch harmonics at the low-frequency end, and is then low-pass filtered; the other band is filtered directly by a high-pass filter; finally, the two processed bands are summed to obtain the pitch-enhanced synthesized sound signal.
  • In FIG. 1, two modules, the pitch enhancer and the low-pass filter, are used in the low-frequency sub-band for pitch enhancement post-processing, where:
  • The pitch enhancer module attenuates the inter-harmonic noise at the low-frequency end of the decoded signal to an appropriate degree, and the low-pass filter then removes the spectral tilt and other undesired frequency components; the pitch enhancer module is implemented with a time-varying linear filter.
  • The low-pass filter module is a linear-phase FIR (finite impulse response) low-pass filter.
  • In the implementation, the register must be updated in each subframe using the signal state produced by the low-pass filter.
  • With this post-processing method, the noise components between the harmonics at the low-frequency end of the decoded speech signal can be eliminated, so that the perceived quality of the decoded synthesized sound is improved.
  • Embodiments of the present invention provide a method and apparatus for pitch enhancement post-processing that simplify the post-processing and improve the quality of the audio signal it produces.
  • A method for implementing pitch enhancement post-processing includes a process of post-filtering a decoded signal, and the process includes: obtaining a gain of the decoded signal, determining whether the gain exceeds a predetermined threshold, and post-filtering the decoded signal after determining that the gain exceeds the predetermined threshold.
  • An apparatus for implementing pitch enhancement post-processing comprises:
  • a gain evaluation unit, configured to obtain a gain of the decoded signal;
  • a threshold judging unit, configured to determine whether the gain of the decoded signal determined by the gain evaluation unit exceeds a predetermined threshold;
  • an adaptive post filter, configured to perform long-term post-filtering, according to the result of the threshold judging unit, only on a decoded signal whose gain exceeds the predetermined threshold.
  • A computer program product comprises computer program code which, when executed by a computer, causes the computer to perform the steps of the method for implementing pitch enhancement post-processing.
  • FIG. 1 is a schematic diagram showing the principle of post-processing implementation of pitch enhancement used in the prior art
  • FIG. 2 is a schematic diagram of a processing procedure of a method according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a gain evaluation unit according to an embodiment of the present invention.
  • Mode for carrying out the invention: in the embodiments of the present invention, the energy characteristics of the decoded signal are fully exploited and compared with the pitch gain and pitch period values obtained by decoding, in order to obtain the pitch information that best reflects the characteristics of the sound.
  • This provides a threshold evaluation and decision scheme for selectively applying the pitch enhancement post-processing filter, so that the decoded signal has better perceptual quality.
  • The method may be as follows: first obtain a gain of the decoded signal, then determine whether the gain exceeds a predetermined threshold; if it does, apply long-term post-filtering to the decoded signal and output the result; otherwise, output the decoded signal directly.
  • The post filter used for post-filtering the decoded signal may be, but is not limited to, an all-zero post filter.
  • If an all-zero post filter is chosen, specific parameter values that further improve the perceived audio quality are given for the local adjustment factor $\lambda$ and the adaptive global gain $G_l$ in the corresponding filter function.
  • Other types of post filter may also be used for post-filtering in the embodiments of the present invention.
  • Taking AMR-WB+ coding as an example, the speech coding part uses CELP (Code-Excited Linear Prediction) coding.
  • At the encoder, the input signal is pre-emphasized, analyzed with 16th-order linear prediction, and then encoded using a pitch synthesis filter.
  • The expression of the pitch synthesis filter is: $\frac{1}{1 - g_p z^{-T}}$ (1)
  • where $T$ is the pitch period,
  • $g_p$ is the pitch gain,
  • and $z$ is the variable of the z-transform.
  • In speech perception theory, the formant regions of speech matter more to auditory perception than the valley regions; at lower coding rates it is therefore usually necessary to sacrifice performance in the valleys so that the formants are encoded as well as possible. As a result, the valleys may contain more perceptible coding noise than the peaks, including the valleys between the peaks of the pitch harmonics.
  • Given this cause of the coding noise, a corresponding post-processing filter may be set at the decoder to reduce the coding noise and obtain better perceptual quality.
  • A specific implementation of the method for pitch enhancement post-processing in the audio decoding process is shown in FIG. 2 and includes the following steps. Step 1: determine the gain of the received decoded signal from the decoded signal.
  • Specifically, the ratio of signal amplitudes in adjacent periods (i.e., the ratio of the decoded signal to the signal amplitude of the adjacent pitch period) is given by equation (2) (formula image imgf000006_0001, not reproduced in this text),
  • where $i$ and $i+T$ are time points of the decoded signal and $T$ is the pitch period.
  • This ratio is compared with the gain decoded from the bitstream, and the smaller of the two is taken as the final gain of the decoded signal.
  • Step 2: determine whether the gain determined in step 1 exceeds a predetermined threshold; if so, go to step 3, otherwise go to step 4.
  • A decision threshold $E_{thr}$ is set for when to use the post-processing filter: the long-term post-filtering is performed only when the gain $E_{com}$ determined in step 1 is greater than $E_{thr}$. The decision based on $E_{thr}$ mainly reflects the fact that voiced speech frames are strongly periodic,
  • and the gain $g_p'$ decoded from the bitstream sent by the encoder reflects this property of voiced sound.
  • The value of the threshold may be determined case by case for different codec frameworks.
  • For example, in codecs other than AMR-WB+, the threshold may be selected in the range 0 to 1.
  • Step 3: apply long-term post-filtering to the decoded signal (that is, the pitch synthesis signal obtained by decoding at the decoder), output the result,
  • and go to step 4.
  • An all-zero post filter can be used as the post filter to attenuate the noise between the pitch harmonics. To keep the peaks of the pitch harmonics at the harmonic frequencies, the zeros should be placed at the frequencies corresponding to the valleys between the pitch harmonics,
  • that is, at $\pi/T, \ldots, (2T-1)\pi/T$. The all-zero post filter can therefore take
  • the form: $H(z) = G_l \times (1 + \lambda \times z^{-T})$ (3)
  • where $T$ is the pitch period, $G_l$ is the overall gain control of the filter, $\lambda$ is a local adjustment factor, and $z$ is the variable of the z-transform.
  • Taking the AMR-WB+ codec as an example, the pitch period $T$ of the all-zero post filter can be determined in the way used in the AMR-WB+ codec, for example by using the pitch tracking module:
  • its output $T$ is used as the pitch period.
  • The value of $\lambda$ usually lies between 0 and 1 and determines the weighting between signal samples one pitch period apart.
  • The AMR-WB+ codec is again taken as an example.
  • On the basis of experiments, $\lambda$ can be chosen as 0.1. In this step, to prevent the signal distortion that accompanies the attenuation of inter-harmonic noise by the post filter, adaptive gain control is used to determine the adaptive global gain $G_l$.
  • The adaptive global gain is determined as follows: for voiced frames $x(n-T) \approx gain \times x(n)$, so the filter output is $y(n) \approx G_l \times (1 + \lambda \times gain) \times x(n)$, and $G_l = \frac{1}{1 + \lambda \times gain}$ is chosen to keep the output level close to the input level.
  • Step 4: output the pitch synthesis signal obtained at the decoder. Specifically, let the decoded pitch synthesis signal in steps 2 and 3 be $synth\_in$ and the output after long-term pitch post-filtering be $synth\_out$; steps 2 and 3 can then be expressed as: $synth\_out = synth\_in$ if $E_{com} < E_{thr}$, and $synth\_out = synth\_in \otimes h$ if $E_{com} \ge E_{thr}$ (8)
  • where $h$ is the impulse response of the adaptive post filter $H(z)$. Equation (8) shows that two kinds of pitch synthesis signal can be output in step 4:
  • one is a pitch synthesis signal that has undergone the long-term post-filtering of step 3 with adaptive gain control, which prevents the signal distortion that accompanies the attenuation of inter-harmonic noise by the post filter;
  • the other is a pitch synthesis signal that is output directly without the processing of step 3.
  • The embodiments of the present invention further provide an apparatus for pitch enhancement post-processing in the audio decoding process. Its structure is shown in FIG. 3 and may include the following processing units:
  • (1) Gain evaluation unit 301, configured to obtain the gain of the decoded signal.
  • As shown in FIG. 4, this unit may include:
  • a ratio determining unit 3011, configured to determine the ratio of the decoded signal to the signal amplitude of the adjacent pitch period, that is, the ratio of the signal amplitude in the previous pitch period to the signal amplitude in the current pitch period;
  • a decoded-signal gain determining unit 3012, configured to compare the ratio with the gain obtained by decoding and take the smaller of the two as the gain of the decoded signal.
  • (2) Threshold judging unit 302, configured to determine whether the gain of the decoded signal determined by the gain evaluation unit exceeds a predetermined threshold.
  • When the apparatus is used in AMR-WB+ decoding, the predetermined threshold selected by the threshold judging unit may be 0.6.
  • (3) Adaptive post filter 303, configured to perform long-term post-filtering, according to the result of the threshold judging unit, only on a decoded signal whose gain exceeds the predetermined threshold.
  • When the apparatus is used in AMR-WB+ decoding, the all-zero post filter uses $\lambda = 0.1$ and the adaptive global gain $G_l = \frac{1}{1 + \lambda \times gain}$, in order to avoid the signal distortion that accompanies the attenuation of inter-harmonic noise by the post filter.
  • The post filter for pitch enhancement may also be a comb filter.
  • The comb filter exploits the strong periodicity of voiced sound. In the frequency domain, it retains the fundamental frequency of the sound signal and its integer-multiple harmonic components and suppresses non-harmonic components.
  • Because the gaps between the harmonics consist mainly of noise, in the ideal case the noise between the harmonics can be filtered out completely if the fundamental frequency (pitch period) is known.
  • The time-domain expression of the comb filter is $y(n) = \sum_{k=-L}^{L} a_k\, x(n - kT)$ (10), where $x(n)$ is the decoded speech signal and $y(n)$ is the output of the comb filter; $a_k$ ($-L \le k \le L$) are the $2L+1$ tap coefficients of the comb filter, and the coefficients can adapt to changes in the spectrum of the speech signal.
  • In each subframe, the values of $a_k$ can be configured with reference to the gain of the decoded signal obtained above; for the pitch period $T$, repeated prediction should be avoided.
  • From equation (10), the output $y(n)$ is a delayed weighted average of the input $x(n)$ that emphasizes the periodic components; when the delay matches the pitch period, the averaging reinforces the periodic components,
  • while non-periodic components, or components whose period differs from that of the signal, are suppressed or eliminated.
  • In summary, when an FIR filter is used to apply pitch enhancement post-processing to the full-band decoded sound signal, both the threshold decision and the configuration of the filter coefficients can be implemented relatively simply.
  • The embodiments of the present invention can also adapt, in each subframe, to the energy variation of the sound signal synthesized at the decoder, which gives a better pitch enhancement effect.
  • For example, based on the AMR-WB+ codec framework, the pitch enhancement post-processing can be implemented with relatively simple operations, and the perceived quality of the decoded sound is improved.
  • In addition, while the scheme enhances the pitch of speech signals to obtain better perceptual quality, subjective and objective tests on a large number of music sequences showed that it also considerably improves the perceived quality of music signals.
  • A person skilled in the art will understand that the processes in the foregoing embodiments may be carried out by hardware under the control of program instructions; the program may be stored in a readable storage medium and, when executed, performs the corresponding steps of the method.
  • The storage medium may be, for example, a ROM/RAM, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Description

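Method and apparatus for implementing pitch enhancement post-processing

Technical Field

The present invention relates to the field of audio decoding, and in particular to an adaptive post-processing technique for pitch enhancement in an audio decoding process.

Background of the Invention

In audio decoding, post-processing is applied to the decoded speech in order to improve its perceived quality. The purpose of the post-processing is to enhance the information in the synthesized sound signal that is relevant to perceptual quality, i.e., to reduce or remove the interference that degrades the perceived quality. At present, the techniques used in post-processing are generally divided into formant post-processing and pitch post-processing. In pitch post-processing, the frequency response of the filter needs to be related to the harmonics.

Taking the AMR-WB+ (Adaptive Multi-Rate Wideband plus) codec as an example, the post-processing it uses is a band-selectable pitch enhancement post-processing algorithm. As shown in FIG. 1, this algorithm divides the decoded synthesized sound signal into two sub-bands. The low band is first passed through an adaptive pitch enhancement filter, which attenuates the noise between the pitch harmonics at the low-frequency end, and is then low-pass filtered; the other band is filtered directly by a high-pass filter; finally, the two processed bands are summed to obtain the pitch-enhanced synthesized sound signal.

In FIG. 1, two modules, the Pitch enhancer and the Low-pass filter, are used in the low-frequency sub-band for pitch enhancement post-processing, where:

The Pitch enhancer module attenuates the inter-harmonic noise at the low-frequency end of the decoded signal to an appropriate degree, and the Low-pass filter then removes the spectral tilt and other undesired frequency components. The Pitch enhancer module is implemented with a time-varying linear filter.

The Low-pass filter module is a linear-phase FIR (finite impulse response) low-pass filter. In the implementation, the register must be updated in each subframe using the signal state produced by the low-pass filter.

This post-processing method can eliminate the noise components between the harmonics at the low-frequency end of the decoded speech signal, so that the perceived quality of the decoded synthesized sound is improved.

In the course of implementing the present invention, the inventors found that the existing pitch enhancement post-processing has at least the following problem:

The existing pitch enhancement post-processing algorithm must first split the decoded speech signal into frequency bands and apply different filtering to the different sub-bands, which makes the post-processing complex to implement.

Summary of the Invention

Embodiments of the present invention provide a method and apparatus for pitch enhancement post-processing that simplify the post-processing and improve the quality of the audio signal it produces.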
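A method for implementing pitch enhancement post-processing includes a process of post-filtering a decoded signal, and the process includes:

obtaining a gain of the decoded signal, determining whether the gain exceeds a predetermined threshold, and post-filtering the decoded signal after determining that the gain exceeds the predetermined threshold.

An apparatus for implementing pitch enhancement post-processing includes:

a gain evaluation unit, configured to obtain a gain of the decoded signal;

a threshold judging unit, configured to determine whether the gain of the decoded signal determined by the gain evaluation unit exceeds a predetermined threshold;

an adaptive post filter, configured to perform long-term post-filtering, according to the result of the threshold judging unit, only on a decoded signal whose gain exceeds the predetermined threshold.

A computer program product includes computer program code which, when executed by a computer, causes the computer to perform the steps of the method for implementing pitch enhancement post-processing.

As can be seen from the technical solution provided by the embodiments above, the configuration of the filter coefficients and the threshold decision are relatively simple to implement, and a good pitch enhancement effect can be obtained. In addition, the embodiments apply pitch enhancement to the entire decoded speech signal without band splitting and without separate low-pass and high-pass filtering, which further reduces the complexity of the processing.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the pitch enhancement post-processing used in the prior art;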
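FIG. 2 is a schematic diagram of the processing procedure of the method according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of the apparatus according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of the gain evaluation unit according to an embodiment of the present invention.

Mode for Carrying Out the Invention

In the embodiments of the present invention, the energy characteristics of the decoded signal are fully exploited and compared with the pitch gain and pitch period values obtained by decoding, in order to obtain the pitch information that best reflects the characteristics of the sound. This provides a threshold evaluation and decision scheme for selectively applying the pitch enhancement post-processing filter, so that the decoded signal has better perceptual quality.

Specifically, the gain of the decoded signal is obtained first, and it is then determined whether the gain exceeds a predetermined threshold. If it does, long-term post-filtering is applied to the decoded signal and the result is output; otherwise, the decoded signal may be output directly. The post filter used for post-filtering the decoded signal may be, but is not limited to, an all-zero post filter.

In addition, if an all-zero post filter is chosen, specific parameter values that further improve the perceived audio quality are given for the local adjustment factor $\lambda$ and the adaptive global gain $G_l$ in the corresponding filter function. Other types of post filter may of course also be used for post-filtering in the embodiments of the present invention.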
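To aid understanding of the embodiments, the cause of the coding noise between the pitch harmonics is explained first. Taking AMR-WB+ coding as an example, the speech coding part uses CELP (Code-Excited Linear Prediction) coding. At the encoder, the input signal is pre-emphasized, analyzed with 16th-order linear prediction, and then encoded using a pitch synthesis filter. The expression of the pitch synthesis filter is:

$$\frac{1}{1 - g_p z^{-T}} \quad (1)$$

where $T$ is the pitch period, $g_p$ is the pitch gain, and $z$ is the variable of the z-transform.

In speech perception theory, the formant regions of speech matter more to auditory perception than the valley regions; at lower coding rates it is therefore usually necessary to sacrifice performance in the valleys so that the formants are encoded as well as possible. As a result, the valleys may contain more perceptible coding noise than the peaks, including the valleys between the peaks of the pitch harmonics.

Given this cause of the coding noise, in the embodiments of the present invention a corresponding post-processing filter may be set at the decoder to reduce the coding noise and obtain better perceptual quality.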
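The specific implementation of the embodiments is described below with reference to the drawings.

A specific implementation of the method for pitch enhancement post-processing in the audio decoding process is shown in FIG. 2 and includes the following steps.

Step 1: determine the gain of the received decoded signal from the decoded signal.

Specifically, the ratio of signal amplitudes in adjacent periods (i.e., the ratio of the decoded signal to the signal amplitude of the adjacent pitch period) is given by equation (2) (formula image imgf000006_0001, not reproduced in this text).

In equation (2), $i$ and $i+T$ are time points of the decoded signal, and $T$ is the pitch period.

This ratio is compared with the gain decoded from the bitstream, and the smaller of the two is taken as the final gain of the decoded signal.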
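The gain evaluation of step 1 can be sketched as follows. Because equation (2) survives only as an image reference, the amplitude measure used here (sum of absolute sample values over one pitch period, previous period divided by current period) is an assumption chosen for illustration, and the names `amplitude_ratio` and `evaluate_gain` are hypothetical.

```python
import numpy as np


def amplitude_ratio(signal, start, period):
    """Assumed stand-in for equation (2): summed |amplitude| of the previous
    pitch period divided by that of the current pitch period."""
    current = np.abs(signal[start:start + period]).sum()
    previous = np.abs(signal[start - period:start]).sum()
    return previous / current if current > 0 else 0.0


def evaluate_gain(signal, start, period, decoded_gain):
    """Step 1: take the smaller of the amplitude ratio and the decoded gain."""
    return min(amplitude_ratio(signal, start, period), decoded_gain)


# Toy signal: the current period (samples 4..7) is twice as loud as the previous one.
sig = np.array([1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0])
gain = evaluate_gain(sig, start=4, period=4, decoded_gain=0.8)
```

Taking the minimum makes the decision conservative: a large gain decoded from the bitstream is only trusted when the decoded waveform itself shows comparable periodicity.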
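Step 2: determine whether the gain determined in step 1 exceeds a predetermined threshold; if so, go to step 3, otherwise go to step 4.

In the embodiments of the present invention, a decision threshold $E_{thr}$ is set for when to use the post-processing filter, based on the signal energy characteristics of the current pitch period and the neighboring pitch period of the decoded synthesized sound signal: the long-term post-filtering is performed only when the gain $E_{com}$ determined in step 1 is greater than $E_{thr}$; otherwise it is not performed.

The decision based on the threshold $E_{thr}$ mainly reflects the fact that voiced speech frames are strongly periodic, and the gain $g_p'$ decoded from the bitstream sent by the encoder reflects this property of voiced sound. Extensive program debugging and observation of how the parameters vary show that in voiced frames $g_p'$ is large and close to a stable value, while in unvoiced frames $g_p'$ is small and in many cases approaches 0. Overall, the value of $g_p'$ is roughly similar to the ratio of the signal amplitude of the current pitch period to that of the previous pitch period. Taking the AMR-WB+ codec as an example, after a large number of experiments comparing the PESQ (objective speech quality assessment) difference between the decoded signal and the original sound signal, $E_{thr}$ can be chosen as 0.6.

It should be noted that the value of the threshold can be determined case by case for different codec frameworks; for example, in codecs other than AMR-WB+, the threshold may be selected in the range 0 to 1.

Step 3: apply long-term post-filtering to the decoded signal (i.e., the pitch synthesis signal obtained by decoding at the decoder), output the result, and go to step 4.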
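Specifically, an all-zero post filter can be used as the post filter to attenuate the noise between the pitch harmonics. To keep the peaks of the pitch harmonics at the harmonic frequencies, the zeros should be placed at the frequencies corresponding to the valleys between the pitch harmonics, that is, at $\pi/T, \ldots, (2T-1)\pi/T$. The all-zero post filter can therefore take the form:

$$H(z) = G_l \times (1 + \lambda \times z^{-T}) \quad (3)$$

In equation (3), $T$ is the pitch period, $G_l$ is the overall gain control of the filter, $\lambda$ is a local adjustment factor, and $z$ is the variable of the z-transform.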
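The zero placement can be checked from the magnitude response of a filter of the form $H(z) = G_l(1 + \lambda z^{-T})$. The short derivation below is a sketch that assumes a real $\lambda$ with $0 < \lambda < 1$; it is not taken from the patent text.

```latex
\begin{aligned}
H(e^{j\omega}) &= G_l\left(1 + \lambda e^{-j\omega T}\right),\\
\left|H(e^{j\omega})\right|^2 &= G_l^2\left(1 + \lambda^2 + 2\lambda\cos(\omega T)\right),\\
\omega = \tfrac{2k\pi}{T}:\quad \left|H\right| &= G_l\,(1+\lambda) \quad \text{(pitch harmonics, maximum)},\\
\omega = \tfrac{(2k+1)\pi}{T}:\quad \left|H\right| &= G_l\,(1-\lambda) \quad \text{(valleys between harmonics, minimum)}.
\end{aligned}
```

With $\lambda = 0.1$ the valleys sit at $(1-\lambda)/(1+\lambda) = 0.9/1.1 \approx 0.82$ of the harmonic peaks, roughly 1.7 dB lower, so the attenuation of inter-harmonic noise under these assumptions is mild.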
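In this step, taking the AMR-WB+ codec as an example, the pitch period $T$ of the all-zero post filter can be determined in the way used in the AMR-WB+ codec, for example by taking the $T$ output by the pitch tracking module as the pitch period. To avoid pitch doubling, the normalized autocorrelation of the signal at a delay of $T/2$ is also computed; if this normalized autocorrelation is greater than 0.95, $T/2$ is used as the new pitch period in the post-processing, so that the pitch period is obtained more accurately and in real time at the low-frequency end.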
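The pitch-doubling check can be sketched as below. The text only states that a normalized autocorrelation at delay T/2 is compared with 0.95; the exact normalization and analysis window are assumptions here, and `normalized_autocorrelation` and `refine_pitch_period` are hypothetical names.

```python
import numpy as np


def normalized_autocorrelation(x, lag):
    """Correlation between x(n) and x(n - lag), normalized by both energies."""
    a = x[lag:]
    b = x[:-lag]
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def refine_pitch_period(x, period, threshold=0.95):
    """Return T/2 when the signal is strongly correlated at half the reported period."""
    half = period // 2
    if half > 0 and normalized_autocorrelation(x, half) > threshold:
        return half
    return period


n = np.arange(400)
x = np.sin(2 * np.pi * n / 40)            # true pitch period: 40 samples
doubled = refine_pitch_period(x, 80)      # tracker reported twice the true period
correct = refine_pitch_period(x, 40)      # tracker reported the true period
```

For the sine above, the correlation at lag 40 is close to 1, so a reported period of 80 is halved, while at lag 20 the signal is inverted and the reported period of 40 is kept.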
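In this step, $\lambda$ usually lies between 0 and 1, and its value determines the weighting between signal samples one pitch period apart. Again taking the AMR-WB+ codec as an example, experiments show that $\lambda$ can be chosen as 0.1.

In this step, to prevent the signal distortion that accompanies the attenuation of inter-harmonic noise by the post filter, adaptive gain control is used to determine the adaptive global gain $G_l$, as follows.

Suppose that at time $n$ the input of the post-processing filter is $x(n)$ and the output is $y(n)$. From the transfer function (3):

$$y(n) = G_l \times [x(n) + \lambda \times x(n-T)] \quad (4)$$

For voiced frames, the strong periodicity of voiced sound means that the waveforms in adjacent pitch periods can be regarded as differing only slightly in amplitude, so one can set:

$$x(n-T) \approx gain \times x(n) \quad (5)$$

Substituting (5) into (4) gives:

$$y(n) \approx G_l \times (1 + \lambda \times gain) \times x(n) \quad (6)$$

This derivation shows that, without adaptive gain control, the filter would make the output $y(n)$ considerably larger than the input $x(n)$ while performing the pitch enhancement post-processing that attenuates inter-harmonic noise, which would greatly degrade the perceived quality of the final synthesized speech signal. The adaptive global gain is therefore chosen as:

$$G_l = \frac{1}{1 + \lambda \times gain} \quad (7)$$

In this way, all parameters of the all-zero post filter are determined.

Step 4: output the pitch synthesis signal obtained at the decoder.

Specifically, suppose that in steps 2 and 3 the decoded pitch synthesis signal is $synth\_in$ and the output after long-term pitch post-filtering is $synth\_out$; the processing of steps 2 and 3 can then be expressed as: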
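$$synth\_out = \begin{cases} synth\_in, & \text{if } E_{com} < E_{thr} \\ synth\_in \otimes h, & \text{if } E_{com} \ge E_{thr} \end{cases} \quad (8)$$

In equation (8), $h$ is the impulse response of the adaptive post filter $H(z)$. Equation (8) shows that two kinds of pitch synthesis signal can be output in step 4:

(1) one is a pitch synthesis signal that has undergone the long-term post-filtering of step 3 with adaptive gain control, which prevents the signal distortion that accompanies the attenuation of inter-harmonic noise by the post filter;

(2) the other is a pitch synthesis signal that is output directly without the processing of step 3.

The embodiments of the present invention further provide an apparatus for pitch enhancement post-processing in the audio decoding process. Its structure is shown in FIG. 3 and may include the following processing units: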
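(1) Gain evaluation unit 301

This unit obtains the gain of the decoded signal.

As shown in FIG. 4, the unit may include:

a ratio determining unit 3011, configured to determine the ratio of the decoded signal to the signal amplitude of the adjacent pitch period, that is, the ratio of the signal amplitude in the previous pitch period to the signal amplitude in the current pitch period;

a decoded-signal gain determining unit 3012, configured to compare the ratio with the gain obtained by decoding and take the smaller of the two as the gain of the decoded signal.

(2) Threshold judging unit 302

This unit determines whether the gain of the decoded signal determined by the gain evaluation unit exceeds a predetermined threshold.

When the apparatus is used in AMR-WB+ decoding, the predetermined threshold selected by the threshold judging unit may be 0.6.

(3) Adaptive post filter 303

This filter performs long-term post-filtering, according to the result of the threshold judging unit, only on a decoded signal whose gain exceeds the predetermined threshold.

The adaptive post filter may be an all-zero post filter whose function is $H(z) = G_l \times (1 + \lambda \times z^{-T})$, where $G_l$ is the adaptive global gain, $\lambda$ is the local adjustment factor, and $T$ is the pitch period.

Moreover, when the apparatus is used in AMR-WB+ decoding, the all-zero post filter uses $\lambda = 0.1$ and the adaptive global gain $G_l = \frac{1}{1 + \lambda \times gain}$, in order to avoid the signal distortion that accompanies the attenuation of inter-harmonic noise by the post filter.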
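The threshold decision and the all-zero post filter with adaptive global gain can be sketched together as follows. This is a minimal illustration using the AMR-WB+ values quoted in the text (0.1 for the local adjustment factor, 0.6 for the threshold); the function names are hypothetical, and the filter memory before the first sample is assumed to be zero, which a real decoder would carry over between subframes.

```python
import numpy as np

LAMBDA = 0.1      # local adjustment factor quoted in the text for AMR-WB+
THRESHOLD = 0.6   # decision threshold quoted in the text for AMR-WB+


def all_zero_postfilter(x, period, gain, lam=LAMBDA):
    """Apply y(n) = G_l * (x(n) + lam * x(n - T)) with G_l = 1 / (1 + lam * gain).

    Assumes 0 < period < len(x) and zero filter memory before the first sample.
    """
    g_l = 1.0 / (1.0 + lam * gain)
    delayed = np.concatenate([np.zeros(period), x[:-period]])  # x(n - T)
    return g_l * (x + lam * delayed)


def pitch_postprocess(x, period, gain, threshold=THRESHOLD):
    """Filter only when the evaluated gain exceeds the threshold; else pass through."""
    if gain > threshold:
        return all_zero_postfilter(x, period, gain)
    return x.copy()


n = np.arange(200)
x = np.sin(2 * np.pi * n / 40)                  # perfectly periodic input, T = 40
y_voiced = pitch_postprocess(x, 40, gain=1.0)   # gain above threshold: filtered
y_unvoiced = pitch_postprocess(x, 40, gain=0.3) # gain below threshold: unchanged
```

For a perfectly periodic input with gain 1, the adaptive gain cancels the level increase exactly once the delay line is filled, so the periodic component passes at its original level while components that do not repeat every T samples are attenuated.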
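It should be noted that, in the embodiments of the present invention, the post filter for pitch enhancement may also be a comb filter. The comb filter exploits the strong periodicity of voiced sound. In the frequency domain, it retains the fundamental frequency of the sound signal and its integer-multiple harmonic components and suppresses non-harmonic components.

Because the gaps between the harmonics consist mainly of noise, in the ideal case the noise between the harmonics can be filtered out completely if the fundamental frequency (pitch period) is known.

The transfer function of the comb filter used in the embodiments is given by equation (9) (formula image imgf000009_0001, not reproduced in this text).

The corresponding time-domain expression is: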
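$$y(n) = \sum_{k=-L}^{L} a_k\, x(n - kT) \quad (10)$$

where $x(n)$ is the decoded speech signal and $y(n)$ is the output of the comb filter; $a_k$ ($-L \le k \le L$) are the $2L+1$ tap coefficients of the comb filter, and the coefficients $a_k$ can adapt to changes in the spectrum of the speech signal. In each subframe, the values of $a_k$ can be configured with reference to the gain of the decoded signal obtained above; for the pitch period $T$, repeated prediction should be avoided.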
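Equation (10) can be sketched directly as a sum of shifted copies of the input. The coefficient values below are illustrative assumptions (the text does not give numeric taps), samples outside the buffer are treated as zero, and `comb_filter` is a hypothetical name. The filter is non-causal because taps with negative k use future samples.

```python
import numpy as np


def comb_filter(x, period, coeffs):
    """y(n) = sum over k = -L..L of a_k * x(n - k*T).

    coeffs holds the 2L+1 taps ordered a_{-L}, ..., a_{L}.
    """
    half_len = (len(coeffs) - 1) // 2            # L
    n = len(x)
    pad = np.zeros(half_len * period)
    padded = np.concatenate([pad, x, pad])       # padded[L*T + m] == x[m]
    y = np.zeros(n)
    for k, a_k in zip(range(-half_len, half_len + 1), coeffs):
        offset = (half_len - k) * period         # index of x(n - kT) at n = 0
        y += a_k * padded[offset:offset + n]
    return y


n = np.arange(400)
taps = [0.25, 0.5, 0.25]                          # assumed taps, L = 1
periodic = np.sin(2 * np.pi * n / 40)             # period equals T = 40
off_pitch = np.sin(2 * np.pi * n / 80)            # period 80, inverted after 40 samples
y_periodic = comb_filter(periodic, 40, taps)
y_off_pitch = comb_filter(off_pitch, 40, taps)
```

Away from the buffer edges, the component whose period matches T passes unchanged, while the component that is inverted after T samples cancels, which is the averaging behavior the text describes.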
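Equation (10) shows that the output $y(n)$ is a delayed weighted average of the input $x(n)$ that emphasizes the periodic components. When the delay matches the pitch period, this averaging reinforces the periodic components, while non-periodic components, or components whose period differs from that of the signal, are suppressed or eliminated.

In summary, when an FIR filter is used to apply pitch enhancement post-processing to the full-band decoded sound signal, both the threshold decision and the configuration of the filter coefficients can be implemented relatively simply. The embodiments can also adapt, in each subframe, to the energy variation of the sound signal synthesized at the decoder, which gives a better pitch enhancement effect. For example, based on the AMR-WB+ codec framework, the pitch enhancement post-processing can be implemented with relatively simple operations, and the perceived quality of the decoded sound is improved.

In addition, while the scheme enhances the pitch of speech signals to obtain better perceptual quality, subjective and objective tests on a large number of music sequences showed that it also considerably improves the perceived quality of music signals.

A person of ordinary skill in the art will understand that the processes in the foregoing embodiments may be carried out by hardware under the control of program instructions; the program may be stored in a readable storage medium and, when executed, performs the corresponding steps of the method. The storage medium may be, for example, a ROM/RAM, a magnetic disk, or an optical disk.

The foregoing describes only preferred specific embodiments of the present invention, but the scope of protection of the present invention is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.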

Claims

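1. A method for implementing pitch enhancement post-processing, including a process of post-filtering a decoded signal, characterized in that the process includes:
obtaining a gain of the decoded signal, determining whether the gain exceeds a predetermined threshold, and post-filtering the decoded signal after determining that the gain exceeds the predetermined threshold.
2. The method according to claim 1, characterized in that the step of obtaining the gain of the decoded signal specifically includes:
determining the ratio of the decoded signal to the signal amplitude of the adjacent pitch period;
comparing the ratio with the gain obtained by decoding, and taking the smaller of the two as the gain of the decoded signal.
3. The method according to claim 1, characterized in that the step of post-filtering the decoded signal includes:
post-filtering the decoded signal with an all-zero post filter whose function is $H(z) = G_l \times (1 + \lambda \times z^{-T})$, where $G_l$ is the adaptive global gain, $\lambda$ is the local adjustment factor, $T$ is the pitch period, and $z$ is the variable of the z-transform.
4. The method according to claim 3, characterized in that, in the Adaptive Multi-Rate Wideband plus (AMR-WB+) coding and decoding process, the value of $\lambda$ is chosen as 0.1 and the adaptive global gain is $G_l = \frac{1}{1 + \lambda \times gain}$, where $gain$ is the gain of the decoded signal in each subframe.
5. The method according to claim 1, 2, 3 or 4, characterized in that, in the AMR-WB+ coding and decoding process, the predetermined threshold is 0.6.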
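6. An apparatus for implementing pitch enhancement post-processing, characterized by including:
a gain evaluation unit, configured to obtain a gain of the decoded signal;
a threshold judging unit, configured to determine whether the gain of the decoded signal determined by the gain evaluation unit exceeds a predetermined threshold;
an adaptive post filter, configured to post-filter, according to the result of the threshold judging unit, only a decoded signal whose gain exceeds the predetermined threshold.
7. The apparatus according to claim 6, characterized in that the gain evaluation unit specifically includes:
a ratio determining unit, configured to determine the ratio of the decoded signal to the signal amplitude of the adjacent pitch period;
a decoded-signal gain determining unit, configured to compare the ratio with the gain obtained by decoding and take the smaller of the two as the gain of the decoded signal.
8. The apparatus according to claim 6, characterized in that the adaptive post filter is an all-zero post filter whose function is:
$H(z) = G_l \times (1 + \lambda \times z^{-T})$, where $G_l$ is the adaptive global gain, $\lambda$ is the local adjustment factor, and $T$ is the pitch period.
9. The apparatus according to claim 8, characterized in that, when the apparatus is used in the AMR-WB+ decoding process, the all-zero post filter uses a $\lambda$ value of 0.1 and the adaptive global gain $G_l = \frac{1}{1 + \lambda \times gain}$.
10. The apparatus according to claim 6, 7, 8 or 9, characterized in that, when the apparatus is used in the AMR-WB+ decoding process, the predetermined threshold selected by the threshold judging unit is 0.6.
11. A computer program product, characterized in that the computer program product includes computer program code which, when executed by a computer, causes the computer to perform the steps of any one of claims 1 to 5.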
PCT/CN2008/070931 2007-05-11 2008-05-09 Procede de post-traitement et appareil d'amelioration de ton fondamental WO2008138267A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200710104394.5 2007-05-11
CN 200710104394 CN101303858B (zh) 2007-05-11 2007-05-11 Method and apparatus for implementing pitch enhancement post-processing

Publications (1)

Publication Number Publication Date
WO2008138267A1 true WO2008138267A1 (fr) 2008-11-20

Family

ID=40001704

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2008/070931 WO2008138267A1 (fr) 2007-05-11 2008-05-09 Procede de post-traitement et appareil d'amelioration de ton fondamental

Country Status (2)

Country Link
CN (1) CN101303858B (zh)
WO (1) WO2008138267A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
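CN101383151B (zh) * 2007-09-06 2011-07-13 中兴通讯股份有限公司 Digital audio quality enhancement system and method
CN101587711B (zh) * 2008-05-23 2012-07-04 华为技术有限公司 Pitch post-processing method, filter, and pitch post-processing system
CN101604525B (zh) * 2008-12-31 2011-04-06 华为技术有限公司 Pitch gain acquisition method and apparatus, encoder, and decoder
DK2732638T3 (en) * 2011-07-14 2015-12-07 Sonova Ag Speech enhancement system and method
CN104205213B (zh) * 2012-03-23 2018-01-05 西门子公司 Speech signal processing method and apparatus, and hearing aid using the same
CN102930872A (zh) * 2012-11-05 2013-02-13 深圳广晟信源技术有限公司 Method and apparatus for pitch enhancement post-processing in wideband speech decoding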

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0527791A (ja) * 1991-07-22 1993-02-05 Nec Corp Speech synthesizer
US5506934A (en) * 1991-06-28 1996-04-09 Sharp Kabushiki Kaisha Post-filter for speech synthesizing apparatus
US5752222A (en) * 1995-10-26 1998-05-12 Sony Corporation Speech decoding method and apparatus
US5774835A (en) * 1994-08-22 1998-06-30 Nec Corporation Method and apparatus of postfiltering using a first spectrum parameter of an encoded sound signal and a second spectrum parameter of a lesser degree than the first spectrum parameter
EP1308932A2 (en) * 2001-10-03 2003-05-07 Broadcom Corporation Adaptive postfiltering methods and systems for decoding speech
JP2004015537A (ja) * 2002-06-07 2004-01-15 Matsushita Electric Ind Co Ltd Audio signal encoding apparatus
US20040019481A1 (en) * 2002-07-25 2004-01-29 Mutsumi Saito Received voice processing apparatus
US6941263B2 (en) * 2001-06-29 2005-09-06 Microsoft Corporation Frequency domain postfiltering for quality enhancement of coded speech

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW376611B (en) * 1998-05-26 1999-12-11 Koninkl Philips Electronics Nv Transmission system with improved speech encoder
CN1186765C (zh) * 2002-12-19 2005-01-26 北京工业大学 2.3 kb/s harmonic-excitation linear prediction speech coding method

Also Published As

Publication number Publication date
CN101303858A (zh) 2008-11-12
CN101303858B (zh) 2011-06-01

Similar Documents

Publication Publication Date Title
AU2006232358B2 (en) Systems, methods, and apparatus for highband burst suppression
JP5722437B2 (ja) Method, apparatus, and computer-readable storage medium for wideband speech coding
KR101699898B1 (ko) Method and apparatus for processing a decoded audio signal in a spectral domain
AU2003233722B2 (en) Methode and device for pitch enhancement of decoded speech
JP5047268B2 (ja) Speech post-processing using MDCT coefficients
US10730329B2 (en) Frequency band extension in an audio signal decoder
JP2012163981A (ja) Audio codec post-filter
EP3427256B1 (en) Hybrid concealment techniques: combination of frequency and time domain packet loss concealment in audio codecs
WO2008138267A1 (fr) Procede de post-traitement et appareil d'amelioration de ton fondamental
Vaillancourt et al. New post-processing techniques for low bit rate celp codecs
CN115428069A (zh) Low-cost adaptation of a bass post-filter

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08734283

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08734283

Country of ref document: EP

Kind code of ref document: A1