WO2012159370A1 - Voice enhancement method and device - Google Patents

Voice enhancement method and device

Info

Publication number
WO2012159370A1
Authority
WO
WIPO (PCT)
Prior art keywords
linear prediction
coefficient
prediction coefficients
lifting factor
prediction coefficient
Prior art date
Application number
PCT/CN2011/078087
Other languages
French (fr)
Chinese (zh)
Inventor
田薇
李玉龙
邝秀玉
贺知明
Original Assignee
华为技术有限公司
电子科技大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司, 电子科技大学
Priority to CN201180001446.0A priority Critical patent/CN103038825B/en
Priority to PCT/CN2011/078087 priority patent/WO2012159370A1/en
Publication of WO2012159370A1 publication Critical patent/WO2012159370A1/en

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/12Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being prediction coefficients
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/15Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being formant information

Definitions

  • Embodiments of the present invention relate to the field of communications, and in particular, to a voice enhancement method and apparatus.
  • When the Tandem scheme is used for code-stream conversion, the speech quality is impaired because two lossy compressions are involved, and the objective Mean Opinion Score (MOS) decreases, which affects the intelligibility of the speech.
  • MOS Mean Opinion Score
  • Compared with the former scheme, the Transcoding scheme can greatly reduce the amount of computation. However, because of mismatches such as the rate difference between the two code streams, the voice quality is still impaired after the stream conversion and speech intelligibility declines, that is, the level of recognition of speech decreases.
  • One technical problem to be solved by the present invention is to overcome the shortcoming of the prior art that speech quality degrades while speech intelligibility is improved, and to provide a speech enhancement method with high-frequency compensation that exploits the contribution of the formants and the mid-to-high frequency components of speech to speech intelligibility.
  • a speech enhancement method comprising: acquiring M first linear prediction coefficients of a voiced frame signal, where M is an order of a linear prediction filter;
  • obtaining a lifting factor, where the lifting factor is obtained according to the correlation between frequencies in the short-time spectral envelope corresponding to the M first linear prediction coefficients; and modifying the M first linear prediction coefficients according to the lifting factor and the correlation among the M first linear prediction coefficients, so that, compared with the first short-time spectral envelope corresponding to the M first linear prediction coefficients, the formant energy of the second short-time spectral envelope corresponding to the M second linear prediction coefficients obtained after the modification is enhanced and the mid-to-high frequency spectral components are compensated to some extent.
  • a voice enhancement device includes: an acquisition module, configured to acquire M first linear prediction coefficients of a voiced frame signal, where M is the order of the linear prediction filter;
  • a processing module configured to obtain a lifting factor, where the lifting factor is obtained according to a correlation between frequencies in a short-term spectral envelope corresponding to the M first linear prediction coefficients;
  • a synthesis module, configured to modify the M first linear prediction coefficients according to the lifting factor and the correlation among the M first linear prediction coefficients, so that, compared with the first short-time spectral envelope corresponding to the M first linear prediction coefficients, the formant energy of the second short-time spectral envelope corresponding to the M second linear prediction coefficients obtained after the modification is enhanced and the mid-to-high frequency spectral components are compensated to some extent.
  • In the method of the embodiments of the present invention, the lifting factor contains the correlation among the frequencies of the speech, and the modification of the short-time spectral envelope of the speech, obtained by modifying the M first linear prediction coefficients, also incorporates this correlation, so that the formant energy of the modified short-time spectral envelope is enhanced and the mid-to-high frequency spectral components lost from the speech are compensated to some extent.
  • Given the determining effect of the formant energy on speech quality and the contribution of the mid-to-high frequency spectral components of speech to speech intelligibility, both the quality and the intelligibility of the speech are improved after processing by the method of the embodiments of the present invention.
  • the speech enhancement method according to the embodiments of the present invention has a simple calculation process and good robustness, can simultaneously improve the intelligibility and quality of speech, can recover high-frequency components lost due to coding distortion, and is particularly suitable for mitigating the degradation of communication voice quality caused by the convergence and interworking of different gateways.
  • FIG. 3 is a comparison of a voiced frame in the frequency domain after the cascading scheme and after the voice enhancement method of the embodiment of the present invention, where FIG. 3(a) is the original speech, FIG. 3(b) is the frequency distribution of the original speech after processing by the cascading scheme, and FIG. 3(c) is the frequency distribution of the cascaded speech after processing by the speech enhancement method of the embodiment of the present invention;
  • FIG. 5 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
  • Figure 6 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
  • FIG. 7 is a schematic hardware structural diagram of a device for implementing an embodiment of the present invention.
  • the technical solution of the present invention can be applied to various communication systems, such as: GSM, Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), General Packet Radio Service (GPRS), Long Term Evolution (LTE), and so on.
  • GSM Global System for Mobile Communications
  • CDMA Code Division Multiple Access
  • WCDMA Wideband Code Division Multiple Access
  • GPRS General Packet Radio Service
  • LTE Long Term Evolution
  • FIG. 1 is a flow chart of a method 100 for enhancing voice transmission in accordance with an embodiment of the present invention. As shown in FIG. 1, the method 100 includes:
  • In 110, let the acquired voiced frame be s(n); the transfer function of the speech transmission can then be expressed in terms of the M first linear prediction coefficients a_i, where M is the order of the linear prediction filter.
  • the lifting factor is obtained based on the correlation between the frequencies in the short-time spectral envelope corresponding to the M first linear prediction coefficients.
  • the first linear prediction coefficient is calculated according to the following formula:
  • the short-term spectral envelope of the speech frame can be defined as:
  • Step 130 is described in detail below: the M first linear prediction coefficients are modified according to the lifting factor and the correlation among the M first linear prediction coefficients, so that, compared with the first short-time spectral envelope corresponding to the M first linear prediction coefficients, the formant energy of the second short-time spectral envelope corresponding to the M second linear prediction coefficients obtained after the modification is enhanced and the mid-to-high frequency spectral components are compensated to some extent.
  • the first linear prediction coefficient of the input speech frame signal is normalized as follows:
  • the voiced frame signal can be linearly filtered using equation (15), y(n) = Σ_i β_i·y(n−i) + s(n), thereby obtaining a speech frame signal with improved intelligibility.
  • the method of the embodiment of the present invention may include a step of determining whether the speech frame is a voiced frame; only when the speech frame is a voiced frame is it processed according to the method of the embodiment of the present invention, and when the speech frame is an unvoiced frame it is output directly, thereby saving processing resources and improving processing efficiency.
  • the speech frame signal may be pre-emphasized, for example, pre-emphasized according to equation (16):
  • FIG. 2 is an LPC spectrum of a voiced frame processed using the prior art cascade scheme and the voice enhancement method of the embodiment of the present invention.
  • As can be seen from FIG. 2, the LPC spectrum of the voiced frame processed by the speech enhancement method of the present invention is generally enhanced, and the enhancement is not limited to the formant energy.
  • FIG. 3 is a comparison of a voiced frame in the frequency domain after the cascading scheme and after the voice enhancement method of the embodiment of the present invention, where FIG. 3(a) is the original speech, FIG. 3(b) is the frequency distribution of the original speech after processing by the cascading scheme, and FIG. 3(c) is the frequency distribution of the cascaded speech after processing by the speech enhancement method of the embodiment of the present invention. The comparison of FIGS. 3(b) and 3(c) shows that, after the speech enhancement method of the embodiment of the present invention, the mid-to-high frequency components of the original speech are significantly compensated.
  • FIG. 4 shows the DRT scores of the original speech, the cascade-processed speech, and the speech processed according to the method of the embodiment of the present invention. In FIG. 4, 0 denotes the original speech; I denotes speech after one cascade; II denotes speech after two cascades; III denotes speech after three cascades; eII denotes the twice-cascaded speech processed by the method of the embodiment of the present invention; and eIII denotes the three-times-cascaded speech processed by the method of the embodiment of the present invention. Comparing III and eIII shows that the DRT score can be increased by up to 6.26% after processing by the method of the embodiment of the present invention.
  • In the method of the embodiments of the present invention, the lifting factor contains the correlation among the frequencies of the speech, and the modification of the short-time spectral envelope of the speech, obtained by modifying the M first linear prediction coefficients, also incorporates this correlation, so that the formant energy of the modified short-time spectral envelope is enhanced and the mid-to-high frequency spectral components lost from the speech are compensated to some extent.
  • Given the determining effect of the formant energy on speech quality and the contribution of the mid-to-high frequency spectral components of speech to speech intelligibility, both the quality and the intelligibility of the speech are improved after processing by the method of the embodiments of the present invention.
  • In addition, the calculation process of the method is simple and robust. Because the correlation among the frequencies of the speech is used, the method overcomes the shortcomings of the prior art in handling distorted formant enhancement or formant information loss, and can well recover the high-frequency components lost due to the convergence of different networks.
  • FIG. 5 is a schematic structural diagram of a voice enhancement device 200 according to an embodiment of the present invention.
  • the speech enhancement device can be used to implement the methods of embodiments of the present invention.
  • the voice enhancement device 200 includes: an acquisition module 210, configured to acquire M first linear prediction coefficients of a voiced frame signal, where M is the order of the linear prediction filter;
  • the processing module 220 is configured to obtain a lifting factor, where the lifting factor is obtained according to a correlation between frequencies in a short-term spectral envelope corresponding to the M first linear prediction coefficients;
  • the synthesizing module 230 is configured to modify the M first linear prediction coefficients according to the correlation between the lifting factor and the M first linear prediction coefficients, so that the M second linear prediction coefficients obtained after the modification correspond to The second short-term spectral envelope is enhanced compared to the first short-time spectral envelope corresponding to the M first linear prediction coefficients, and the mid-high frequency spectral components are compensated to some extent.
  • the acquisition module 210 is configured to calculate the first linear prediction coefficients from the autocorrelation function of the voiced frame using the Levinson-Durbin recursive algorithm.
  • the processing module is configured to calculate the lifting factor according to the above formulas (10) - (12).
  • the synthesizing module is configured to modify the first linear prediction coefficient by using the above formula (13) to obtain the second linear prediction coefficient.
  • the speech enhancement apparatus 200 further includes a filtering module 240 for linearly filtering the voiced frame signal according to the second linear prediction coefficient, according to an embodiment of the present invention.
  • according to an embodiment of the present invention, the voice enhancement device 200 further includes a pre-emphasis module 250, configured to pre-emphasize the voiced frame signal using the foregoing formula (16) before the acquisition module acquires the M first linear prediction coefficients of the voiced frame signal.
  • the acquisition module may be configured to determine whether a speech frame is a voiced frame; only when the speech frame is a voiced frame is it processed according to the method of the embodiment of the present invention, and when the speech frame is an unvoiced frame it is output directly, to save processing resources and improve processing efficiency.
  • the speech enhancement device 200 can be implemented using various hardware devices, for example a digital signal processing (DSP) chip, where the acquisition module 210, the processing module 220, the synthesis module 230, and the filtering module 240 may each be implemented on separate hardware devices or may be integrated into one hardware device.
  • DSP digital signal processing
  • FIG. 7 is a schematic hardware architecture 700 of a speech enhancement device 200 for implementing an embodiment of the present invention.
  • the hardware structure 700 includes a DSP chip 710, a memory 720, and an interface unit 730.
  • the DSP chip 710 can be used to implement the processing functions of the voice enhancement device 200 of the embodiment of the present invention, including the processing functions of the acquisition module 210, the processing module 220, the synthesis module 230, and the filtering module 240.
  • the memory 720 can be used to store the voiced frame signals to be processed and intermediate variables of the processing and processed voiced frame signals and the like.
  • the interface unit 730 can be used for data transmission with a subordinate device.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of units is only a logical function division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be electrical, mechanical or otherwise.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium.
  • the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium and including a number of instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Abstract

The embodiments of the present invention relate to a voice enhancement method and device. The voice enhancement method includes: acquiring M first linear prediction coefficients of a voiced sound frame signal, wherein M is the order of a linear prediction filter; acquiring a raising factor, wherein the raising factor is obtained according to the relevance among the frequencies in the short-time spectrum envelope corresponding to the M first linear prediction coefficients; modifying the M first linear prediction coefficients according to the relevance between the raising factor and the M first linear prediction coefficients so that the formant energy of a second short-time spectrum envelope corresponding to M second linear prediction coefficients obtained after modification is enhanced and the medium-high frequency spectrum components thereof are compensated to a certain extent as compared to the first short-time spectrum envelope corresponding to the M first linear prediction coefficients. Given the determining effect of the formant energy on the tone quality of the voice and the contribution to the sentence intelligibility of the voice by the medium-high frequency spectrum components of the voice, after the processing of the method in the embodiments of the present invention, the quality of and intelligibility of the voice are improved together.

Description

Voice enhancement method and device

Technical field

Embodiments of the present invention relate to the field of communications, and in particular to a voice enhancement method and device.

Background
With the development of wireless technology, convergence between networks is becoming increasingly common, and to achieve interworking between networks, conversion between different code streams is required. For example, to converge an IP telephony network and a mobile telephony network, take a mobile phone calling an IP phone as an example: IP telephony mostly uses the G.723 and G.729 speech coding protocols, while the mobile communication field mostly uses the Adaptive Multi-Rate (AMR) speech coding standard, so conversion between the two different code streams G.729 and AMR is required. At present there are two main schemes for conversion between code streams: the Tandem (cascading) scheme and the Transcoding scheme. When the Tandem scheme is used for code-stream conversion, the speech quality is impaired because two lossy compressions are involved, and the objective Mean Opinion Score (MOS) decreases, which affects the intelligibility of the speech. Compared with the former scheme, the Transcoding scheme can greatly reduce the amount of computation, but because of mismatches such as the rate difference between the two code streams, the speech quality is still impaired after the conversion and speech intelligibility declines, that is, the level of recognition of speech decreases.
In the prior art, the improvement of speech intelligibility may simultaneously amplify or introduce harsh noise and bring distortion or even severe distortion, and cannot recover the lost high-frequency components. In other words, the improvement of speech intelligibility in the prior art comes at the expense of speech quality; that is, it is difficult for current techniques to improve speech intelligibility and speech quality together.

Summary of the invention
One technical problem to be solved by the present invention is to overcome the shortcoming of the prior art that speech quality degrades while speech intelligibility is improved, and to provide a speech enhancement method with high-frequency compensation that exploits the contribution of the formants and the mid-to-high frequency components of speech to speech intelligibility.
According to an embodiment of the present invention, a speech enhancement method is provided, the method including:

acquiring M first linear prediction coefficients of a voiced frame signal, where M is the order of a linear prediction filter;

obtaining a lifting factor, where the lifting factor is obtained according to the correlation between frequencies in the short-time spectral envelope corresponding to the M first linear prediction coefficients; and

modifying the M first linear prediction coefficients according to the lifting factor and the correlation among the M first linear prediction coefficients, so that, compared with the first short-time spectral envelope corresponding to the M first linear prediction coefficients, the formant energy of the second short-time spectral envelope corresponding to the M second linear prediction coefficients obtained after the modification is enhanced and the mid-to-high frequency spectral components are compensated to some extent.
According to an embodiment of the present invention, a speech enhancement device is provided, the device including:

an acquisition module, configured to acquire M first linear prediction coefficients of a voiced frame signal, where M is the order of a linear prediction filter;

a processing module, configured to obtain a lifting factor, where the lifting factor is obtained according to the correlation between frequencies in the short-time spectral envelope corresponding to the M first linear prediction coefficients; and

a synthesis module, configured to modify the M first linear prediction coefficients according to the lifting factor and the correlation among the M first linear prediction coefficients, so that, compared with the first short-time spectral envelope corresponding to the M first linear prediction coefficients, the formant energy of the second short-time spectral envelope corresponding to the M second linear prediction coefficients obtained after the modification is enhanced and the mid-to-high frequency spectral components are compensated to some extent.
In the method of the embodiments of the present invention, the lifting factor contains the correlation among the frequencies of the speech, and the modification of the short-time spectral envelope of the speech, obtained by modifying the M first linear prediction coefficients, also incorporates this correlation, so that the formant energy of the modified short-time spectral envelope is enhanced and the mid-to-high frequency spectral components lost from the speech are compensated to some extent. Given the determining effect of the formant energy on speech quality and the contribution of the mid-to-high frequency spectral components of speech to speech intelligibility, both the quality and the intelligibility of the speech are improved after processing by the method of the embodiments of the present invention.

The speech enhancement method according to the embodiments of the present invention has a simple calculation process and good robustness, can simultaneously improve the intelligibility and quality of speech, and can recover high-frequency components lost due to coding distortion; it is particularly suitable for mitigating the degradation of communication voice quality caused by the convergence and interworking of different gateways.

Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings required for the embodiments or the prior art are briefly introduced below. Apparently, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art may obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of a method according to an embodiment of the present invention;

FIG. 2 shows the LPC spectra of a voiced frame processed by the prior-art cascading scheme and by the voice enhancement method of an embodiment of the present invention;

FIG. 3 is a comparison of a voiced frame in the frequency domain after the cascading scheme and after the voice enhancement method of an embodiment of the present invention, where FIG. 3(a) is the original speech, FIG. 3(b) is the frequency distribution of the original speech after processing by the cascading scheme, and FIG. 3(c) is the frequency distribution of the cascaded speech after processing by the voice enhancement method of an embodiment of the present invention;

FIG. 4 shows the DRT scores of the original speech, the cascade-processed speech, and the speech processed according to the method of an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a device according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a device according to an embodiment of the present invention; and

FIG. 7 is a schematic hardware structural diagram of a device for implementing an embodiment of the present invention.

Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The technical solution of the present invention can be applied to various communication systems, for example: GSM, Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), General Packet Radio Service (GPRS), Long Term Evolution (LTE), and so on.
FIG. 1 is a flowchart of a voice enhancement method 100 according to an embodiment of the present invention. As shown in FIG. 1, the method 100 includes:

110: acquiring M first linear prediction coefficients of a voiced frame signal, where M is the order of a linear prediction filter;

120: obtaining a lifting factor, where the lifting factor is obtained according to the correlation between frequencies in the short-time spectral envelope corresponding to the M first linear prediction coefficients;

130: modifying the M first linear prediction coefficients according to the lifting factor and the correlation among the M first linear prediction coefficients, so that, compared with the first short-time spectral envelope corresponding to the M first linear prediction coefficients, the formant energy of the second short-time spectral envelope corresponding to the M second linear prediction coefficients obtained after the modification is enhanced and the mid-to-high frequency spectral components are compensated to some extent.

In 110, let the acquired voiced frame be s(n). The transfer function of the speech transmission can then be expressed as

H(z) = \frac{G}{A(z)} = \frac{G}{1 - \sum_{i=1}^{M} a_i z^{-i}}    (1)

where M is the order of the linear prediction filter and the a_i are the first linear prediction coefficients.
The following describes in detail how, in 120, the lifting factor is obtained according to the correlation between frequencies in the short-time spectral envelope corresponding to the M first linear prediction coefficients a_i.

The first linear prediction coefficients a_i are calculated from

R_n(j) - \sum_{i=1}^{M} a_i R_n(j-i) = 0, \qquad 1 \le j \le M    (2)

where R_n(j) is the autocorrelation function of the voiced frame s(n) at lag j, that is,

R_n(j) = \sum_{n} s(n)\, s(n-j)    (3)

According to an embodiment of the present invention, the Levinson-Durbin recursive algorithm can be used to solve equation (2); the recursion proceeds as follows:

a. compute the autocorrelation function R_n(j) of s(n), j = 0, 1, ..., M;

b. let E^{(0)} = R_n(0);

c. the recursion starts from i = 1;

d. perform the recursive computation according to the following equations (4)-(6):

k_i = \frac{R_n(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R_n(i-j)}{E^{(i-1)}}    (4)

a_i^{(i)} = k_i, \qquad a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \quad j = 1, ..., i-1    (5)

E^{(i)} = \left(1 - k_i^2\right) E^{(i-1)}    (6)

e. let i = i + 1; if i > M the algorithm ends, otherwise return to step d and continue the recursion.

In equations (4)-(6), a_j^{(i)} denotes the j-th prediction coefficient of the i-th order linear prediction filter, and E^{(i)} is the prediction residual energy of the i-th order linear prediction filter. After the recursion, the solutions of the predictors of all orders i = 1, 2, ..., M are obtained, and the final solution is

a_j = a_j^{(M)}, \qquad j = 1, 2, ..., M    (7)

Setting z = e^{j\omega}, the frequency characteristic of the voiced-frame signal generation model is obtained; that is, the frequency response of the linear system of the speech production model can be described as

H(e^{j\omega}) = \frac{G}{A(e^{j\omega})} = \frac{G}{1 - \sum_{i=1}^{M} a_i e^{-j\omega i}}    (8)
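For illustration only, the autocorrelation of equation (3) and the Levinson-Durbin recursion of equations (4)-(7) can be sketched numerically as follows (a minimal NumPy sketch; the function names are illustrative and not part of the patent):

```python
import numpy as np

def autocorrelation(s, M):
    """R_n(j) = sum_n s(n) * s(n - j) for j = 0, ..., M, as in equation (3)."""
    s = np.asarray(s, dtype=float)
    return np.array([np.dot(s[j:], s[:len(s) - j]) for j in range(M + 1)])

def levinson_durbin(R, M):
    """Solve the normal equations (2) for the M first linear prediction
    coefficients a_1, ..., a_M by the Levinson-Durbin recursion, eqs. (4)-(7)."""
    a = np.zeros(M + 1)          # a[j] holds the current-order coefficient a_j^(i)
    E = R[0]                     # E^(0) = R_n(0)
    for i in range(1, M + 1):
        # equation (4): reflection coefficient k_i
        k = (R[i] - np.dot(a[1:i], R[i - 1:0:-1])) / E
        a_prev = a.copy()
        a[i] = k                 # equation (5): a_i^(i) = k_i
        for j in range(1, i):    # equation (5): a_j^(i) = a_j^(i-1) - k_i * a_{i-j}^(i-1)
            a[j] = a_prev[j] - k * a_prev[i - j]
        E *= (1.0 - k * k)       # equation (6): E^(i) = (1 - k_i^2) * E^(i-1)
    return a[1:], E              # equation (7): a_j = a_j^(M), plus the final residual energy
```

Calling levinson_durbin(autocorrelation(s, M), M) on a voiced frame s returns the M first linear prediction coefficients a_1, ..., a_M used in the remainder of the method.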
According to the definition of the power spectrum, the short-time spectral envelope of the speech frame can be defined as

\left| H(e^{j\omega}) \right| = \left| \frac{G}{A(e^{j\omega})} \right|    (9)
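As a quick numerical check of equations (8)-(9), the short-time spectral envelope can be evaluated directly from the coefficients (again only a sketch; the function name, the number of frequency points, and taking G = 1 are illustrative assumptions):

```python
import numpy as np

def lpc_spectral_envelope(a, n_freq=256, G=1.0):
    """|H(e^{jw})| = |G / A(e^{jw})| with A(e^{jw}) = 1 - sum_i a_i e^{-jwi},
    evaluated at n_freq frequencies in [0, pi), cf. equations (8) and (9)."""
    a = np.asarray(a, dtype=float)
    w = np.linspace(0.0, np.pi, n_freq, endpoint=False)
    orders = np.arange(1, len(a) + 1)
    A = 1.0 - np.exp(-1j * np.outer(w, orders)) @ a   # A(e^{jw}) at every frequency
    return np.abs(G / A)
```

Plotting this envelope before and after the modification of the coefficients gives curves of the kind compared in FIG. 2.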
Step 130 is described in detail below, that is, modifying the M first linear prediction coefficients according to the lifting factor and the correlation among the M first linear prediction coefficients, so that, compared with the first short-time spectral envelope corresponding to the M first linear prediction coefficients, the formant energy of the second short-time spectral envelope corresponding to the M second linear prediction coefficients obtained after the modification is enhanced and the mid-to-high frequency spectral components are compensated to some extent.

First, the first linear prediction coefficients a_i of the input speech frame signal are normalized as follows:

x_i = |a_i| - \operatorname{int}\!\left( |a_i| / 2\pi \right) \times 2\pi, \qquad i = 1, 2, ..., M    (10)

The normalized coefficients are then processed with a sinusoidal model:

for a_i \ge 0:

flag_i = \begin{cases} 1, & x_i < \pi \\ -1, & x_i > \pi \\ 0, & x_i = \pi \end{cases}    (11-1)

for a_i < 0:

flag_i = \begin{cases} -1, & x_i < \pi \\ 1, & x_i > \pi \\ 0, & x_i = \pi \end{cases}    (11-2)

The lifting factor f is then given by

f = \frac{\sum_{i=1}^{M} \left( flag_i\, a_i - \mu \right)}{M}    (12)

where \mu is the mean of the first linear prediction coefficients a_i and M is the order of the linear prediction filter.

It should be noted that obtaining the lifting factor from the normalized first linear prediction coefficients and the sinusoidal model of the voiced frame is merely an example; a person skilled in the art may choose other methods to obtain the lifting factor according to the specific situation.
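A compact sketch of the computation in equations (10)-(12) could look as follows (NumPy; the function name is illustrative, and the exact combination of flag_i, a_i and μ in equation (12) follows the reconstruction given above, so it should be treated as an assumption):

```python
import numpy as np

def lifting_factor(a):
    """Lifting factor f from the M first linear prediction coefficients,
    following equations (10)-(12) as given above."""
    a = np.asarray(a, dtype=float)
    M = len(a)
    # equation (10): fold |a_i| into [0, 2*pi)
    x = np.abs(a) - np.floor(np.abs(a) / (2 * np.pi)) * 2 * np.pi
    # equation (11-1): sinusoidal-model value for a_i >= 0
    flag = np.where(x < np.pi, 1.0, np.where(x > np.pi, -1.0, 0.0))
    # equation (11-2): the sign of the value flips for a_i < 0
    flag = np.where(a < 0.0, -flag, flag)
    mu = a.mean()                             # mean of the first linear prediction coefficients
    return float(np.sum(flag * a - mu) / M)   # equation (12)
```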
The above linear prediction coefficients a_i are then modified using equation (13), in which the lifting factor f is applied to the coefficients a_j^{(i)} of each order i of the linear prediction filter for j = 1, ..., i-1, to obtain the second linear prediction coefficients \beta_i.

Replacing the first linear prediction coefficients a_i in equation (9) with the second linear prediction coefficients \beta_i obtained after the modification, the transfer function can be written as

H'(e^{j\omega}) = \frac{G}{1 - \sum_{i=1}^{M} \beta_i e^{-j\omega i}}    (14)

Let y(n) denote the speech frame output after enhancement by the speech enhancement method of the embodiment of the present invention; then

y(n) = \left( \sum_{i=1}^{M} \beta_i\, y(n-i) \right) + s(n)    (15)

According to an embodiment of the present invention, the voiced frame signal s(n) can be linearly filtered using equation (15), thereby obtaining a speech frame signal with improved intelligibility.

It should be noted that modifying the first linear prediction coefficients according to equation (13), on the basis of the lifting factor and the correlation of the first linear prediction coefficients, is merely an example; a person skilled in the art may choose an appropriate method to modify the first linear prediction coefficients as needed, as long as the technical effect that the formant energy is enhanced and the mid-to-high frequency spectral components are compensated to some extent is achieved.
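The linear filtering of equation (15) itself is a plain all-pole recursion and can be sketched as follows (illustrative names; the modified coefficients beta are assumed to have been obtained from equation (13) beforehand):

```python
import numpy as np

def enhance_filter(s, beta):
    """Equation (15): y(n) = sum_{i=1..M} beta_i * y(n - i) + s(n), i.e. the
    voiced frame is passed through the modified all-pole synthesis filter."""
    s = np.asarray(s, dtype=float)
    beta = np.asarray(beta, dtype=float)
    M = len(beta)
    y = np.zeros(len(s))
    for n in range(len(s)):
        acc = s[n]
        for i in range(1, min(M, n) + 1):
            acc += beta[i - 1] * y[n - i]
        y[n] = acc
    return y
```

Equivalently, the same filtering can be performed with scipy.signal.lfilter([1.0], np.concatenate(([1.0], -beta)), s).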
According to an embodiment of the present invention, considering that the formants of a speech frame appear only in voiced frames, before step 110 the method of the embodiment of the present invention may include a step of determining whether the speech frame is a voiced frame; only when the speech frame is a voiced frame is it processed according to the method of the embodiment of the present invention, and when the speech frame is an unvoiced frame it is output directly, thereby saving processing resources and improving processing efficiency.

According to an embodiment of the present invention, before step 110 the speech frame signal may be pre-emphasized, for example according to equation (16):

H(z) = 1 - 0.95 z^{-1}    (16)

In this case, after the intelligibility of the input speech frame has been improved, the inverse processing is also performed to remove the effect of the pre-emphasis.
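A sketch of the pre-emphasis of equation (16) and of the inverse processing mentioned above (illustrative names; the inverse filter simply undoes 1 - 0.95 z^{-1}) is:

```python
import numpy as np

def pre_emphasis(s, alpha=0.95):
    """Equation (16): H(z) = 1 - 0.95 z^{-1}, i.e. x(n) = s(n) - 0.95 * s(n - 1)."""
    s = np.asarray(s, dtype=float)
    x = s.copy()
    x[1:] -= alpha * s[:-1]
    return x

def de_emphasis(x, alpha=0.95):
    """Inverse of equation (16): y(n) = x(n) + 0.95 * y(n - 1), applied after
    the enhancement to remove the effect of the pre-emphasis."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n] + (alpha * y[n - 1] if n > 0 else 0.0)
    return y
```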
In a specific application of the method according to the embodiments of the present invention, the effect of the speech enhancement method of the embodiments of the present invention can be seen from FIG. 2 to FIG. 4.

FIG. 2 shows the LPC spectra of a voiced frame processed by the prior-art cascading scheme and by the voice enhancement method of an embodiment of the present invention. As can be seen from FIG. 2, the LPC spectrum of the voiced frame processed by the speech enhancement method of the present invention is generally enhanced, and the enhancement is not limited to the formant energy.

FIG. 3 is a comparison of a voiced frame in the frequency domain after the cascading scheme and after the voice enhancement method of an embodiment of the present invention, where FIG. 3(a) is the original speech, FIG. 3(b) is the frequency distribution of the original speech after processing by the cascading scheme, and FIG. 3(c) is the frequency distribution of the cascaded speech after processing by the voice enhancement method of an embodiment of the present invention. The comparison of FIG. 3(b) and FIG. 3(c) shows that, after the speech enhancement method of the embodiment of the present invention, the mid-to-high frequency components of the original speech are clearly compensated.

FIG. 4 shows the DRT scores of the original speech, the cascade-processed speech, and the speech processed according to the method of an embodiment of the present invention. In FIG. 4, 0 denotes the original speech; I denotes speech after one cascade; II denotes speech after two cascades; III denotes speech after three cascades; eII denotes the twice-cascaded speech processed by the method of the embodiment of the present invention; and eIII denotes the three-times-cascaded speech processed by the method of the embodiment of the present invention. Comparing III and eIII shows that the DRT score can be increased by up to 6.26% after processing by the method of the embodiment of the present invention.

In the method of the embodiments of the present invention, the lifting factor contains the correlation among the frequencies of the speech, and the modification of the short-time spectral envelope of the speech, obtained by modifying the M first linear prediction coefficients, also incorporates this correlation, so that the formant energy of the modified short-time spectral envelope is enhanced and the mid-to-high frequency spectral components lost from the speech are compensated to some extent. Given the determining effect of the formant energy on speech quality and the contribution of the mid-to-high frequency spectral components of speech to speech intelligibility, both the quality and the intelligibility of the speech are improved after processing by the method of the embodiments of the present invention.

In addition, the calculation process of the method according to the embodiments of the present invention is simple and robust. Because the correlation among the frequencies of the speech is used, the method overcomes the shortcomings of the prior art in handling distorted formant enhancement or formant information loss, and can well recover the high-frequency components lost due to the convergence of different networks.
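Putting the sketches above together, one possible per-frame flow (reusing the helper functions defined in the earlier sketches; modify_coeffs is a caller-supplied stand-in for equation (13), and the order M = 10 is only an illustrative default) is:

```python
import numpy as np

def enhance_frame(s, M=10, is_voiced=True, modify_coeffs=None):
    """End-to-end sketch of the enhancement flow: unvoiced frames are passed
    through unchanged, voiced frames are pre-emphasized, analyzed, modified
    with the lifting factor, filtered, and de-emphasized."""
    if not is_voiced or modify_coeffs is None:
        return np.asarray(s, dtype=float)     # unvoiced frames are output directly
    x = pre_emphasis(s)                       # equation (16)
    R = autocorrelation(x, M)                 # equation (3)
    a, _ = levinson_durbin(R, M)              # equations (2), (4)-(7)
    f = lifting_factor(a)                     # equations (10)-(12)
    beta = modify_coeffs(a, f)                # equation (13): second linear prediction coefficients
    y = enhance_filter(x, beta)               # equation (15)
    return de_emphasis(y)                     # remove the effect of the pre-emphasis
```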
FIG. 5 is a schematic structural diagram of a voice enhancement device 200 according to an embodiment of the present invention. The voice enhancement device can be used to implement the methods of the embodiments of the present invention. As shown in FIG. 5, the voice enhancement device 200 includes:

an acquisition module 210, configured to acquire M first linear prediction coefficients of a voiced frame signal, where M is the order of the linear prediction filter;

a processing module 220, configured to obtain a lifting factor, where the lifting factor is obtained according to the correlation between frequencies in the short-time spectral envelope corresponding to the M first linear prediction coefficients; and

a synthesis module 230, configured to modify the M first linear prediction coefficients according to the lifting factor and the correlation among the M first linear prediction coefficients, so that, compared with the first short-time spectral envelope corresponding to the M first linear prediction coefficients, the formant energy of the second short-time spectral envelope corresponding to the M second linear prediction coefficients obtained after the modification is enhanced and the mid-to-high frequency spectral components are compensated to some extent.

According to an embodiment of the present invention, the acquisition module 210 is configured to calculate the first linear prediction coefficients from the autocorrelation function of the voiced frame using the Levinson-Durbin recursive algorithm.

According to an embodiment of the present invention, the processing module is configured to calculate the lifting factor according to equations (10)-(12) above.

According to an embodiment of the present invention, the synthesis module is configured to modify the first linear prediction coefficients using equation (13) above to obtain the second linear prediction coefficients.

As shown in FIG. 6, according to an embodiment of the present invention, the voice enhancement device 200 further includes a filtering module 240, configured to linearly filter the voiced frame signal according to the second linear prediction coefficients.

As shown in FIG. 6, according to an embodiment of the present invention, the voice enhancement device 200 further includes a pre-emphasis module 250, configured to pre-emphasize the voiced frame signal using equation (16) above before the acquisition module acquires the M first linear prediction coefficients of the voiced frame signal.

According to an embodiment of the present invention, the acquisition module may be configured to determine whether a speech frame is a voiced frame; only when the speech frame is a voiced frame is it processed according to the method of the embodiment of the present invention, and when the speech frame is an unvoiced frame it is output directly, to save processing resources and improve processing efficiency.

Those skilled in the art should understand that the voice enhancement device 200 according to the embodiments of the present invention can be implemented using various hardware devices, for example a digital signal processing (DSP) chip, where the acquisition module 210, the processing module 220, the synthesis module 230, and the filtering module 240 may each be implemented on separate hardware devices or may be integrated into one hardware device.

FIG. 7 shows a schematic hardware structure 700 for implementing the voice enhancement device 200 of an embodiment of the present invention. As shown in FIG. 7, the hardware structure 700 includes a DSP chip 710, a memory 720, and an interface unit 730. The DSP chip 710 can be used to implement the processing functions of the voice enhancement device 200 of the embodiment of the present invention, including all the processing functions of the acquisition module 210, the processing module 220, the synthesis module 230, and the filtering module 240. The memory 720 can be used to store the voiced frame signals to be processed, intermediate variables of the processing, the processed voiced frame signals, and the like. The interface unit 730 can be used for data transmission with subordinate devices.
A person of ordinary skill in the art may appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present invention.

It will be clear to a person skilled in the art that, for convenience and brevity of description, for the specific working processes of the systems, devices, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division of units is only a logical function division, and in actual implementation there may be other division manners: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of the embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically separately, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium and including a number of instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The foregoing is only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily think of variations or replacements within the technical scope disclosed by the present invention, and these shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A speech enhancement method, comprising:

acquiring M first linear prediction coefficients of a voiced frame signal, where M is the order of a linear prediction filter;

obtaining a lifting factor, where the lifting factor is obtained according to the correlation between frequencies in the short-time spectral envelope corresponding to the M first linear prediction coefficients; and

modifying the M first linear prediction coefficients according to the lifting factor and the correlation among the M first linear prediction coefficients, so that, compared with the first short-time spectral envelope corresponding to the M first linear prediction coefficients, the formants of the second short-time spectral envelope corresponding to the M second linear prediction coefficients obtained after the modification are enhanced and the mid-to-high frequency spectral components are compensated to some extent.

2. The method according to claim 1, wherein the acquiring M first linear prediction coefficients of a voiced frame signal comprises:

calculating the first linear prediction coefficients from the autocorrelation function of the voiced frame using the Levinson-Durbin recursive algorithm.
3. The method according to claim 1, wherein the obtaining a lifting factor comprises calculating the lifting factor according to the following formulas:

x_i = |a_i| - \operatorname{int}\!\left( |a_i| / 2\pi \right) \times 2\pi, \qquad i = 1, 2, ..., M

for a_i \ge 0:

flag_i = \begin{cases} 1, & x_i < \pi \\ -1, & x_i > \pi \\ 0, & x_i = \pi \end{cases}

for a_i < 0:

flag_i = \begin{cases} -1, & x_i < \pi \\ 1, & x_i > \pi \\ 0, & x_i = \pi \end{cases}

f = \frac{\sum_{i=1}^{M} \left( flag_i\, a_i - \mu \right)}{M}

where a_i are the first linear prediction coefficients, x_i are the normalized first linear prediction coefficients, flag_i is the value of the sinusoidal model, \mu is the mean of the a_i, M is the order of the linear prediction, and f is the lifting factor.
4. 如权利要求 1至 3任一项所述的方法, 其特征在于, 所述根据所述提升因子以及所述 M个第一线性预测系数之间的相关性 修改所述 M个第一线性预测系数, 包括: 4. A method according to any one of claims 1 to 3, characterized in that Modifying the M first linear prediction coefficients according to the correlation between the lifting factor and the M first linear prediction coefficients, including:
modifying the first linear prediction coefficients by using the following formula to obtain the second linear prediction coefficients:
(the formula itself is published only as an image; the legible portion reads: j = 1, …, i - 1)

where i denotes the i-th order coefficient in the M-order linear prediction filter; a_{i,j} is a first linear prediction coefficient, denoting the j-th linear prediction coefficient of the i-th order linear prediction filter; f is the lifting factor; and b_{i,j} is a second linear prediction coefficient, denoting the j-th linear prediction coefficient of the i-th order linear prediction filter.
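The modification formula of claims 4 and 10 is likewise published only as an image, so it cannot be reproduced here. As a loose illustration of the general idea, namely rescaling linear prediction coefficients with an order-dependent weight derived from the lifting factor, the sketch below applies a standard bandwidth-expansion-style weighting. This is a stand-in technique, not the claimed formula, and the mapping gamma = 1 + f is an assumption.

```python
import numpy as np

def modify_lpc(a_first, f):
    """Stand-in illustration only: scale each linear prediction coefficient by
    an order-dependent power of a weight derived from the lifting factor f.
    This is classic bandwidth-expansion weighting, NOT the formula of
    claims 4 and 10, which is published only as an image."""
    a = np.asarray(a_first, dtype=float)
    gamma = 1.0 + f                      # assumed mapping from lifting factor to weight
    j = np.arange(1, len(a) + 1)         # coefficient index j = 1..M
    return a * gamma ** j                # illustrative "second" linear prediction coefficients
```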
5. The method according to any one of claims 1 to 4, wherein the method further comprises:
performing linear filtering on the voiced frame according to the second linear prediction coefficients.
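Claim 5 does not spell out the filter structure. One plausible reading, sketched below, removes the original spectral envelope with the analysis filter A(z) built from the first coefficients and re-imposes the enhanced envelope with the synthesis filter 1/A'(z) built from the second coefficients; the use of scipy.signal.lfilter and this analysis/synthesis arrangement are assumptions, not details taken from the patent.

```python
import numpy as np
from scipy.signal import lfilter

def filter_voiced_frame(frame, a_first, a_second):
    """Assumed realization of the linear filtering of claims 5 and 11:
    LP analysis with the first coefficients, LP synthesis with the second."""
    frame = np.asarray(frame, dtype=float)
    A_first = np.concatenate(([1.0], -np.asarray(a_first)))    # A(z)  = 1 - sum_j a_j z^-j
    A_second = np.concatenate(([1.0], -np.asarray(a_second)))  # A'(z) = 1 - sum_j b_j z^-j
    residual = lfilter(A_first, [1.0], frame)                  # whitening / analysis
    return lfilter([1.0], A_second, residual)                  # synthesis with enhanced envelope
```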
6. The method according to any one of claims 1 to 5, wherein
before the obtaining M first linear prediction coefficients of a voiced frame signal, the method further comprises:
pre-emphasizing the voiced frame signal by using the following formula:
H(z) = 1 - 0.95z⁻¹
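A small worked example of the pre-emphasis filter H(z) = 1 - 0.95z⁻¹ of claims 6 and 12: the time-domain form is y[n] = x[n] - 0.95·x[n-1], and the array-based implementation below is just one straightforward way to apply it (variable names are illustrative).

```python
import numpy as np

def pre_emphasize(signal, alpha=0.95):
    """Apply H(z) = 1 - alpha * z^-1, i.e. y[n] = x[n] - alpha * x[n - 1]."""
    x = np.asarray(signal, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]                      # no previous sample for n = 0
    y[1:] = x[1:] - alpha * x[:-1]
    return y
```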
7. A voice enhancement device, wherein the device comprises:
an obtaining module, configured to obtain M first linear prediction coefficients of a voiced frame signal, where M is the order of a linear prediction filter;
a processing module, configured to obtain a lifting factor, where the lifting factor is obtained according to a correlation between frequencies in a short-time spectral envelope corresponding to the M first linear prediction coefficients; and
a synthesis module, configured to modify the M first linear prediction coefficients according to the lifting factor and a correlation among the M first linear prediction coefficients, so that, compared with a first short-time spectral envelope corresponding to the M first linear prediction coefficients, a second short-time spectral envelope corresponding to the M second linear prediction coefficients obtained after the modification has enhanced formant energy and middle-to-high frequency spectral components that are compensated to some extent.
8. The device according to claim 7, wherein
the obtaining module is configured to calculate the first linear prediction coefficients by using the Levinson-Durbin recursive algorithm according to an autocorrelation function of the voiced frame.
9. The device according to claim 7, wherein
the processing module is configured to calculate the lifting factor according to the following formulas:

x_i = |a_i| - ((int)(|a_i| / 2π)) × 2π,  i = 1, 2, …, M,  when a_i ≥ 0

(the corresponding expression for a_i < 0 and the sinusoidal-model expression for s_i appear only as figures, including Figure imgf000014_0001, in the published text)

flag_i = -1 when x_i < π;  flag_i = 1 when x_i > π;  flag_i = 0 when x_i = π

f = ( Σ_{i=1}^{M} (flag_i - μ) ) / M

where a_i is a first linear prediction coefficient, x_i is the normalized first linear prediction coefficient, s_i is the sinusoidal-model value, μ is the corresponding mean value, M is the order of linear prediction, and f is the lifting factor.
10. The device according to any one of claims 7 to 9, wherein
the synthesis module modifies the first linear prediction coefficients by using the following formula to obtain the second linear prediction coefficients:
(formula (5); only "j = 1, …, i - 1" is legible in the published text)

where i denotes the i-th order coefficient in the M-order linear prediction filter; a_{i,j} is a first linear prediction coefficient, denoting the j-th linear prediction coefficient of the i-th order linear prediction filter; f is the lifting factor; and b_{i,j} is a second linear prediction coefficient, denoting the j-th linear prediction coefficient of the i-th order linear prediction filter.
11. The device according to any one of claims 7 to 10, wherein the device further comprises:
a filtering module, configured to perform linear filtering on the voiced frame signal according to the second linear prediction coefficients.
12. The device according to any one of claims 7 to 10, wherein the device further comprises:
a pre-emphasis module, configured to pre-emphasize the voiced frame signal by using the following formula before the obtaining module obtains the M first linear prediction coefficients of the voiced frame signal:
H(z) = 1 - 0.95z⁻¹
PCT/CN2011/078087 2011-08-05 2011-08-05 Voice enhancement method and device WO2012159370A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201180001446.0A CN103038825B (en) 2011-08-05 2011-08-05 Voice enhancement method and device
PCT/CN2011/078087 WO2012159370A1 (en) 2011-08-05 2011-08-05 Voice enhancement method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/078087 WO2012159370A1 (en) 2011-08-05 2011-08-05 Voice enhancement method and device

Publications (1)

Publication Number Publication Date
WO2012159370A1 true WO2012159370A1 (en) 2012-11-29

Family

ID=47216591

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/078087 WO2012159370A1 (en) 2011-08-05 2011-08-05 Voice enhancement method and device

Country Status (2)

Country Link
CN (1) CN103038825B (en)
WO (1) WO2012159370A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI555010B (en) * 2013-12-16 2016-10-21 三星電子股份有限公司 Audio encoding method and apparatus, audio decoding method, and non-transitory computer-readable recording medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3062945B1 (en) * 2017-02-13 2019-04-05 Centre National De La Recherche Scientifique Method and apparatus for dynamically modifying the voice timbre by frequency shifting of the formants of a spectral envelope
CN106856623B (en) * 2017-02-20 2020-02-11 鲁睿 Baseband voice signal communication noise suppression method and system
CN110797039B (en) * 2019-08-15 2023-10-24 腾讯科技(深圳)有限公司 Voice processing method, device, terminal and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1619646A (en) * 2003-11-21 2005-05-25 三星电子株式会社 Method of and apparatus for enhancing dialog using formants
US20100063808A1 (en) * 2008-09-06 2010-03-11 Yang Gao Spectral Envelope Coding of Energy Attack Signal
CN102044250A (en) * 2009-10-23 2011-05-04 华为技术有限公司 Band spreading method and apparatus


Also Published As

Publication number Publication date
CN103038825A (en) 2013-04-10
CN103038825B (en) 2014-04-30

Similar Documents

Publication Publication Date Title
Li et al. Glance and gaze: A collaborative learning framework for single-channel speech enhancement
JP6374028B2 (en) Voice profile management and speech signal generation
US9218820B2 (en) Audio fingerprint differences for end-to-end quality of experience measurement
US11605394B2 (en) Speech signal cascade processing method, terminal, and computer-readable storage medium
CN1215459C (en) Bandwidth extension of acoustic signals
JP2021015301A (en) Device and method for reducing quantization noise in a time-domain decoder
WO2021052287A1 (en) Frequency band extension method, apparatus, electronic device and computer-readable storage medium
WO2021147237A1 (en) Voice signal processing method and apparatus, and electronic device and storage medium
WO2010066158A1 (en) Methods and apparatuses for encoding signal and decoding signal and system for encoding and decoding
WO2013060223A1 (en) Frame loss compensation method and apparatus for voice frame signal
US20100106269A1 (en) Method and apparatus for signal processing using transform-domain log-companding
CN1969319A (en) Signal encoding
WO2021052285A1 (en) Frequency band expansion method and apparatus, electronic device, and computer readable storage medium
KR101924767B1 (en) Voice frequency code stream decoding method and device
JP4679513B2 (en) Hierarchical coding apparatus and hierarchical coding method
WO2008110870A2 (en) Speech coding system and method
JP2008519990A (en) Signal coding method
JP2010078915A (en) Audio decoding method, apparatus, and program
WO2012159370A1 (en) Voice enhancement method and device
WO2023197809A1 (en) High-frequency audio signal encoding and decoding method and related apparatuses
WO2013066244A1 (en) Bandwidth extension of audio signals
US20110137644A1 (en) Decoding speech signals
JP6573887B2 (en) Audio signal encoding method, decoding method and apparatus
CN113035207A (en) Audio processing method and device
WO2010103854A2 (en) Speech encoding device, speech decoding device, speech encoding method, and speech decoding method

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201180001446.0

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11866050

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11866050

Country of ref document: EP

Kind code of ref document: A1