WO2014161334A1 - Voice call method and apparatus - Google Patents

Voice call method and apparatus

Info

Publication number
WO2014161334A1
Authority
WO
WIPO (PCT)
Prior art keywords
speech
denoised
voice
frequency domain
original
Prior art date
Application number
PCT/CN2013/087986
Other languages
English (en)
French (fr)
Inventor
康健超
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date: 2013-09-06 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2013-11-27
Publication date: 2014-10-09
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2014161334A1


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation

Definitions

  • The present invention relates to speech recognition technology in the field of mobile communications, and in particular to a voice call method and apparatus for situations in which the surroundings do not allow the user to speak loudly.
  • Mobile terminals such as mobile phones have become indispensable communication devices in daily life.
  • Their most important function is making calls, through which people keep in touch and strengthen relationships.
  • During a call, however, users are often constrained by their surroundings. In some settings, such as a cinema or a meeting, they cannot speak loudly after answering a call and can only speak very quietly, so the other party may be unable to hear them clearly, which hampers communication.
  • Embodiments of the present invention provide a voice call method and apparatus that allow a clear call even where the surroundings do not allow the user to speak loudly.
  • An embodiment of the present invention provides a voice call method, including: receiving call speech X(t) and denoising it to obtain denoised speech X0(t); and, upon determining that the amplitude mean of X0(t) is smaller than the amplitude mean of the stored original speech Y(t), enhancing X0(t) and outputting it.
  • The method further includes: storing the original speech Y(t) and extracting the amplitude mean of Y(t).
  • Denoising X(t) includes: performing a fast Fourier transform on X(t) and on the stored original speech Y(t) to obtain the frequency-domain signal X(ω) of the speech and the frequency-domain signal Y(ω) of the original speech; determining the frequency-domain signal of the noise in the speech from X(ω) and Y(ω); convolving X(ω) with the frequency-domain signal of the noise to determine the frequency-domain signal of the denoised speech; and performing an inverse fast Fourier transform on that signal to obtain the denoised speech X0(t).
  • Enhancing the denoised speech means enhancing X0(t) according to the amplitude mean of Y(t), including: determining the current amplitude mean of X0(t); determining a speech enhancement coefficient n from the original amplitude mean of Y(t) and the current amplitude mean; and enhancing X0(t) according to n.
  • The method further includes: upon determining that the amplitude mean of X0(t) is greater than or equal to the amplitude mean of Y(t), outputting X0(t) directly.
  • An embodiment of the present invention further provides a voice call apparatus, including a receiving unit, a denoising unit, a processing unit, and an output unit. The receiving unit is configured to receive call speech X(t); the denoising unit is configured to denoise X(t) to obtain denoised speech X0(t); the processing unit is configured to enhance X0(t) upon determining that its amplitude mean is smaller than the amplitude mean of the stored original speech Y(t); and the output unit is configured to output the enhanced speech.
  • The apparatus further includes a storage unit and an extracting unit: the storage unit is configured to store the original speech Y(t), and the extracting unit is configured to extract the amplitude mean of Y(t).
  • The denoising unit includes a first transform subunit, a first determining subunit, a second determining subunit, and a second transform subunit, where the first transform subunit is configured to perform a fast Fourier transform on X(t) and on the stored original speech Y(t) to obtain X(ω) and Y(ω); the first determining subunit is configured to determine the frequency-domain signal of the noise in the speech from X(ω) and Y(ω); the second determining subunit is configured to convolve X(ω) with the frequency-domain signal of the noise to determine the frequency-domain signal of the denoised speech; and the second transform subunit is configured to perform an inverse fast Fourier transform on the frequency-domain signal of the denoised speech to obtain X0(t).
  • The processing unit is configured to enhance X0(t) according to the amplitude mean of Y(t), and includes a third determining subunit, a fourth determining subunit, and an enhancement subunit, where:
  • the third determining subunit is configured to determine the current amplitude mean of X0(t);
  • the fourth determining subunit is configured to determine a speech enhancement coefficient n from the original amplitude mean of Y(t) and the current amplitude mean;
  • the enhancement subunit is configured to enhance X0(t) according to n.
  • The processing unit is further configured to trigger the output unit upon determining that the amplitude mean of X0(t) is greater than or equal to that of Y(t); correspondingly, the output unit is further configured to output X0(t) directly.
  • With this voice call method and apparatus, received call speech X(t) is first denoised to obtain X0(t); if the amplitude mean of X0(t) is smaller than that of the stored original speech Y(t), X0(t) is enhanced and then output. The user therefore still gets good call quality when speaking loudly is inconvenient, surrounding noise is effectively removed, the other party is no longer troubled by being unable to hear, and people nearby are not disturbed.
  • FIG. 1 is a schematic flowchart of the implementation of a voice call method according to an embodiment of the present invention;
  • FIG. 2 is a schematic flowchart of one implementation of enhancing the denoised speech according to an embodiment of the present invention;
  • FIG. 3 is a schematic structural diagram of a voice call apparatus according to an embodiment of the present invention;
  • FIG. 4 is a schematic structural diagram of the denoising unit in FIG. 3;
  • FIG. 5 is a schematic structural diagram of the processing unit in FIG. 3.
  • In embodiments of the present invention, received call speech is first denoised; then, upon determining that the amplitude mean of the denoised speech is smaller than that of the stored original speech, the denoised speech is enhanced and output.
  • FIG. 1 is a schematic flowchart of a voice call method according to an embodiment of the present invention. As shown in FIG. 1, the method proceeds as follows:
  • Step 101: Store the original speech Y(t) and extract its amplitude mean. Specifically, the user can find a quiet place free of noise, start a recording device, and record a passage of his or her normal speaking voice as the original speech Y(t).
  • The purpose of extracting the amplitude mean of Y(t) is as follows: when the amplitude mean of the user's speech during a call is judged to be smaller than that of the original speech, the speech can be enhanced according to the amplitude mean of the original speech so that the other party can hear the user clearly.
  • The amplitude mean of Y(t) can be extracted using various existing techniques, which are not described here.
  • Step 102: Receive call speech X(t) and denoise it to obtain denoised speech X0(t).
  • The call may be received in any setting, in particular where speaking loudly is inconvenient, such as a cinema, an opera, a stage play, a meeting, or a workplace.
  • In such settings the user, after answering an incoming call or placing an outgoing one, cannot speak loudly and can only speak softly or in a low voice, so the other party cannot hear the user clearly and communication between the two suffers.
  • X(t) may be speech received through a microphone or similar device; it contains the user's very quiet voice and background noise far louder than that voice.
  • Denoising X(t) includes:
  • Step A1: Perform a fast Fourier transform (FFT) on X(t) and on the stored original speech Y(t) to obtain the frequency-domain signals X(ω) and Y(ω);
  • Step A2: Determine the frequency-domain signal of the noise in the speech from X(ω) and Y(ω);
  • specifically, Y(ω) is subtracted from X(ω) to obtain the frequency-domain signal of the noise;
  • Step A3: Convolve X(ω) with the frequency-domain signal of the noise to determine the frequency-domain signal of the denoised speech;
  • Step A4: Perform an inverse fast Fourier transform (IFFT) on the frequency-domain signal of the denoised speech to obtain the denoised speech X0(t).
  • Before the FFT, denoising further includes applying a Hamming (Hanning) window to X(t) and to Y(t).
  • The purpose of denoising X(t) is to remove background noise much louder than the user's voice.
  • Many speech denoising methods exist, and those skilled in the art can denoise speech using various prior-art techniques.
  • Step 103: Upon determining that the amplitude mean of X0(t) is smaller than that of the stored original speech Y(t), enhance X0(t) and output it.
  • Enhancement means raising the amplitude of X0(t) to amplify the volume of the user's voice, so that where loud speech is not allowed both parties can still hold a normal, clear conversation.
  • Preferably, X0(t) is enhanced according to the amplitude mean of Y(t).
  • Preferably, the voice call method further includes: upon determining that the amplitude mean of X0(t) is greater than or equal to that of Y(t), outputting X0(t) directly.
  • FIG. 2 is a schematic flowchart of one implementation of enhancing the denoised speech according to an embodiment of the present invention. As shown in FIG. 2, enhancing X0(t) according to the amplitude mean of Y(t) includes the following steps:
  • Step 201: Determine the current amplitude mean of X0(t);
  • Step 202: Determine the speech enhancement coefficient n from the original amplitude mean of Y(t) and the current amplitude mean;
  • specifically, let the original amplitude mean of the stored original speech be ‖Y(t)‖ and the current amplitude mean be ‖X0(t)‖; dividing ‖Y(t)‖ by ‖X0(t)‖ gives n;
  • Step 203: Enhance X0(t) according to n.
  • Specifically, X0(t) is multiplied by n to obtain speech data at the user's normal speaking volume. In practice, embodiments should also include:
  • converting the resulting normal-volume speech from a digital signal to an analog signal before output; correspondingly, when the noisy speech X(t) is received through a microphone or similar device, it should be converted into a digital signal. Both analog-to-digital and digital-to-analog conversion can be implemented by those skilled in the art with various existing techniques and are not described here.
  • As shown in FIG. 3, a voice call apparatus includes a receiving unit 31, a denoising unit 32, a processing unit 33, and an output unit 34;
  • the receiving unit 31 is configured to receive call speech X(t);
  • the denoising unit 32 is configured to denoise X(t) to obtain denoised speech X0(t);
  • the processing unit 33 is configured to enhance X0(t) upon determining that its amplitude mean is smaller than that of the stored original speech Y(t); and the output unit 34 is configured to output the enhanced speech.
  • The apparatus further includes a storage unit and an extracting unit: the storage unit is configured to store the original speech Y(t), and the extracting unit is configured to extract the amplitude mean of Y(t).
  • The processing unit 33 is further configured to trigger the output unit 34 upon determining that the amplitude mean of X0(t) is greater than or equal to that of Y(t); correspondingly, the output unit 34 is further configured to output X0(t) directly.
  • The processing unit 33 enhances X0(t) according to the amplitude mean of Y(t).
  • As shown in FIG. 4, the denoising unit 32 further includes a first transform subunit 41, a first determining subunit 42, a second determining subunit 43, and a second transform subunit 44, where:
  • the first transform subunit 41 is configured to perform an FFT on X(t) and on the stored original speech Y(t) to obtain X(ω) and Y(ω);
  • the first determining subunit 42 is configured to determine the frequency-domain signal of the noise in the speech from X(ω) and Y(ω);
  • the second determining subunit 43 is configured to convolve X(ω) with the frequency-domain signal of the noise to determine the frequency-domain signal of the denoised speech;
  • the second transform subunit 44 is configured to perform an IFFT on the frequency-domain signal of the denoised speech to obtain the denoised speech X0(t).
  • FIG. 5 is a schematic structural diagram of the processing unit in FIG. 3. As shown in FIG. 5, the processing unit 33 further includes a third determining subunit 51, a fourth determining subunit 52, and an enhancement subunit 53, where:
  • the third determining subunit 51 is configured to determine the current amplitude mean of X0(t);
  • the fourth determining subunit 52 is configured to determine the speech enhancement coefficient n from the original amplitude mean of Y(t) and the current amplitude mean;
  • the enhancement subunit 53 is configured to enhance X0(t) according to n.
  • In a specific implementation, embodiments of the present invention may also be provided as a call mode.
  • When the user enters a setting where speaking is inconvenient, the call mode can be turned on.
  • Then, whenever the user places or receives a call, the processing flow of the voice call method of the embodiment is executed.
  • Compared with the prior art, speech recognition technology is used to identify the user's voice, filter out the noise in the speech, and then amplify it and output it to the other party, so that the other party still hears the user well even when the user speaks softly.
  • At the same time, surrounding noise is effectively removed, the other party is no longer troubled by being unable to hear, and people nearby are not disturbed.
  • Those skilled in the art should understand that the functions of the processing units, subunits, and modules in the voice call apparatus shown in FIG. 4 and FIG. 5 can be understood with reference to the description of the voice call method, and that these units can be implemented by the processor of the mobile terminal or by specific logic circuits.
  • In practice, the processor may be a central processing unit (CPU), a microprocessor unit (MPU), or a digital signal processor (DSP).
  • In embodiments of the present invention, call speech X(t) is first denoised to obtain X0(t); then, upon determining that the amplitude mean of X0(t) is smaller than that of the stored original speech Y(t), X0(t) is enhanced and output.
  • The user therefore still gets good call quality when speaking loudly is inconvenient, surrounding noise is effectively removed, the other party is no longer troubled by being unable to hear, and people nearby are not disturbed.

Abstract

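The present invention discloses a voice call method and apparatus. The method includes: receiving call speech X(t) and denoising X(t) to obtain denoised speech X0(t); and, upon determining that the amplitude mean of X0(t) is smaller than the amplitude mean of the stored original speech Y(t), enhancing X0(t) and outputting it.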

Description

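Voice call method and apparatus
Technical Field
The present invention relates to speech recognition technology in the field of mobile communications, and in particular to a voice call method and apparatus for situations in which the surroundings do not allow the user to speak loudly.
Background
With the continuous development of mobile communication technology, mobile terminals such as mobile phones have become indispensable communication devices in daily life, and their main function is making calls, through which people keep in touch and strengthen relationships. However, users are often affected by their surroundings during a call: in some settings, such as a cinema or a meeting, they cannot speak loudly after answering a call and can only express themselves very quietly, so the other party may not hear them clearly, which hampers communication.
At present, a typical mobile terminal simply picks up sound through the microphone and transmits it to the other party during a call. A user who receives a call where speaking loudly is inconvenient can only lower his or her head and speak softly, and other sounds are mixed in, such as a presenter's voice during a meeting or the soundtrack during a film. If this sound is transmitted directly, the other party finds it hard to make out and call quality suffers. A voice call method that ensures call quality in such quiet settings is therefore urgently needed.
Summary
In view of this, to overcome the shortcomings of the prior art, embodiments of the present invention provide a voice call method and apparatus that enable a clear call even where the surroundings do not allow the user to speak loudly.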
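To achieve the above object, the technical solutions of the embodiments of the present invention are implemented as follows. An embodiment of the present invention provides a voice call method, including: receiving call speech X(t) and denoising it to obtain denoised speech X0(t); and, upon determining that the amplitude mean of X0(t) is smaller than the amplitude mean of the stored original speech Y(t), enhancing X0(t) and outputting it.
Preferably, the method further includes: storing the original speech Y(t) and extracting the amplitude mean of Y(t).
Preferably, denoising X(t) includes: performing a fast Fourier transform on X(t) and on the stored original speech Y(t) to obtain the frequency-domain signal X(ω) of the speech and the frequency-domain signal Y(ω) of the original speech; determining the frequency-domain signal of the noise in the speech from X(ω) and Y(ω); convolving X(ω) with the frequency-domain signal of the noise to determine the frequency-domain signal of the denoised speech; and performing an inverse fast Fourier transform on the frequency-domain signal of the denoised speech to obtain the denoised speech X0(t).
Preferably, enhancing the denoised speech means enhancing X0(t) according to the amplitude mean of Y(t), including: determining the current amplitude mean of X0(t); determining a speech enhancement coefficient n from the original amplitude mean of Y(t) and the current amplitude mean; and enhancing X0(t) according to n.
Preferably, the method further includes: upon determining that the amplitude mean of X0(t) is greater than or equal to the amplitude mean of Y(t), outputting X0(t) directly.
An embodiment of the present invention further provides a voice call apparatus, including a receiving unit, a denoising unit, a processing unit, and an output unit, where the receiving unit is configured to receive call speech X(t); the denoising unit is configured to denoise X(t) to obtain denoised speech X0(t); the processing unit is configured to enhance X0(t) upon determining that the amplitude mean of X0(t) is smaller than the amplitude mean of the stored original speech Y(t); and the output unit is configured to output the enhanced speech.
Preferably, the apparatus further includes a storage unit and an extracting unit, where the storage unit is configured to store the original speech Y(t), and the extracting unit is configured to extract the amplitude mean of Y(t).
Preferably, the denoising unit includes a first transform subunit, a first determining subunit, a second determining subunit, and a second transform subunit, where the first transform subunit is configured to perform a fast Fourier transform on X(t) and on the stored original speech Y(t) to obtain X(ω) and Y(ω); the first determining subunit is configured to determine the frequency-domain signal of the noise in the speech from X(ω) and Y(ω); the second determining subunit is configured to convolve X(ω) with the frequency-domain signal of the noise to determine the frequency-domain signal of the denoised speech; and the second transform subunit is configured to perform an inverse fast Fourier transform on the frequency-domain signal of the denoised speech to obtain the denoised speech X0(t).
Preferably, the processing unit is configured to enhance X0(t) according to the amplitude mean of Y(t), and includes a third determining subunit, a fourth determining subunit, and an enhancement subunit, where the third determining subunit is configured to determine the current amplitude mean of X0(t); the fourth determining subunit is configured to determine the speech enhancement coefficient n from the original amplitude mean of Y(t) and the current amplitude mean; and the enhancement subunit is configured to enhance X0(t) according to n.
Preferably, the processing unit is further configured to trigger the output unit upon determining that the amplitude mean of X0(t) is greater than or equal to the amplitude mean of Y(t); correspondingly, the output unit is further configured to output X0(t) directly.
With the voice call method and apparatus provided by the embodiments of the present invention, received call speech X(t) is first denoised to obtain X0(t); then, upon determining that the amplitude mean of X0(t) is smaller than that of the stored original speech Y(t), X0(t) is enhanced and output. In this way, the user still obtains good call quality where speaking loudly is inconvenient; surrounding noise is effectively removed, the other party is no longer troubled by being unable to hear, and people nearby are not disturbed.
Brief Description of the Drawings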
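FIG. 1 is a schematic flowchart of the implementation of a voice call method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of one implementation of enhancing the denoised speech according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a voice call apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the denoising unit in FIG. 3;
FIG. 5 is a schematic structural diagram of the processing unit in FIG. 3.
Detailed Description
In embodiments of the present invention, received call speech is first denoised to obtain denoised speech; then, upon determining that the amplitude mean of the denoised speech is smaller than that of the stored original speech, the denoised speech is enhanced and output.
The technical solutions of the present invention are described in further detail below with reference to the drawings and specific embodiments. FIG. 1 is a schematic flowchart of the implementation of a voice call method according to an embodiment of the present invention. As shown in FIG. 1, the voice call method proceeds as follows: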
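Step 101: Store the original speech Y(t) and extract its amplitude mean. Specifically, the user can find a quiet place free of noise, start a recording device, and record a passage of his or her normal speaking voice as the original speech Y(t).
Here, the purpose of extracting the amplitude mean of Y(t) is as follows: when the amplitude mean of the user's speech during a call is judged to be smaller than that of the original speech recorded at normal volume, the speech currently being spoken can be enhanced according to the amplitude mean of the original speech so that the other party hears the user clearly.
The amplitude mean of Y(t) can be extracted by those skilled in the art using various existing techniques, which are not described again here.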
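The description leaves the amplitude-mean computation to existing techniques. As a minimal sketch, assuming the amplitude mean ‖Y(t)‖ is the mean absolute sample value of the reference recording (the patent does not define the measure; RMS would be an equally plausible reading), it could be computed as follows. The helper name `amplitude_mean` and the sample values are hypothetical.

```python
from typing import Sequence


def amplitude_mean(samples: Sequence[float]) -> float:
    """Return the mean absolute amplitude of a block of samples.

    Assumption: the patent's amplitude mean ||Y(t)|| is read here as the
    average of |y[k]| over the reference recording.
    """
    if len(samples) == 0:
        raise ValueError("amplitude_mean needs at least one sample")
    return sum(abs(s) for s in samples) / len(samples)


# Hypothetical reference Y(t), recorded at normal volume in a quiet room.
reference = [0.5, -0.3, 0.4, -0.2]
ref_mean = amplitude_mean(reference)  # stored for the comparison in step 103
```

Only the scalar `ref_mean` needs to be kept after step 101; the comparison in step 103 never uses the reference samples themselves.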
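Step 102: Receive call speech X(t) and denoise it to obtain denoised speech X0(t).
Here, the call may be received in any setting, in particular one where speaking loudly is inconvenient, such as watching a film, an opera, or a stage play, or attending a meeting or working. In such settings, after answering an incoming call or placing an outgoing one, the user cannot speak loudly and can only speak softly or in a low voice, so the other party cannot hear the user clearly and communication between the two suffers.
Here, X(t) may be speech received through a microphone or similar device; X(t) contains the user's very quiet voice and background noise far louder than the user's voice.
Here, denoising X(t) includes:
Step A1: Perform a fast Fourier transform (FFT) on X(t) and on the stored original speech Y(t) to obtain the frequency-domain signal X(ω) of the speech and the frequency-domain signal Y(ω) of the original speech;
Step A2: Determine the frequency-domain signal of the noise in the speech from X(ω) and Y(ω);
Specifically, Y(ω) is subtracted from X(ω) to obtain the frequency-domain signal of the noise;
Step A3: Convolve X(ω) with the frequency-domain signal of the noise to determine the frequency-domain signal of the denoised speech;
Step A4: Perform an inverse fast Fourier transform (IFFT) on the frequency-domain signal of the denoised speech to obtain the denoised speech X0(t).
Here, before the FFT is performed on X(t) and Y(t), denoising further includes applying a Hamming (Hanning) window to X(t) and to Y(t).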
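The windowing, step A1 and step A2 can be sketched with NumPy exactly as the text states them. This is a minimal illustration, not the patent's code: the function name `noise_spectrum` is hypothetical, the real-input FFT `np.fft.rfft` is an implementation choice, and zero-padding both signals to a common length is an assumption, since the description does not say how signals of different lengths are aligned.

```python
import numpy as np


def noise_spectrum(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Window, FFT (step A1), then N(w) = X(w) - Y(w) (step A2).

    x is the received call speech X(t); y is the stored reference Y(t).
    """
    n = max(len(x), len(y))  # common FFT length; shorter signal is zero-padded
    X = np.fft.rfft(x * np.hamming(len(x)), n=n)  # Hamming window, then FFT
    Y = np.fft.rfft(y * np.hamming(len(y)), n=n)
    return X - Y  # complex spectral difference, taken as the noise spectrum
```

A complex difference like this isolates the noise only if Y(t) and the speech inside X(t) coincide sample for sample, which a separately recorded reference generally will not; the sketch reproduces the step as written rather than a validated noise estimator.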
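Here, the purpose of denoising X(t) is to remove background noise much louder than the user's voice. Many speech denoising methods exist in the prior art, and those skilled in the art can denoise speech using any of them.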
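Since the description admits any prior-art denoising method, one well-known option is magnitude spectral subtraction. The sketch below swaps it in for the unspecified "convolution" of step A3; it is not the patent's operation. It assumes a noise magnitude spectrum estimated elsewhere (for example from a noise-only stretch of the call), and the name `denoise_frame` and the `floor` parameter are hypothetical.

```python
import numpy as np


def denoise_frame(x: np.ndarray, noise_mag: np.ndarray, floor: float = 0.01) -> np.ndarray:
    """Denoise one frame by magnitude spectral subtraction.

    noise_mag must have len(x) // 2 + 1 bins (the rfft size of the frame).
    """
    window = np.hamming(len(x))            # Hamming window before the FFT
    X = np.fft.rfft(x * window)            # step A1: to the frequency domain
    mag = np.abs(X) - noise_mag            # swapped-in step: subtract noise magnitude
    mag = np.maximum(mag, floor * np.abs(X))  # spectral floor: no negative magnitudes
    X0 = mag * np.exp(1j * np.angle(X))    # reuse the noisy phase
    return np.fft.irfft(X0, n=len(x))      # step A4: back to the time domain
```

The returned frame is still windowed; processing a continuous call would use overlapping frames recombined by overlap-add.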
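Step 103: Upon determining that the amplitude mean of X0(t) is smaller than the amplitude mean of the stored original speech Y(t), enhance X0(t) and output it.
Here, enhancement means raising the amplitude of X0(t) to amplify the volume of the user's voice, so that where loud speech is not allowed, both parties can still hold a normal, clear conversation.
Preferably, enhancing X0(t) means enhancing it according to the amplitude mean of Y(t).
Preferably, the voice call method of the embodiment further includes: upon determining that the amplitude mean of X0(t) is greater than or equal to that of the stored original speech Y(t), outputting X0(t) directly.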
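Step 103 and its pass-through branch reduce to one comparison. The sketch below is illustrative only: it assumes the mean-absolute-value reading of the amplitude mean, uses the coefficient n = ‖Y(t)‖ / ‖X0(t)‖ given for step 202, and adds a guard for silent or empty frames that the description does not mention. `step_103` is a hypothetical name.

```python
from typing import List, Sequence


def amplitude_mean(samples: Sequence[float]) -> float:
    # Assumed reading of the amplitude mean: average absolute sample value.
    return sum(abs(s) for s in samples) / len(samples) if samples else 0.0


def step_103(x0: Sequence[float], ref_mean: float) -> List[float]:
    """Enhance denoised speech X0(t) only if it is quieter than Y(t).

    ref_mean is the stored amplitude mean of the reference speech Y(t).
    """
    cur_mean = amplitude_mean(x0)
    if cur_mean == 0.0 or cur_mean >= ref_mean:
        # At or above the reference level (or silent, a guard added here to
        # avoid dividing by zero): output X0(t) unchanged.
        return list(x0)
    n = ref_mean / cur_mean  # step 202: n = ||Y(t)|| / ||X0(t)||
    return [n * s for s in x0]  # step 203: scale every sample by n
```

For instance, a denoised frame `[0.1, -0.1]` against a stored mean of 0.4 is scaled by n = 4, while a frame already at 0.5 passes through untouched.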
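FIG. 2 is a schematic flowchart of one implementation of enhancing the denoised speech according to an embodiment of the present invention. As shown in FIG. 2, enhancing X0(t) according to the amplitude mean of Y(t) specifically includes the following steps:
Step 201: Determine the current amplitude mean of X0(t);
Step 202: Determine the speech enhancement coefficient n from the original amplitude mean of Y(t) and the current amplitude mean;
Specifically, let the original amplitude mean of the stored original speech Y(t) be ‖Y(t)‖ and the current amplitude mean be ‖X0(t)‖; dividing ‖Y(t)‖ by ‖X0(t)‖ gives the speech enhancement coefficient n;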
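Written as an equation, the coefficient of step 202 and its effect are shown below; the second identity, which shows why multiplying by n restores the reference level, assumes an amplitude measure that scales linearly with the signal (true of the mean absolute value and of RMS).

```latex
n = \frac{\lVert Y(t) \rVert}{\lVert X_0(t) \rVert},
\qquad
\lVert n\,X_0(t) \rVert = n\,\lVert X_0(t) \rVert = \lVert Y(t) \rVert .
```

For example, with illustrative values ‖Y(t)‖ = 0.40 and ‖X0(t)‖ = 0.10, n = 4, a gain of 20·log10 4 ≈ 12 dB. Because this branch is taken only when ‖X0(t)‖ < ‖Y(t)‖, n is always greater than 1.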
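Step 203: Enhance X0(t) according to n.
Specifically, X0(t) is multiplied by n to obtain speech data at the user's normal speaking volume. In practice, embodiments of the present invention should also include converting the normal-volume speech obtained by enhancing X0(t) from a digital signal to an analog signal before output; correspondingly, when the noisy speech X(t) is received through a microphone or similar device, X(t) should also be converted into a digital signal. Analog-to-digital and digital-to-analog conversion can both be implemented by those skilled in the art using various existing techniques and are not described here.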
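On the digital side of the D/A stage, multiplying by n can push samples past full scale, which the description does not address. A minimal sketch, assuming floating-point samples in [-1.0, 1.0] and 16-bit linear PCM at the converter boundary; clipping rather than wrap-around is an added assumption, and both function names are hypothetical.

```python
from typing import List, Sequence

PCM16_MAX = 32767  # assumed 16-bit linear PCM sample format


def enhance_to_pcm16(x0: Sequence[float], n: float) -> List[int]:
    """Multiply X0(t) by n and quantise to 16-bit PCM for the D/A stage.

    Samples driven past full scale by the gain are clipped, not wrapped.
    """
    out: List[int] = []
    for s in x0:
        v = max(-1.0, min(1.0, n * s))  # clip to full scale after the gain
        out.append(int(round(v * PCM16_MAX)))
    return out


def pcm16_to_float(samples: Sequence[int]) -> List[float]:
    """A/D side: 16-bit integers from the converter back to floats."""
    return [s / PCM16_MAX for s in samples]
```

Clipping keeps a loud syllable from wrapping to the opposite polarity, at the cost of some distortion on peaks.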
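FIG. 3 is a schematic structural diagram of a voice call apparatus according to an embodiment of the present invention. As shown in FIG. 3, the voice call apparatus of the embodiment includes a receiving unit 31, a denoising unit 32, a processing unit 33, and an output unit 34;
the receiving unit 31 is configured to receive call speech X(t);
the denoising unit 32 is configured to denoise X(t) to obtain denoised speech X0(t);
the processing unit 33 is configured to enhance X0(t) upon determining that the amplitude mean of X0(t) is smaller than that of the stored original speech Y(t); and the output unit 34 is configured to output the enhanced speech.
Preferably, the apparatus further includes a storage unit and an extracting unit, where the storage unit is configured to store the original speech Y(t), and the extracting unit is configured to extract the amplitude mean of Y(t).
Preferably, the processing unit 33 is further configured to trigger the output unit 34 upon determining that the amplitude mean of X0(t) is greater than or equal to that of Y(t); correspondingly, the output unit 34 is further configured to output X0(t) directly. Preferably, the processing unit 33 enhances X0(t) according to the amplitude mean of Y(t).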
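The unit structure of FIG. 3 maps naturally onto a small class. This is a hypothetical Python arrangement, not the patent's implementation: `VoiceCallDevice` and its method names are invented for illustration, the denoising unit is injected as a callable because the algorithm is left open, the storage and extracting units collapse into `store_reference`, and the amplitude mean is again assumed to be the mean absolute value.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Signal = List[float]


def amplitude_mean(samples: Sequence[float]) -> float:
    # Assumed definition: mean absolute sample value (0.0 for an empty frame).
    return sum(abs(s) for s in samples) / len(samples) if samples else 0.0


@dataclass
class VoiceCallDevice:
    """Receiving unit 31, denoising unit 32, processing unit 33, output unit 34."""

    denoise: Callable[[Signal], Signal]  # denoising unit 32 (algorithm left open)
    ref_mean: Optional[float] = None     # filled by the storage/extracting units

    def store_reference(self, y: Signal) -> None:
        # Storage unit + extracting unit: keep only ||Y(t)||.
        self.ref_mean = amplitude_mean(y)

    def receive(self, x: Signal) -> Signal:
        # Receiving unit 31: the pipeline X(t) -> X0(t) -> output.
        return self.output(self.process(self.denoise(x)))

    def process(self, x0: Signal) -> Signal:
        # Processing unit 33: enhance only when quieter than the reference.
        cur = amplitude_mean(x0)
        if self.ref_mean is None or cur == 0.0 or cur >= self.ref_mean:
            return x0  # pass-through branch (output unit triggered directly)
        n = self.ref_mean / cur
        return [n * s for s in x0]

    def output(self, signal: Signal) -> Signal:
        # Output unit 34: a real device would hand this to the D/A stage.
        return signal
```

Injecting `denoise` keeps the unit boundaries of FIG. 3 visible while letting any of the denoising options discussed above be plugged in.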
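FIG. 4 is a schematic structural diagram of the denoising unit in FIG. 3. As shown in FIG. 4, the denoising unit 32 further includes a first transform subunit 41, a first determining subunit 42, a second determining subunit 43, and a second transform subunit 44, where:
the first transform subunit 41 is configured to perform a fast Fourier transform on X(t) and on the stored original speech Y(t) to obtain the frequency-domain signal X(ω) of the speech and the frequency-domain signal Y(ω) of the original speech;
the first determining subunit 42 is configured to determine the frequency-domain signal of the noise in the speech from X(ω) and Y(ω);
the second determining subunit 43 is configured to convolve X(ω) with the frequency-domain signal of the noise to determine the frequency-domain signal of the denoised speech;
the second transform subunit 44 is configured to perform an inverse fast Fourier transform on the frequency-domain signal of the denoised speech to obtain the denoised speech X0(t).
FIG. 5 is a schematic structural diagram of the processing unit in FIG. 3. As shown in FIG. 5, the processing unit 33 further includes a third determining subunit 51, a fourth determining subunit 52, and an enhancement subunit 53, where:
the third determining subunit 51 is configured to determine the current amplitude mean of X0(t);
the fourth determining subunit 52 is configured to determine the speech enhancement coefficient n from the original amplitude mean of Y(t) and the current amplitude mean;
the enhancement subunit 53 is configured to enhance X0(t) according to n.
In a specific implementation, embodiments of the present invention may also be provided as a call mode: when the user enters a setting where speaking is inconvenient, the call mode can be turned on, and then, whenever the user places or receives a call, the processing flow of the voice call method of the embodiment is executed. Compared with the prior art, speech recognition technology is used to identify the user's voice, filter out the noise in the speech, and then amplify it and output it to the other party, so that the other party still hears the user well even when the user speaks softly; at the same time, surrounding noise is effectively removed, the other party is no longer troubled by being unable to hear, and people nearby are not disturbed.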
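The optional call mode acts as a switch in front of the processing flow. A sketch under stated assumptions: `CallMode`, `CallSession` and `on_uplink_frame` are hypothetical names, the user's outgoing audio is treated as a stream of frames, and the denoise-and-enhance flow is passed in as a callable.

```python
from enum import Enum
from typing import Callable, List


class CallMode(Enum):
    NORMAL = "normal"
    QUIET = "quiet"  # hypothetical name for the patent's optional call mode


class CallSession:
    """Routes the user's outgoing frames through the flow only in quiet mode."""

    def __init__(self, enhance: Callable[[List[float]], List[float]]) -> None:
        self.mode = CallMode.NORMAL
        self._enhance = enhance  # e.g. denoise, then step 103

    def set_mode(self, mode: CallMode) -> None:
        # Turned on by the user on entering a setting where speaking is inconvenient.
        self.mode = mode

    def on_uplink_frame(self, frame: List[float]) -> List[float]:
        # Applies to incoming and outgoing calls alike: it is the user's voice.
        if self.mode is CallMode.QUIET:
            return self._enhance(frame)
        return frame
```

Keeping the mode outside the processing flow means normal calls pay no processing cost and hear no artefacts from the denoiser.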
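Those skilled in the art should understand that the functions of the processing units, subunits, and modules in the voice call apparatus shown in FIG. 4 and FIG. 5 can be understood with reference to the foregoing description of the voice call method. Those skilled in the art should also understand that these processing units, subunits, and modules can be implemented by the processor of the mobile terminal or by specific logic circuits. For example, in practice, the processor may be a central processing unit (CPU), a microprocessor unit (MPU), or a digital signal processor (DSP).
The foregoing are merely preferred embodiments of the present invention and are not intended to limit its scope of protection.
Industrial Applicability
In embodiments of the present invention, received call speech X(t) is first denoised to obtain X0(t); then, upon determining that the amplitude mean of X0(t) is smaller than that of the stored original speech Y(t), X0(t) is enhanced and output. In this way, the user still obtains good call quality where speaking loudly is inconvenient; surrounding noise is effectively removed, the other party is no longer troubled by being unable to hear, and people nearby are not disturbed.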

Claims

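1. A voice call method, including:
receiving call speech X(t) and denoising X(t) to obtain denoised speech X0(t); and, upon determining that the amplitude mean of X0(t) is smaller than the amplitude mean of stored original speech Y(t), enhancing X0(t) and outputting it.
2. The method according to claim 1, further including: storing the original speech Y(t) and extracting the amplitude mean of Y(t).
3. The method according to claim 1, wherein denoising X(t) includes:
performing a fast Fourier transform on X(t) and on the stored original speech Y(t) to obtain the frequency-domain signal X(ω) of the speech and the frequency-domain signal Y(ω) of the original speech;
determining the frequency-domain signal of the noise in the speech from X(ω) and Y(ω);
convolving X(ω) with the frequency-domain signal of the noise to determine the frequency-domain signal of the denoised speech; and
performing an inverse fast Fourier transform on the frequency-domain signal of the denoised speech to obtain the denoised speech X0(t).
4. The method according to claim 1, wherein enhancing the denoised speech means enhancing X0(t) according to the amplitude mean of Y(t), including:
determining the current amplitude mean of X0(t);
determining a speech enhancement coefficient n from the original amplitude mean of Y(t) and the current amplitude mean; and
enhancing X0(t) according to n.
5. The method according to any one of claims 1 to 4, further including: upon determining that the amplitude mean of X0(t) is greater than or equal to the amplitude mean of Y(t), outputting X0(t) directly.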
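6. A voice call apparatus, including a receiving unit, a denoising unit, a processing unit, and an output unit, wherein:
the receiving unit is configured to receive call speech X(t);
the denoising unit is configured to denoise X(t) to obtain denoised speech X0(t); the processing unit is configured to enhance X0(t) upon determining that the amplitude mean of X0(t) is smaller than the amplitude mean of stored original speech Y(t); and the output unit is configured to output the enhanced speech.
7. The apparatus according to claim 6, further including a storage unit and an extracting unit, wherein:
the storage unit is configured to store the original speech Y(t); and
the extracting unit is configured to extract the amplitude mean of Y(t).
8. The apparatus according to claim 6, wherein the denoising unit includes a first transform subunit, a first determining subunit, a second determining subunit, and a second transform subunit, wherein:
the first transform subunit is configured to perform a fast Fourier transform on X(t) and on the stored original speech Y(t) to obtain the frequency-domain signal X(ω) of the speech and the frequency-domain signal Y(ω) of the original speech;
the first determining subunit is configured to determine the frequency-domain signal of the noise in the speech from X(ω) and Y(ω); the second determining subunit is configured to convolve X(ω) with the frequency-domain signal of the noise to determine the frequency-domain signal of the denoised speech; and
the second transform subunit is configured to perform an inverse fast Fourier transform on the frequency-domain signal of the denoised speech to obtain the denoised speech X0(t).
9. The apparatus according to claim 6, wherein the processing unit is configured to enhance X0(t) according to the amplitude mean of Y(t) and includes a third determining subunit, a fourth determining subunit, and an enhancement subunit, wherein:
the third determining subunit is configured to determine the current amplitude mean of X0(t);
the fourth determining subunit is configured to determine a speech enhancement coefficient n from the original amplitude mean of Y(t) and the current amplitude mean; and
the enhancement subunit is configured to enhance X0(t) according to n.
10. The apparatus according to any one of claims 6 to 9, wherein the processing unit is further configured to trigger the output unit upon determining that the amplitude mean of X0(t) is greater than or equal to the amplitude mean of Y(t);
correspondingly, the output unit is further configured to output X0(t) directly.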
PCT/CN2013/087986 2013-09-06 2013-11-27 Voice call method and apparatus WO2014161334A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310404931.3A CN104427068B (zh) 2013-09-06 2013-09-06 Voice call method and apparatus
CN201310404931.3 2013-09-06

Publications (1)

Publication Number Publication Date
WO2014161334A1 true WO2014161334A1 (zh) 2014-10-09

Family

ID=51657522

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/087986 WO2014161334A1 (zh) 2013-09-06 2013-11-27 Voice call method and apparatus

Country Status (2)

Country Link
CN (1) CN104427068B (zh)
WO (1) WO2014161334A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111278164A (zh) * 2018-12-04 2020-06-12 中国移动通信集团安徽有限公司 Voice service migration method, apparatus, device and medium

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
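CN105654955B (zh) * 2016-03-18 2019-11-12 华为技术有限公司 Speech recognition method and apparatus
CN105827618A (zh) * 2016-04-25 2016-08-03 四川联友电讯技术有限公司 Method for improving the call quality of a fragmented asynchronous conference system
CN106409309A (zh) * 2016-10-21 2017-02-15 深圳市音络科技有限公司 Sound quality enhancement method and microphone
CN106527478A (zh) * 2016-11-24 2017-03-22 深圳市道通智能航空技术有限公司 Method for acquiring on-site sound with an unmanned aerial vehicle, method for producing video with sound, and related apparatus
CN106887237A (zh) * 2017-02-09 2017-06-23 惠州Tcl移动通信有限公司 Mobile terminal, and noise reduction method and system for calls in headset mode
CN108766453A (zh) * 2018-05-24 2018-11-06 江西午诺科技有限公司 Speech noise reduction method and apparatus, readable storage medium, and mobile terminal
CN109120790B (zh) * 2018-08-30 2021-01-15 Oppo广东移动通信有限公司 Call control method and apparatus, storage medium, and wearable device
CN115482830B (zh) * 2021-05-31 2023-08-04 华为技术有限公司 Speech enhancement method and related device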

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
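CN101370322A (zh) * 2008-09-12 2009-02-18 深圳华为通信技术有限公司 Microphone gain adjustment method and communication device
CN101552823A (zh) * 2008-04-03 2009-10-07 华硕电脑股份有限公司 Volume management system and method
CN102006349A (zh) * 2010-11-25 2011-04-06 惠州Tcl移动通信有限公司 Method for enhancing mobile phone call quality in conference mode and apparatus implementing the same
JP4835611B2 (ja) * 2008-03-03 2011-12-14 岩崎通信機株式会社 Echo reduction method and apparatus
CN103237111A (zh) * 2013-04-28 2013-08-07 广东欧珀移动通信有限公司 Method for increasing call volume and mobile terminal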

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
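KR100617109B1 (ko) * 2004-12-29 2006-08-31 엘지전자 주식회사 Noise removal apparatus for a communication terminal
CN101056322A (zh) * 2006-04-13 2007-10-17 中兴通讯股份有限公司 Apparatus and method for superimposing background sound on a mobile communication terminal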


Also Published As

Publication number Publication date
CN104427068A (zh) 2015-03-18
CN104427068B (zh) 2019-07-12


Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 13881040

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry into the European phase

Ref document number: 13881040

Country of ref document: EP

Kind code of ref document: A1