WO2014000658A1 - Method and device for eliminating noise, and mobile terminal - Google Patents

Method and device for eliminating noise, and mobile terminal

Info

Publication number
WO2014000658A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
audio fingerprint
party
calling party
voice
Prior art date
Application number
PCT/CN2013/078130
Other languages
English (en)
French (fr)
Inventor
彭伟刚
吴博
胡先
付红峰
李少博
蒋奎
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to US14/410,602 (US20150325252A1)
Priority to KR20157001736A (KR20150032562A)
Publication of WO2014000658A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/018 Audio watermarking, i.e. embedding inaudible data in the audio signal
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L21/028 Voice signal separating using properties of sound source
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise

Definitions

  • The present invention relates to computer technology, and more particularly to a method, an apparatus, and a mobile terminal for eliminating noise.
  • Background of the invention
  • Call quality is affected by background noise in the surrounding environment. For example, when a user talks to a friend on a mobile phone in a relatively noisy environment, the voice transmitted through the phone is disturbed by background noise, so the voice the friend receives contains background noise, which degrades call quality.
  • In the prior art, an additional hardware device, namely a noise-canceling hardware device, is added to the mobile terminal to reduce the impact of noise on call quality.
  • The noise-canceling hardware device includes a background noise-canceling microphone, a noise-canceling chip, and a sound-emitting device.
  • The background noise-canceling microphone is separate from the ordinary call microphone on the mobile terminal and is used to collect noise sound waves.
  • The noise-canceling chip generates sound waves opposite to the noise, based on the noise sound waves collected by the background noise-canceling microphone.
  • The sound-emitting device emits the sound waves opposite to the noise, so that the noise during the call is eliminated by the cancellation principle, thereby improving call quality.
  • Embodiments of the present invention provide a method, an apparatus, and a mobile terminal for eliminating noise, which can eliminate background noise during a call while avoiding the addition of a noise-canceling hardware device to the mobile terminal.
  • A method for eliminating noise includes:
  • extracting an audio fingerprint of the calling party in advance from the calling party's voice; and, when the calling party and the opposite party are in a call, extracting the sound matching the audio fingerprint from the current call voice according to the calling party's audio fingerprint, and sending the sound matching the audio fingerprint to the opposite party through the communication network.
  • A device for eliminating noise includes at least a memory and a processor in communication with the memory, wherein the memory includes an extraction instruction and a transmission instruction executable by the processor:
  • the extraction instruction is configured to extract and store the audio fingerprint of the calling party in advance from the calling party's voice;
  • the transmission instruction is configured to, when the calling party and the opposite party are in a call, extract the sound matching the audio fingerprint from the current call voice according to the calling party's audio fingerprint, and send the sound matching the audio fingerprint to the opposite party through the communication network.
  • A mobile terminal includes the above device for eliminating noise.
  • The audio fingerprint of the calling party is first extracted from the calling party's voice. When the calling party and the opposite party are in a call, the sound matching the calling party's audio fingerprint is extracted from the current call voice according to that fingerprint, and the extracted sound is sent to the opposite party through the communication network. This ensures that the opposite party hears a clearer voice that it actually needs, which improves call quality. Further, because the sound transmitted through the communication network is only the sound actually produced by the calling party and contains no other noise, the load on the communication network is reduced.
  • FIG. 1 is a flowchart of a method for eliminating noise according to an embodiment of the present invention.
  • FIG. 2 is another flow chart of a method for eliminating noise according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of an apparatus for eliminating noise according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of another apparatus for eliminating noise according to an embodiment of the present invention.
  • Mode for carrying out the invention
  • The method for eliminating noise provided by the embodiments of the present invention can be applied to a mobile terminal such as a mobile phone, or to a fixed hardware device such as a PC; the embodiments of the present invention do not elaborate on this.
  • FIG. 1 is a flowchart of a method for eliminating noise according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps.
  • In step 101, the audio fingerprint (audio fingerprinting) of the calling party is extracted in advance from the calling party's voice.
  • The audio fingerprint indicates the voice attributes of the calling party and can be used to identify the calling party's voice.
  • In step 102, when the calling party and the opposite party are in a call, the sound matching the audio fingerprint is extracted from the current call voice according to the calling party's audio fingerprint, and the sound matching the audio fingerprint is sent to the opposite party through the communication network.
  • The current call voice may include the calling party's actual voice and noise that affects the calling party's actual voice.
  • If the calling party is in a noisy environment, the noise is mixed with the calling party's actual voice to form a mixed calling-party voice. If the mobile terminal transmits this mixed voice directly through the communication network, the opposite party receives both the noise and the calling party's actual voice, which degrades call quality.
  • In the embodiments of the present invention, before the calling party's voice is transmitted through the communication network, the calling party's actual voice is extracted from the mixed voice, and only the extracted voice is transmitted. The opposite party therefore receives the calling party's actual voice and hears a clearer voice that it actually needs, which improves call quality.
  • Steps 101 to 102 can be implemented by software installed on the mobile terminal. The flow shown in FIG. 1 is described in detail below.
  • FIG. 2 is a detailed flowchart of a method for eliminating noise according to an embodiment of the present invention.
  • The method is applied to a mobile terminal. As shown in FIG. 2, the method includes the following steps.
  • In step 201, the mobile terminal extracts each user's audio fingerprint from that user's voice.
  • The audio fingerprint indicates the voice attributes of the user and can be used to identify the user's voice.
  • Extracting the user's audio fingerprint from the user's voice includes: dividing the user's sound signal into multiple mutually overlapping frames; performing a feature operation on each frame; mapping the result to a piece of data by means of a classifier; and taking that data as the user's audio fingerprint.
  • The user's sound signal can be divided into multiple mutually overlapping frames in either of the following ways: starting from different start times, the signal is divided according to a set time interval; or, starting from different start frequencies, the signal is divided according to a set frequency interval.
  • Taking division by a set time interval as an example: if the set time interval is 1 ms, the 1 ms of user sound signal starting at 0 ms is one frame, the 1 ms starting at 0.5 ms is one frame, the 1 ms starting at 1 ms is one frame, the 1 ms starting at 1.5 ms is one frame, and so on. Dividing the signal this way makes the frames partially overlap one another.
  • The feature operation performed on each frame may be any one or any combination of the following: fast Fourier transform (FFT), wavelet transform (WT), Mel-frequency cepstral coefficients (MFCC), spectral smoothness, sharpness, and linear predictive coding (LPC).
  • The classifier in the embodiments of the present invention may be an existing hidden Markov model or quantization technique. Mapping the result to a piece of data by means of the classifier is similar to mapping with a hidden Markov model or quantization technique in the prior art and is not described further here.
  • In step 202, the mobile terminal stores each user's audio fingerprint locally.
  • In step 203, when a user, for example user A, makes a call, the mobile terminal finds user A's audio fingerprint among the locally stored audio fingerprints.
  • If the mobile terminal is currently in a noisy environment, user A's current call voice includes user A's actual voice and noise that affects it, such as the background noise around user A.
  • In step 204, the mobile terminal uses user A's audio fingerprint to extract the sound matching that fingerprint from user A's current call voice.
  • Specifically, a target sound collection and prediction approach is first used to predict, from user A's current call voice, the sound that matches user A's audio fingerprint.
  • The predicted sound is then extracted from the current call voice by secondary positioning of the target sound in the time-frequency domain, and the extracted sound is taken as the sound matching user A's audio fingerprint.
  • The target sound collection and prediction approach and the secondary positioning of the target sound in the time-frequency domain used in this embodiment may be similar to the prior art and are not described further.
  • In step 205, the mobile terminal sends the sound extracted in step 204 to the opposite party through the communication network.
  • Through steps 201 to 205, the opposite party hears the sound actually produced by user A, which ensures the quality of the call between user A and the opposite party. Moreover, because the sound transmitted through the communication network is only the sound actually produced by user A and contains no other noise, the load on the communication network is reduced.
  • FIG. 3 is a schematic structural diagram of an apparatus for eliminating noise according to an embodiment of the present invention.
  • The apparatus includes an extraction module and a transmission module.
  • The extraction module is configured to extract and store the audio fingerprint of the calling party in advance from the calling party's voice.
  • The transmission module is configured to, when the calling party and the opposite party are in a call, extract the sound matching the audio fingerprint from the current call voice according to the calling party's audio fingerprint, and send the sound matching the audio fingerprint to the opposite party through the communication network; the current call voice includes the sound actually produced by the calling party and noise that affects that sound.
  • The extraction module includes a dividing unit and a mapping unit.
  • The dividing unit is configured to divide the calling party's sound signal into multiple mutually overlapping frames.
  • The mapping unit is configured to perform a feature operation on each frame, map the result to a piece of data by means of a classifier, and take that data as the calling party's audio fingerprint.
  • The dividing unit divides the calling party's sound signal into multiple mutually overlapping frames either by starting from different start times and dividing according to a set time interval, or by starting from different start frequencies and dividing according to a set frequency interval.
  • The transmission module extracts the sound matching the audio fingerprint from the current call voice through a prediction unit and an extraction unit.
  • The prediction unit is configured to predict, from the current call voice, the sound matching the calling party's audio fingerprint by using the target sound collection and prediction approach.
  • The extraction unit is configured to extract the predicted sound from the current call voice by secondary positioning of the target sound in the time-frequency domain, and to take the extracted sound as the sound matching the calling party's audio fingerprint.
  • FIG. 4 is a schematic structural diagram of another apparatus for eliminating noise according to an embodiment of the present invention.
  • The apparatus includes at least a memory and a processor in communication with the memory, wherein the memory includes an extraction instruction and a transmission instruction executable by the processor.
  • The extraction instruction is used to extract and store the audio fingerprint of the calling party in advance from the calling party's voice.
  • The transmission instruction is used to, when the calling party and the opposite party are in a call, extract the sound matching the audio fingerprint from the current call voice according to the calling party's audio fingerprint, and send the sound matching the audio fingerprint to the opposite party through the communication network.
  • The extraction instruction includes a dividing sub-instruction and a mapping sub-instruction.
  • The dividing sub-instruction is used to divide the calling party's sound signal into multiple mutually overlapping frames.
  • The mapping sub-instruction is used to perform a feature operation on each frame, map the result to a piece of data by means of a classifier, and take that data as the calling party's audio fingerprint.
  • The dividing sub-instruction divides the calling party's sound signal into multiple mutually overlapping frames either by starting from different start times and dividing according to a set time interval, or by starting from different start frequencies and dividing according to a set frequency interval.
  • The transmission instruction extracts the sound matching the audio fingerprint from the current call voice through a prediction sub-instruction and an extraction sub-instruction.
  • The prediction sub-instruction is used to predict, from the current call voice, the sound matching the calling party's audio fingerprint by using the target sound collection and prediction approach.
  • The extraction sub-instruction is used to extract the predicted sound from the current call voice by secondary positioning of the target sound in the time-frequency domain, and to take the extracted sound as the sound matching the calling party's audio fingerprint.
  • The embodiments of the present invention further provide a mobile terminal, which may include the apparatus shown in FIG. 3 or FIG. 4.
  • The audio fingerprint of the calling party is first extracted from the calling party's voice. When the calling party and the opposite party are in a call, the sound matching the calling party's audio fingerprint is extracted from the current call voice according to that fingerprint, and the extracted sound is sent to the opposite party through the communication network. The current call voice includes the sound actually produced by the calling party and noise that affects that sound. Applying the embodiments of the present invention ensures that the opposite party hears a clearer voice that it actually needs, which improves call quality.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

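A method and device for eliminating noise, and a mobile terminal. The method includes: extracting in advance, from the voice of a calling party, an audio fingerprint of the calling party's voice (101); and, when the calling party and the opposite party are in a call, extracting from the current call sound, according to the calling party's audio fingerprint, the sound matching the audio fingerprint, and sending the sound matching the audio fingerprint to the opposite party through a communication network (102).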

Description

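Method and device for eliminating noise, and mobile terminal
Technical field
The present invention relates to computer technology, and in particular to a method and a device for eliminating noise and to a mobile terminal.
Background of the invention
With the development of mobile communication technology, mobile terminals are used more and more widely. During a call on a mobile terminal, call quality is affected by background noise in the surrounding environment. For example, when a user talks to a friend on a mobile phone in a fairly noisy environment, the voice the user transmits through the phone is disturbed by background noise, so the sound the friend receives contains background noise, which degrades call quality.
To improve call quality, the prior art adds an extra hardware device, namely a noise-canceling hardware device, to the mobile terminal to reduce the effect of noise on call quality. The noise-canceling hardware device includes a background noise-canceling microphone, a noise-canceling chip, and a sound-emitting device. The background noise-canceling microphone is separate from the terminal's ordinary call microphone and is used to collect noise sound waves. The noise-canceling chip generates, from the noise sound waves collected by that microphone, sound waves opposite to the noise. The sound-emitting device emits these opposite sound waves so that the noise during the call is eliminated by the cancellation principle, thereby improving call quality.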
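The cancellation principle used by the prior-art hardware can be illustrated with a minimal sketch. It assumes an idealized digital model in which the anti-noise wave is a sample-aligned negation of the noise; the names `anti_noise` and `mix` and the `gain` parameter are hypothetical and do not come from the patent.

```python
def anti_noise(noise, gain=1.0):
    """Return a wave opposite to the noise (hypothetical helper).

    gain=1.0 models a perfect inverse; other values model an imperfect one.
    """
    return [-gain * sample for sample in noise]


def mix(a, b):
    """Superpose two equally long sample sequences."""
    return [x + y for x, y in zip(a, b)]


noise = [0.25, -0.5, 0.125]

# A perfect inverse cancels the noise completely.
ideal_residual = mix(noise, anti_noise(noise))

# An inverse with the wrong amplitude leaves part of the noise behind.
imperfect_residual = mix(noise, anti_noise(noise, gain=0.5))
```

In this toy model any mismatch between the noise and its inverse leaves a residual, which is one way to picture why hardware cancellation may not remove noise completely.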
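However, this existing approach to improving call quality requires adding noise-canceling hardware to the mobile terminal, which increases hardware cost, especially on mobile phones. In addition, such hardware cannot eliminate noise completely, so the remaining noise is carried in the user's audio data and transmitted to the opposite end; this makes the transmitted audio data too large and affects the speed and quality of audio transmission. Furthermore, to cancel noise, the background noise-canceling microphone cannot be placed arbitrarily in the mobile terminal: it must be kept far enough from the call microphone, which makes the terminal harder to design.
Summary of the invention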
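Embodiments of the present invention provide a method and a device for eliminating noise and a mobile terminal, which can eliminate background noise during a call while avoiding the addition of noise-canceling hardware to the mobile terminal.
The technical solutions provided by the embodiments of the present invention include:
A method for eliminating noise, including:
extracting an audio fingerprint of a calling party in advance from the calling party's voice; and
when the calling party and the opposite party are in a call, extracting, according to the calling party's audio fingerprint, the sound matching the audio fingerprint from the current call sound, and sending the sound matching the audio fingerprint to the opposite party through a communication network.
A device for eliminating noise, including at least a memory and a processor in communication with the memory, wherein the memory contains an extraction instruction and a transmission instruction executable by the processor:
the extraction instruction is used to extract and store the audio fingerprint of the calling party in advance from the calling party's voice;
the transmission instruction is used to, when the calling party and the opposite party are in a call, extract, according to the calling party's audio fingerprint, the sound matching the audio fingerprint from the current call sound, and send the sound matching the audio fingerprint to the opposite party through a communication network.
A mobile terminal, including the above device for eliminating noise.
As can be seen from the above technical solutions, in the embodiments of the present invention the audio fingerprint of the calling party is first extracted from the calling party's voice. When the calling party and the opposite party are in a call, the sound matching the calling party's audio fingerprint is extracted from the current call sound according to that fingerprint, and the extracted sound is sent to the opposite party through the communication network. This ensures that the opposite party hears a clearer sound that it actually needs, which improves call quality. Further, because the sound transmitted through the communication network is only the sound actually produced by the calling party and contains no other noise, the load on the communication network is reduced.
Brief description of the drawings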
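FIG. 1 is a flowchart of the method for eliminating noise provided by an embodiment of the present invention.
FIG. 2 is another flowchart of the method for eliminating noise provided by an embodiment of the present invention.
FIG. 3 is a schematic structural diagram of the device for eliminating noise provided by an embodiment of the present invention.
FIG. 4 is a schematic structural diagram of another device for eliminating noise provided by an embodiment of the present invention.
Mode for carrying out the invention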
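To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in detail below with reference to the drawings and specific embodiments.
The method for eliminating noise provided by the embodiments of the present invention can be applied to a mobile terminal such as a mobile phone, or to a fixed hardware device such as a PC; the embodiments of the present invention do not elaborate on this.
Referring to FIG. 1, FIG. 1 is a flowchart of the method for eliminating noise provided by an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps.
In step 101, the audio fingerprint (audio fingerprinting) of the calling party is extracted in advance from the calling party's voice.
In the embodiments of the present invention, the audio fingerprint indicates the voice attributes of the calling party and can be used to identify the calling party's voice.
In step 102, when the calling party and the opposite party are in a call, the sound matching the audio fingerprint is extracted from the current call sound according to the calling party's audio fingerprint, and the sound matching the audio fingerprint is sent to the opposite party through the communication network.
In the embodiments of the present invention, the current call sound may contain the calling party's actual voice and noise that affects the sound the calling party actually produces.
Typically, if the calling party is in a noisy environment, the noise is mixed with the calling party's actual voice to form a mixed calling-party sound. If the mobile terminal transmits this mixed sound directly through the communication network, the opposite party receives both the noise and the sound actually produced by the calling party, which degrades call quality. In the embodiments of the present invention, before the calling party's sound is transmitted through the communication network, the calling party's actual voice is first extracted from the mixed sound, and only the extracted sound is transmitted. The opposite party therefore receives the calling party's actual voice and hears a clearer sound that it actually needs, which improves call quality.
It should be noted that steps 101 to 102 can be implemented by software installed on the mobile terminal. The flow shown in FIG. 1 is described in detail below.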
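Referring to FIG. 2, FIG. 2 is a detailed flowchart of the method for eliminating noise provided by an embodiment of the present invention. The method is applied to a mobile terminal. As shown in FIG. 2, the method includes the following steps.
In step 201, the mobile terminal extracts each user's audio fingerprint from that user's voice.
In the embodiments of the present invention, the audio fingerprint indicates the voice attributes of the user and can be used to identify the user's voice.
In this step, extracting the user's audio fingerprint from the user's voice includes: dividing the user's sound signal into multiple mutually overlapping frames; performing a feature operation on each frame; mapping the result to a piece of data by means of a classifier; and taking that data as the user's audio fingerprint.
In the embodiments of the present invention, the user's sound signal can be divided into multiple mutually overlapping frames in either of the following ways: starting from different start times, the signal is divided according to a set time interval; or, starting from different start frequencies, the signal is divided according to a set frequency interval.
Taking division by a set time interval as an example: if the set time interval is 1 ms, the 1 ms of user sound signal starting at 0 ms is one frame, the 1 ms starting at 0.5 ms is one frame, the 1 ms starting at 1 ms is one frame, the 1 ms starting at 1.5 ms is one frame, and so on. Dividing the signal this way clearly makes the frames partially overlap one another.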
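The time-based division in the example above (1 ms frames starting every 0.5 ms) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the 16 kHz sampling rate, the function name `split_into_overlapping_frames`, and the placeholder signal are all assumptions.

```python
def split_into_overlapping_frames(samples, frame_len, hop):
    """Split samples into frames of frame_len samples that start every hop samples.

    With hop < frame_len, consecutive frames share frame_len - hop samples.
    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


SAMPLE_RATE = 16000                  # assumed sampling rate, not given in the source
frame_len = SAMPLE_RATE // 1000      # 1 ms frame -> 16 samples
hop = frame_len // 2                 # a new frame every 0.5 ms -> 8 samples

signal = list(range(48))             # 3 ms of placeholder samples
frames = split_into_overlapping_frames(signal, frame_len, hop)
```

Here the second half of each frame is identical to the first half of the next one, which is the partial overlap the text describes.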
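In addition, the feature operation performed on each frame may be implemented as any one or any combination of the following: fast Fourier transform (FFT), wavelet transform (WT), Mel-frequency cepstral coefficients (MFCC), spectral smoothness, sharpness, and linear predictive coding (LPC).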
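As an illustration of a per-frame feature operation, the sketch below computes a magnitude spectrum with a plain discrete Fourier transform (an FFT would give the same values faster) and appends one summary value. It assumes that the "spectral smoothness" in the list corresponds to the common spectral flatness measure (geometric mean over arithmetic mean of the power spectrum); that mapping, and all function names, are assumptions rather than details from the patent.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2 of a real-valued frame (naive O(N^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectral_flatness(mags, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum, in (0, 1]."""
    power = [m * m + eps for m in mags]   # eps avoids log(0) on empty bins
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


def frame_features(frame):
    """Hypothetical feature vector: magnitude spectrum plus spectral flatness."""
    mags = dft_magnitudes(frame)
    return mags + [spectral_flatness(mags)]


# A pure tone concentrates its energy in one bin, so its flatness is near 0;
# a single impulse spreads energy evenly, so its flatness is near 1.
tone = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
impulse = [1.0] + [0.0] * 15
tone_features = frame_features(tone)
impulse_features = frame_features(impulse)
```

Any of the other listed operations (wavelets, MFCC, LPC) could take the place of `frame_features`; the rest of the flow only needs one feature vector per frame.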
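Furthermore, the classifier in the embodiments of the present invention may be an existing hidden Markov model or quantization technique. Mapping the result to a piece of data by means of the classifier may be done in a way similar to mapping with a hidden Markov model or quantization technique in the prior art, and is not described further here.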
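The source does not say how the classifier maps features to data, so the following is only a sketch of the quantization option. It uses nearest-codeword vector quantization, a standard technique swapped in here for illustration; the codebook values and function names are hypothetical.

```python
def nearest_codeword(vector, codebook):
    """Index of the codebook entry closest to vector (squared Euclidean distance)."""
    best_index = 0
    best_dist = float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def quantize_features(feature_vectors, codebook):
    """Map each frame's feature vector to a codeword index.

    The resulting tuple of indices stands in for the "piece of data"
    that the text uses as the audio fingerprint.
    """
    return tuple(nearest_codeword(v, codebook) for v in feature_vectors)


# Hypothetical 2-D codebook and per-frame feature vectors.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
features = [(0.1, 0.2), (0.9, 1.2), (3.5, 4.1)]
fingerprint = quantize_features(features, codebook)
```

In practice a codebook would be trained on speech data rather than written by hand; the hand-written one keeps the example self-contained.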
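In step 202, the mobile terminal stores each user's audio fingerprint locally.
In step 203, when a user, for example user A, makes a call, the mobile terminal finds user A's audio fingerprint among the locally stored audio fingerprints.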
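Steps 202 and 203 amount to a keyed store and a lookup. A minimal in-memory sketch follows; the class name `FingerprintStore`, the user identifiers, and the tuple-shaped fingerprints are assumptions, since the source does not specify a storage format.

```python
class FingerprintStore:
    """Hypothetical local store that keeps one audio fingerprint per user."""

    def __init__(self):
        self._by_user = {}

    def save(self, user_id, fingerprint):
        """Step 202: store (or overwrite) the fingerprint for a user."""
        self._by_user[user_id] = fingerprint

    def find(self, user_id):
        """Step 203: return the stored fingerprint, or None if there is none."""
        return self._by_user.get(user_id)


store = FingerprintStore()
store.save("user_a", (0, 1, 2))
store.save("user_b", (2, 2, 0))
fingerprint_a = store.find("user_a")
missing = store.find("user_c")
```

A real terminal would persist the fingerprints across restarts; the dictionary is used here only to show the lookup by user.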
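If the mobile terminal is currently in a noisy environment, user A's current call sound includes user A's actual voice and noise that affects it; the noise may be, for example, the background noise around user A.
In step 204, the mobile terminal uses user A's audio fingerprint to extract the sound matching that fingerprint from user A's current call sound.
Specifically, in this step, a target sound collection and prediction approach is first used to predict, from user A's current call sound, the sound that matches user A's audio fingerprint. Then, secondary positioning of the target sound in the time-frequency domain is used to extract the predicted sound from the current call sound, and the extracted sound is taken as the sound matching user A's audio fingerprint.
The target sound collection and prediction approach and the secondary positioning of the target sound in the time-frequency domain used in this embodiment may be similar to the prior art and are not described further.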
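The source leaves the prediction and time-frequency positioning steps to the prior art, so they are not reproduced here. As a stand-in, the sketch below shows a much simpler idea with the same intent: keep only the frames whose features resemble a reference vector derived from the fingerprint, and silence the rest. The cosine-similarity gate, the threshold, and all names are assumptions and should not be read as the patent's method.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keep_matching_frames(frames, features, reference, threshold=0.9):
    """Pass frames whose features are close to the reference; zero out the others."""
    output = []
    for frame, feats in zip(frames, features):
        if cosine_similarity(feats, reference) >= threshold:
            output.append(list(frame))
        else:
            output.append([0.0] * len(frame))
    return output


# Hypothetical data: the first frame resembles the speaker, the second does not.
reference = [1.0, 0.0]
frames = [[1.0, 2.0], [3.0, 4.0]]
features = [[0.9, 0.1], [0.1, 0.9]]
matched = keep_matching_frames(frames, features, reference)
```

A frame-level gate like this cannot separate speech from noise that occurs in the same frame, which is the part a time-frequency method would have to handle.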
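In step 205, the mobile terminal sends the sound extracted in step 204 to the opposite party through the communication network.
Thus, through steps 201 to 205, the opposite party hears the sound actually produced by user A, which ensures the quality of the call between user A and the opposite party. Moreover, because the sound transmitted through the communication network is only the sound actually produced by user A and contains no other noise, the load on the communication network is reduced.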
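The call-time part of the flow (steps 203 to 205) can be summarized as a small orchestration function. This is a sketch under stated assumptions: `handle_outgoing_audio`, the dictionary of fingerprints, and the `extract_matching` and `send` callables are hypothetical placeholders for the terminal's real components.

```python
def handle_outgoing_audio(user_id, call_audio, fingerprints, extract_matching, send):
    """Look up the fingerprint, extract the matching sound, and send only that."""
    fingerprint = fingerprints.get(user_id)               # step 203: find the fingerprint
    if fingerprint is None:
        raise KeyError(f"no stored audio fingerprint for {user_id!r}")
    matched = extract_matching(call_audio, fingerprint)   # step 204: extract matching sound
    send(matched)                                         # step 205: transmit to the peer
    return matched


# Stubs that make the sketch runnable without audio hardware or a network.
sent_packets = []
stored_fingerprints = {"user_a": "fingerprint-a"}


def fake_extract(audio, fingerprint):
    """Placeholder extraction: treat zero samples as noise and drop them."""
    return [sample for sample in audio if sample != 0]


result = handle_outgoing_audio(
    "user_a", [0, 3, 0, 5], stored_fingerprints, fake_extract, sent_packets.append
)
```

Passing the extraction and sending steps in as callables keeps the flow independent of any particular feature, classifier, or network choice.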
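The method provided by the embodiments of the present invention has been described above; the device provided by the embodiments is described below.
Referring to FIG. 3, FIG. 3 is a schematic structural diagram of the device for eliminating noise provided by an embodiment of the present invention. As shown in FIG. 3, the device includes an extraction module and a transmission module.
The extraction module is used to extract and store the audio fingerprint of the calling party in advance from the calling party's voice.
The transmission module is used to, when the calling party and the opposite party are in a call, extract the sound matching the audio fingerprint from the current call sound according to the calling party's audio fingerprint, and send the sound matching the audio fingerprint to the opposite party through the communication network; the current call sound contains the sound actually produced by the calling party and noise that affects that sound.
Preferably, in the embodiments of the present invention, as shown in FIG. 3, the extraction module includes a dividing unit and a mapping unit.
The dividing unit is used to divide the calling party's sound signal into multiple mutually overlapping frames. The mapping unit is used to perform a feature operation on each frame, map the result to a piece of data by means of a classifier, and take that data as the calling party's audio fingerprint.
In the embodiments of the present invention, the dividing unit divides the calling party's sound signal into multiple mutually overlapping frames either by starting from different start times and dividing according to a set time interval, or by starting from different start frequencies and dividing according to a set frequency interval.
Preferably, in the embodiments of the present invention, the transmission module extracts the sound matching the audio fingerprint from the current call sound through a prediction unit and an extraction unit.
The prediction unit is used to predict, from the current call sound, the sound matching the calling party's audio fingerprint by using the target sound collection and prediction approach.
The extraction unit is used to extract the predicted sound from the current call sound by secondary positioning of the target sound in the time-frequency domain, and to take the extracted sound as the sound matching the calling party's audio fingerprint.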
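Referring to FIG. 4, FIG. 4 is a schematic structural diagram of another device for eliminating noise provided by an embodiment of the present invention. As shown in FIG. 4, the device includes at least a memory and a processor in communication with the memory, wherein the memory contains an extraction instruction and a transmission instruction executable by the processor.
The extraction instruction is used to extract and store the audio fingerprint of the calling party in advance from the calling party's voice.
The transmission instruction is used to, when the calling party and the opposite party are in a call, extract the sound matching the audio fingerprint from the current call sound according to the calling party's audio fingerprint, and send the sound matching the audio fingerprint to the opposite party through the communication network; the current call sound contains the sound actually produced by the calling party and noise that affects that sound.
Preferably, in the embodiments of the present invention, the extraction instruction includes a dividing sub-instruction and a mapping sub-instruction. The dividing sub-instruction is used to divide the calling party's sound signal into multiple mutually overlapping frames. The mapping sub-instruction is used to perform a feature operation on each frame, map the result to a piece of data by means of a classifier, and take that data as the calling party's audio fingerprint.
In the embodiments of the present invention, the dividing sub-instruction divides the calling party's sound signal into multiple mutually overlapping frames either by starting from different start times and dividing according to a set time interval, or by starting from different start frequencies and dividing according to a set frequency interval.
Preferably, in the embodiments of the present invention, the transmission instruction extracts the sound matching the audio fingerprint from the current call sound through a prediction sub-instruction and an extraction sub-instruction.
The prediction sub-instruction is used to predict, from the current call sound, the sound matching the calling party's audio fingerprint by using the target sound collection and prediction approach.
The extraction sub-instruction is used to extract the predicted sound from the current call sound by secondary positioning of the target sound in the time-frequency domain, and to take the extracted sound as the sound matching the calling party's audio fingerprint.
Preferably, the embodiments of the present invention further provide a mobile terminal, which may include the device shown in FIG. 3 or FIG. 4.
As can be seen from the above technical solutions, in the embodiments of the present invention the audio fingerprint of the calling party is first extracted from the calling party's voice. When the calling party and the opposite party are in a call, the sound matching the calling party's audio fingerprint is extracted from the current call sound according to that fingerprint, and the extracted sound is sent to the opposite party through the communication network. The current call sound contains the sound actually produced by the calling party and noise that affects that sound. Applying the embodiments of the present invention ensures that the opposite party hears a clearer sound that it actually needs, which improves call quality.
Further, in the embodiments of the present invention, because the sound transmitted through the communication network is only the sound actually produced by the calling party and contains no other noise, the load on the communication network is reduced. The above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.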

Claims

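1. A method for eliminating noise, characterized in that the method includes:
extracting an audio fingerprint of a calling party in advance from the calling party's voice;
when the calling party and the opposite party are in a call, extracting, according to the calling party's audio fingerprint, the sound matching the audio fingerprint from the current call sound, and sending the sound matching the audio fingerprint to the opposite party through a communication network.
2. The method according to claim 1, characterized by further including: storing at least one pre-extracted audio fingerprint;
wherein extracting, according to the audio fingerprint of the calling party's voice, the sound matching the audio fingerprint from the current call sound includes:
obtaining the calling party's audio fingerprint from the at least one stored audio fingerprint, and extracting the sound matching the audio fingerprint from the current call sound.
3. The method according to claim 1 or 2, characterized in that extracting the audio fingerprint of the calling party from the calling party's voice includes:
dividing the calling party's sound signal into multiple mutually overlapping frames;
performing a feature operation on each frame, mapping the result to a piece of data by means of a classifier, and taking that data as the calling party's audio fingerprint.
4. The method according to claim 3, characterized in that the feature operation includes any one or any combination of the following:
fast Fourier transform (FFT), wavelet transform (WT), Mel-frequency cepstral coefficients (MFCC), spectral smoothness, sharpness, and linear predictive coding (LPC).
5. The method according to claim 3, characterized in that dividing the calling party's sound signal into multiple mutually overlapping frames includes:
starting from different start times, dividing the calling party's sound signal into multiple mutually overlapping frames according to a set time interval; or
starting from different start frequencies, dividing the calling party's sound signal into multiple mutually overlapping frames according to a set frequency interval.
6. The method according to claim 3, characterized in that extracting, according to the calling party's audio fingerprint, the sound matching the audio fingerprint from the current call sound includes:
predicting, from the current call sound, the sound matching the calling party's audio fingerprint by using a target sound collection and prediction approach;
extracting the predicted sound from the current call sound by secondary positioning of the target sound in the time-frequency domain, and taking the extracted sound as the sound matching the calling party's audio fingerprint.
7. A device for eliminating noise, characterized in that the device includes at least a memory and a processor in communication with the memory, wherein the memory contains an extraction instruction and a transmission instruction executable by the processor:
the extraction instruction is used to extract and store the audio fingerprint of a calling party in advance from the calling party's voice;
the transmission instruction is used to, when the calling party and the opposite party are in a call, extract, according to the calling party's audio fingerprint, the sound matching the audio fingerprint from the current call sound, and send the sound matching the audio fingerprint to the opposite party through a communication network.
8. The device according to claim 7, characterized in that the extraction instruction includes a dividing sub-instruction and a mapping sub-instruction;
the dividing sub-instruction is used to divide the calling party's sound signal into multiple mutually overlapping frames;
the mapping sub-instruction is used to perform a feature operation on each frame, map the result to a piece of data by means of a classifier, and take that data as the calling party's audio fingerprint.
9. The device according to claim 8, characterized in that the dividing sub-instruction is specifically used to:
starting from different start times, divide the calling party's sound signal into multiple mutually overlapping frames according to a set time interval; or
starting from different start frequencies, divide the calling party's sound signal into multiple mutually overlapping frames according to a set frequency interval.
10. The device according to claim 7, characterized in that the transmission instruction extracts the sound matching the audio fingerprint from the current call sound through a prediction sub-instruction and an extraction sub-instruction;
the prediction sub-instruction is used to predict, from the current call sound, the sound matching the calling party's audio fingerprint by using a target sound collection and prediction approach;
the extraction sub-instruction is used to extract the predicted sound from the current call sound by secondary positioning of the target sound in the time-frequency domain, and to take the extracted sound as the sound matching the calling party's audio fingerprint.
11. A mobile terminal, characterized in that the mobile terminal includes the device according to any one of claims 7 to 10.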
PCT/CN2013/078130 2012-06-28 2013-06-27 Method and device for eliminating noise, and mobile terminal WO2014000658A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/410,602 US20150325252A1 (en) 2012-06-28 2013-06-27 Method and device for eliminating noise, and mobile terminal
KR20157001736A KR20150032562A (ko) 2012-06-28 2013-06-27 Method, device, and mobile terminal for eliminating noise

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210217760.9A CN103514876A (zh) 2012-06-28 2012-06-28 Noise elimination method and device, and mobile terminal
CN201210217760.9 2012-06-28

Publications (1)

Publication Number Publication Date
WO2014000658A1 (zh) 2014-01-03

Family

ID=49782256

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/078130 WO2014000658A1 (zh) 2012-06-28 2013-06-27 Method and device for eliminating noise, and mobile terminal

Country Status (4)

Country Link
US (1) US20150325252A1 (zh)
KR (1) KR20150032562A (zh)
CN (1) CN103514876A (zh)
WO (1) WO2014000658A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104601825A (zh) * 2015-02-16 2015-05-06 联想(北京)有限公司 Control method and device
WO2016127506A1 (zh) * 2015-02-09 2016-08-18 宇龙计算机通信科技(深圳)有限公司 Voice processing method, voice processing device, and terminal

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
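CN103871417A (zh) * 2014-03-25 2014-06-18 北京工业大学 Method and device for filtering specific continuous voice on a mobile phone
CN107094196A (zh) * 2017-04-21 2017-08-25 维沃移动通信有限公司 Call noise cancellation method and mobile terminal
CN107172256B (zh) * 2017-07-27 2020-05-05 Oppo广东移动通信有限公司 Adaptive adjustment method and device for earphone calls, mobile terminal, and storage medium
CN111696565B (zh) * 2020-06-05 2023-10-10 北京搜狗科技发展有限公司 Voice processing method, device, and medium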

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
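KR20000032269A (ko) * 1998-11-13 2000-06-05 구자홍 Voice recognition apparatus for audio equipment
CN101321387A (zh) * 2008-07-10 2008-12-10 中国移动通信集团广东有限公司 Voiceprint recognition method and system based on a communication system
CN101345055A (zh) * 2007-07-11 2009-01-14 雅马哈株式会社 Speech processor and communication terminal device
CN102694891A (zh) * 2011-03-21 2012-09-26 鸿富锦精密工业(深圳)有限公司 Call noise removal system and method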

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070219801A1 (en) * 2006-03-14 2007-09-20 Prabha Sundaram System, method and computer program product for updating a biometric model based on changes in a biometric feature of a user
US8700194B2 (en) * 2008-08-26 2014-04-15 Dolby Laboratories Licensing Corporation Robust media fingerprints
CN101847409B (zh) * 2010-03-25 2012-01-25 北京邮电大学 Voice integrity protection method based on digital fingerprints

Also Published As

Publication number Publication date
US20150325252A1 (en) 2015-11-12
KR20150032562A (ko) 2015-03-26
CN103514876A (zh) 2014-01-15

Similar Documents

Publication Publication Date Title
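CN105027541B (zh) Content-based noise suppression
US8972251B2 (en) Generating a masking signal on an electronic device
WO2014000658A1 (zh) Method and device for eliminating noise, and mobile terminal
US8983844B1 (en) Transmission of noise parameters for improving automatic speech recognition
US9704478B1 (en) Audio output masking for improved automatic speech recognition
US9202455B2 (en) Systems, methods, apparatus, and computer program products for enhanced active noise cancellation
US8824666B2 (en) Noise cancellation for phone conversation
WO2015184893A1 (zh) Method and device for noise reduction of call voice on a mobile terminal
CN107240405B (zh) Loudspeaker box and alarm method
JP2015135494A (ja) Speech recognition method and device
WO2014117722A1 (zh) Voice processing method, device, and terminal equipment
CN110708625A (zh) Smart-terminal-based headphone system and method with adjustable ambient sound suppression and enhancement
CN111883182B (zh) Human voice detection method, device, equipment, and storage medium
WO2022135340A1 (zh) Active noise reduction method, device, and system
WO2014154057A1 (zh) User voice call early-warning method, device, and computer storage medium
KR20100068188A (ko) Signal separation method, and communication system and speech recognition system using the signal separation method
CN107026950A (zh) Frequency-domain adaptive echo cancellation method
JP2019184809A (ja) Speech recognition device and speech recognition method
CN113176870B (zh) Volume adjustment method, device, electronic equipment, and storage medium
US11386911B1 (en) Dereverberation and noise reduction
CN103370741A (zh) Processing audio signals
CN112133324A (zh) Call state detection method, device, computer system, and medium
GB2516208B (en) Noise reduction in voice communications
CN104078049B (zh) Signal processing device and signal processing method
WO2021150647A1 (en) System and method for data analytics for communications in walkie-talkie network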

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13808541

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 14410602

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20157001736

Country of ref document: KR

Kind code of ref document: A

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205N DATED 29-05-2015)

122 Ep: pct application non-entry in european phase

Ref document number: 13808541

Country of ref document: EP

Kind code of ref document: A1