WO2014117722A1 - Voice processing method, apparatus, and terminal device - Google Patents


Info

Publication number
WO2014117722A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice signal
original
original voice
voiceprint
voiceprint information
Application number
PCT/CN2014/071621
Other languages
English (en)
French (fr)
Inventor
任艳辉 (Ren Yanhui)
Original Assignee
华为终端有限公司 (Huawei Device Co., Ltd.)
Application filed by Huawei Device Co., Ltd. (华为终端有限公司)
Publication of WO2014117722A1


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering

Definitions

  • The present invention relates to the field of communications technologies and, in particular, to a voice processing method, apparatus, and terminal device.
  • An embodiment of the present invention provides a voice processing method, apparatus, and terminal device.
  • The technical solutions are as follows:
  • An embodiment of the present invention provides a voice processing method, where the method includes: acquiring an original voice signal;
  • The method further includes:
  • After determining, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person, the method further includes:
  • When it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the preset person's voice signal, performing voiceprint filtering on the original voice signal according to the voiceprint information of the preset person's voice signal, to obtain a second voice signal that includes only the preset person's voice signal.
  • After determining, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person, the method further includes:
  • Determining, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person includes:
  • An embodiment of the present invention provides a voice processing apparatus, where the apparatus includes: a voice acquiring module, configured to acquire an original voice signal;
  • a voiceprint analysis module, configured to perform voiceprint analysis processing on the original voice signal to obtain voiceprint information of the original voice signal;
  • a determining module, configured to determine, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person; and
  • a first voice signal acquiring module, configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the preset person's voice signal, perform noise reduction processing on the ambient noise signal other than the preset person's voice signal in the original voice signal, to obtain a first voice signal.
  • a preset person voice signal acquiring module, configured to acquire a voice signal of the preset person; and
  • a preset person voice signal analysis module, configured to perform voiceprint analysis processing on the preset person's voice signal to obtain voiceprint information of the preset person's voice signal.
  • The apparatus further includes:
  • a voiceprint filtering module, configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the preset person's voice signal, perform voiceprint filtering on the original voice signal according to the voiceprint information of the preset person's voice signal, to obtain a second voice signal that includes only the preset person's voice signal.
  • a voice gain module, configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the preset person's voice signal, perform voice gain processing on the original voice signal to obtain a third voice signal.
  • The voiceprint recognition module is configured to compare the voiceprint information of the original voice signal with the voiceprint information of the preset person's voice signal.
  • An embodiment of the present invention provides a terminal device, where the terminal device includes: a receiver, configured to acquire an original voice signal;
  • a processor, configured to perform voiceprint analysis processing on the original voice signal to obtain voiceprint information of the original voice signal;
  • the processor is further configured to determine, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person; and
  • the processor is further configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the preset person's voice signal, perform noise reduction processing on the ambient noise signal other than the preset person's voice signal in the original voice signal, to obtain a first voice signal.
  • In the voice processing method, apparatus, and terminal device provided by the embodiments of the present invention, an original voice signal is acquired, and voiceprint analysis processing is performed on it to obtain its voiceprint information. That voiceprint information is used to determine whether the original voice signal includes a voice signal of a preset person. When it does, noise reduction processing is performed on the ambient noise signal other than the preset person's voice signal in the original voice signal to obtain a first voice signal.
  • By incorporating voiceprint recognition into voice processing, the speech clarity of a targeted group of people is improved, the voice of a specific person is enhanced, and the noise reduction level of voice calls is refined and improved.
  • FIG. 1 is a flowchart of a voice processing method according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of a voice processing method according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a voice processing apparatus according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
  • The terminal device refers to a device that provides voice and/or data connectivity, and may be a wireless terminal or a wired terminal.
  • The wireless terminal may be a handheld device with a wireless connection function or another processing device connected to a wireless modem. It may communicate with one or more core networks via a radio access network.
  • The wireless terminal may be a mobile terminal, such as a mobile phone (or "cellular" phone) or a computer having a mobile terminal.
  • The wireless terminal may also be a portable, pocket-sized, handheld, computer-built-in, or vehicle-mounted mobile apparatus.
  • FIG. 1 is a flowchart of a voice processing method according to an embodiment of the present invention.
  • The execution body of this embodiment is a terminal device. Referring to FIG. 1, the embodiment specifically includes:
  • The original voice signal is a voice signal received by the microphone of the terminal device that has not yet undergone noise reduction, gain, or similar processing.
  • The original voice signal may also be a voice signal sent by the communication peer of the terminal device over the voice channel and received by the radio frequency unit of the terminal device.
  • Steps 102 and 103 above can be regarded as a voiceprint recognition process on the original voice signal. This process determines whether the original voice signal includes the preset person's voice signal.
  • Obtaining voiceprint information by voiceprint analysis of a voice signal, and performing voiceprint recognition based on that information, are well-known prior-art techniques. They are implemented mainly through feature extraction, pattern matching criteria, and model training, and are not described in detail here.
  • The ambient noise signal may be the voice signal of a person other than the preset person in the original voice signal. It may also include signals from other sound sources, such as background sounds during a conversation. The embodiments of the present invention do not limit the specific composition of the ambient noise signal.
  • FIG. 2 is a flowchart of a voice processing method according to an embodiment of the present invention.
  • The execution body of this embodiment is a terminal device. Referring to FIG. 2, the embodiment specifically includes:
  • The terminal device acquires a voice signal of the preset person.
  • The user of the terminal device may record the preset person's voice signal and save it on the terminal device, or may select it from voice signals already stored on the terminal device.
  • The voice signal needs to have a certain length and enough phonetic content for its voiceprint information to be extracted from it.
  • The terminal device performs voiceprint analysis processing on the preset person's voice signal to obtain the voiceprint information of the preset person's voice signal.
  • Voiceprint information is a sound-wave spectrum that carries speech information.
  • Voiceprint information is both distinctive and relatively stable, so the voiceprint information obtained by analyzing the preset person's voice signal can be used to uniquely identify that person. For this reason, comparing voiceprint information can identify a specific person's voice within a voice signal, which can then be processed in a targeted manner.
  • A processing manner may also be set for original voice signals that include the preset person's voice signal. When an acquired original voice signal includes a voice signal corresponding to the preset person's voice signal, the original voice signal is processed in the configured manner.
  • The processing manner may include, but is not limited to, three operations. First, noise reduction is performed on voice signals other than the preset person's voice signal in the original voice signal to obtain a first voice signal. Second, voiceprint filtering is performed on the first voice signal to obtain a second voice signal. Third, voice enhancement is performed on the preset person's voice signal in the second voice signal.
  • The noise reduction, voiceprint filtering, and voice enhancement can each be implemented by adjusting processing parameters.
  • The terminal device acquires an original voice signal.
  • The original voice signal may be a voice signal received by the terminal device during a call, or a voice signal sent by the user of the communication peer device.
  • For example, the terminal device receives the original voice signal through its microphone, processes it, and sends it to the communication peer device through the radio frequency unit.
  • The terminal device performs voiceprint analysis processing on the original voice signal to obtain the voiceprint information of the original voice signal.
  • The voiceprint information of the acquired original voice signal shows whether the original voice signal includes the preset person's voice signal, and therefore whether to perform voiceprint filtering on the original voice signal.
  • The terminal device compares the voiceprint information of the original voice signal with the voiceprint information of the preset person's voice signal.
  • When the voiceprint information of the original voice signal includes the voiceprint information of the preset person's voice signal, it is determined that the original voice signal includes the preset person's voice signal. When it does not, it is determined that the original voice signal does not include the preset person's voice signal.
  • The length of the upcoming speech may also be predicted from the words and tone of the speech to decide how subsequent speech is processed. If the prediction gives the duration of the speech, one of several options may be used to reduce power consumption during the voice processing phase. The speech acquired during that period may be left unprocessed, the parameter-adjustment function may be turned off, or default parameters may be used to reduce the intensity of voice processing.
  • The terminal device performs noise reduction processing on the ambient noise signal other than the preset person's voice signal in the original voice signal to obtain a first voice signal.
  • Noise reduction may be applied specifically to the signals other than the preset person's voice signal in the original voice signal. Because noise reduction may attenuate a voice signal, applying it only to those other signals leaves the preset person's voice signal unaffected and keeps it as close to the original as possible.
  • The first voice signal obtained after noise reduction then undergoes processing such as gain processing and modulation and demodulation. These are existing voice processing methods and are not limited in the embodiments of the present invention.
  • For example, the original voice signal includes at least the voice signals of persons A, B, and C. Because the mobile terminal has pre-stored the voice signal of person A, voiceprint comparison shows that A's voice signal needs to be highlighted, and noise reduction is applied to the other voice signals.
  • Alternatively, noise reduction may be performed on the original voice signal as a whole. A first noise reduction parameter is applied to the preset person's voice signal in the original voice signal, and a second noise reduction parameter is applied to the voice signals other than the preset person's voice signal.
  • The first noise reduction parameter is smaller than the second noise reduction parameter.
  • The first and second noise reduction parameters may be set by technicians during development, or by users according to their own needs.
  • The terminal device performs voiceprint filtering on the first voice signal according to the voiceprint information of the preset person's voice signal to obtain a second voice signal that includes only the preset person's voice signal.
  • In this way, signals other than the preset person's voice signal in the noise-reduced first voice signal are filtered out as far as possible, which prevents the ambient noise signal from interfering with the preset person's voice signal. Voiceprint filtering based on the preset person's voiceprint information therefore leaves the preset person's voice signal as unaffected as possible while removing as much ambient noise as possible.
  • For example, the first voice signal includes at least the voice signals of persons A, B, and C. Because the mobile terminal has pre-stored A's voice signal, voiceprint filtering can remove most of B's and C's voices and other ambient noise signals, so that A's voice signal is highlighted.
  • The terminal device performs voice enhancement processing on the second voice signal to obtain a third voice signal.
  • Voice enhancement may be applied specifically to the preset person's voice signal in the original voice signal to further improve its quality. Applying voice enhancement only to the preset person's voice signal therefore maximizes the clarity of the preset person's voice.
  • For example, the second voice signal includes at least person A's voice signal and some ambient noise signals. To highlight A's voice, voice enhancement is applied to A's voice signal.
  • Alternatively, voice enhancement may be performed on the original voice signal as a whole. A first gain parameter is applied to the preset person's voice signal in the original voice signal, and a second gain parameter is applied to the voice signals other than the preset person's voice signal.
  • The first gain parameter is greater than the second gain parameter.
  • This improves the speech clarity of the preset person and achieves the goal of enhancing a specific person's voice.
  • Step 208 may further include: outputting the third voice signal.
  • When the method is applied at the sending end of a call, the third voice signal may be transmitted to the communication peer through the voice channel. When the method is applied at the receiving end of a call, the third voice signal may be output through the speaker.
  • The foregoing description uses one example sequence. Noise reduction on the original voice signal produces the first voice signal, voiceprint filtering of the first voice signal produces the second voice signal, and voice enhancement of the second voice signal produces the third voice signal. The embodiments of the present invention may also be carried out in any of the following manners: (1) performing any one of noise reduction, voiceprint filtering, or voice enhancement on the original voice signal to obtain a processed voice signal; (2) performing any two of them on the original voice signal in sequence, in no particular order, to obtain a processed voice signal; or (3) performing noise reduction, voiceprint filtering, and voice enhancement on the original voice signal, in no particular order.
  • The technical solution provided by the embodiments of the present invention can be applied at the sending end of a call. Voiceprint recognition on the original voice signal captured by the local microphone reveals that it includes a voice signal corresponding to the voiceprint information of the preset person's voice signal. Enhancement such as voiceprint filtering and voice enhancement is then applied to the preset person's voice signal, so that the preset person's voice is clear, prominent, and easy to recognize in the speech received by the communication peer.
  • The technical solution is equally applicable at the receiving end of a call. Voiceprint recognition on the original voice signal received from the peer reveals that it includes a voice signal corresponding to the preset person's voiceprint information. Enhancement such as voiceprint filtering and voice enhancement is then applied to the preset person's voice signal, so that the preset person's voice is clear, prominent, and easy to recognize in the speech received at the local end.
  • When the acquired original voice signal does not include a voice signal matching the voiceprint information of the preset person's voice signal, functions such as noise reduction, voiceprint filtering, and voice enhancement may be turned off. The acquired original voice signal then does not undergo differentiated noise reduction, voiceprint filtering, or voice enhancement, which reduces the power consumption of the terminal device.
  • FIG. 3 is a schematic structural diagram of a voice processing apparatus according to an embodiment of the present invention. Referring to FIG. 3, the apparatus includes:
  • a voice acquiring module 301, configured to acquire an original voice signal;
  • a voiceprint analysis module 302, configured to perform voiceprint analysis processing on the original voice signal to obtain voiceprint information of the original voice signal;
  • a determining module 303, configured to determine, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person; and
  • a first voice signal acquiring module 304, configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the preset person's voice signal, perform noise reduction processing on the ambient noise signal other than the preset person's voice signal in the original voice signal, to obtain a first voice signal.
  • The apparatus further includes:
  • a preset person voice signal acquiring module, configured to acquire a voice signal of the preset person; and
  • a preset person voice signal analysis module, configured to perform voiceprint analysis processing on the preset person's voice signal to obtain voiceprint information of the preset person's voice signal.
  • The apparatus further includes:
  • a voiceprint filtering module, configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the preset person's voice signal, perform voiceprint filtering on the original voice signal according to the voiceprint information of the preset person's voice signal, to obtain a second voice signal that includes only the preset person's voice signal.
  • The apparatus further includes:
  • a voice gain module, configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the preset person's voice signal, perform voice gain processing on the original voice signal to obtain a third voice signal.
  • The voiceprint recognition module is configured to compare the voiceprint information of the original voice signal with the voiceprint information of the preset person's voice signal. When the voiceprint information of the original voice signal includes the voiceprint information of the preset person's voice signal, the module determines that the original voice signal includes the preset person's voice signal. When it does not, the module determines that the original voice signal does not include the preset person's voice signal.
  • The division into functional modules in the voice processing apparatus of the foregoing embodiment is merely an example. In practical applications, the functions may be allocated to different functional modules as needed. That is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above.
  • The voice processing apparatus provided in the foregoing embodiment and the voice processing method embodiments are based on the same concept. For the specific implementation process, refer to the method embodiments; details are not repeated here.
  • A person of ordinary skill in the art may understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium.
  • The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
  • FIG. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
  • The terminal device includes:
  • a receiver 401, configured to acquire an original voice signal;
  • a processor 402, configured to perform voiceprint analysis processing on the original voice signal to obtain voiceprint information of the original voice signal;
  • the processor 402 is further configured to determine, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person; and
  • the processor 402 is further configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the preset person's voice signal, perform noise reduction processing on the ambient noise signal other than the preset person's voice signal in the original voice signal, to obtain a first voice signal.
  • The receiver 401 is further configured to acquire a voice signal of the preset person;
  • the processor 402 is further configured to perform voiceprint analysis processing on the preset person's voice signal to obtain the voiceprint information of the preset person's voice signal.
  • The processor 402 is further configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the preset person's voice signal, perform voiceprint filtering on the original voice signal according to the voiceprint information of the preset person's voice signal, to obtain a second voice signal that includes only the preset person's voice signal.
  • The processor 402 is further configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the preset person's voice signal, perform voice gain processing on the original voice signal to obtain a third voice signal.
  • The processor 402 is further configured to compare the voiceprint information of the original voice signal with the voiceprint information of the preset person's voice signal. When the voiceprint information of the original voice signal includes the voiceprint information of the preset person's voice signal, the processor determines that the original voice signal includes the preset person's voice signal. When it does not, the processor determines that the original voice signal does not include the preset person's voice signal.
  • The voice processing device further includes a radio frequency circuit, an audio circuit, and a power circuit. The radio frequency circuit is configured to establish communication between the mobile phone and a wireless network and to implement data receiving and sending between them;
  • the audio circuit is configured to collect sound and convert it into sound data, so that the mobile phone sends the sound data to the wireless network through the radio frequency circuit. It is also, or alternatively, configured to restore sound data received by the mobile phone from the wireless network through the radio frequency circuit into sound and play it to the user;
  • the power circuit is configured to supply power to each circuit or component of the mobile phone to ensure its normal operation.
  • The terminal device may be a mobile phone, a human-computer interaction terminal, an e-book reader, or another terminal device with a voice recognition function.
  • The mobile phone further includes a housing, a circuit board, a microphone, and a speaker, which implement its basic functions and are described separately below:
  • The circuit board is disposed inside the housing.
  • The microphone is configured to collect sound and convert it into sound data, so that the mobile phone sends the sound data to the wireless network through the radio frequency circuit;
  • the speaker is configured to restore sound data received by the mobile phone from the wireless network through the radio frequency circuit into sound and play it to the user.
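The variant described earlier can be illustrated with a minimal Python sketch. In that variant, the whole original voice signal is processed with two sets of parameters, and processing is bypassed when no preset person is detected. The description only states that the first noise reduction parameter is smaller than the second and that the first gain parameter is greater than the second. Everything else is an assumption: a noise reduction parameter is read as an attenuation strength in [0, 1], a gain parameter as a linear amplitude factor, and the default values and the function name `differential_process` are hypothetical. The per-frame `is_preset` flags are assumed to come from the voiceprint comparison step.

```python
from typing import List, Sequence


def differential_process(frames: Sequence[Sequence[float]],
                         is_preset: Sequence[bool],
                         nr_preset: float = 0.1, nr_other: float = 0.9,
                         gain_preset: float = 2.0,
                         gain_other: float = 1.0) -> List[float]:
    """Process the whole signal with per-speaker parameters (hypothetical sketch).

    nr_*   : attenuation strength in [0, 1] (assumed meaning of a noise reduction parameter)
    gain_* : linear amplitude factor (assumed meaning of a gain parameter)
    """
    # Constraints stated in the description: first NR parameter < second,
    # first gain parameter > second.
    if not (nr_preset < nr_other and gain_preset > gain_other):
        raise ValueError("require nr_preset < nr_other and gain_preset > gain_other")
    samples_in = [s for frame in frames for s in frame]
    if not any(is_preset):
        # No preset person detected: bypass all processing to save power.
        return samples_in
    out: List[float] = []
    for frame, preset in zip(frames, is_preset):
        nr, gain = (nr_preset, gain_preset) if preset else (nr_other, gain_other)
        scale = (1.0 - nr) * gain  # toy stand-in for real noise reduction plus gain
        out.extend(s * scale for s in frame)
    return out
```

For example, `differential_process([[1.0, -1.0], [1.0, -1.0]], [True, False])` scales the preset person's frame by 1.8 and the other frame by roughly 0.1. The other sources are therefore weakened rather than removed, in line with the "smaller/greater parameter" wording instead of outright filtering.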

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Telephone Function (AREA)

Abstract

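A voice processing method, apparatus, and terminal device, belonging to the field of communications technologies. The method includes: acquiring an original voice signal (101); performing voiceprint analysis processing on the original voice signal to obtain voiceprint information of the original voice signal (102); determining, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person (103); and when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the preset person's voice signal, performing noise reduction processing on the ambient noise signal other than the preset person's voice signal in the original voice signal, to obtain a first voice signal (104). By incorporating voiceprint recognition into voice processing, this technical solution improves the speech clarity of a targeted group of people, enhances the voice of specific persons, and refines and improves the noise reduction level of voice calls.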
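Steps 101 to 104 of the abstract can be illustrated with a minimal Python sketch. The patent leaves the voiceprint technique to the prior art, so everything concrete here is an assumption:

- The "voiceprint" is a toy vector of unit-normalized band energies from a naive DFT over 64-sample frames.
- "Includes the preset person's voice" is decided by cosine similarity against a hypothetical threshold of 0.8.
- Noise reduction is approximated by attenuating frames that do not match.

The names `voiceprint`, `process`, and `tone` are illustrative only.

```python
import cmath
import math

FRAME = 64  # samples per analysis frame (assumption)
BANDS = 8   # spectral bands in the toy voiceprint (assumption)


def voiceprint(frame):
    """Toy voiceprint: unit-length vector of band energies from a naive DFT."""
    n = len(frame)
    mags = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2)]
    width = len(mags) // BANDS
    bands = [sum(m * m for m in mags[b * width:(b + 1) * width]) for b in range(BANDS)]
    norm = math.sqrt(sum(e * e for e in bands)) or 1.0
    return [e / norm for e in bands]


def process(original, preset_print, threshold=0.8, noise_gain=0.1):
    """Steps 101-104 on a list of samples; returns the first voice signal."""
    # Step 101: split the original voice signal into whole frames
    # (a trailing partial frame is ignored in this sketch).
    frames = [original[i:i + FRAME] for i in range(0, len(original) - FRAME + 1, FRAME)]
    # Step 102: voiceprint analysis of each frame.
    prints = [voiceprint(f) for f in frames]
    # Step 103: cosine similarity (vectors are unit length) against the preset voiceprint.
    matches = [sum(a * b for a, b in zip(p, preset_print)) >= threshold for p in prints]
    if not any(matches):
        return list(original)  # preset person absent: leave the signal unprocessed
    # Step 104: attenuate only the frames treated as ambient noise.
    first = []
    for frame, is_preset in zip(frames, matches):
        first.extend(frame if is_preset else [s * noise_gain for s in frame])
    return first


def tone(bin_index, n_frames=1):
    """Test signal: a sine wave exactly on DFT bin `bin_index` of a frame."""
    return [math.sin(2 * math.pi * bin_index * t / FRAME) for t in range(FRAME * n_frames)]
```

For example, `process(tone(5) + tone(20), voiceprint(tone(5)))` keeps the first frame unchanged and scales the second by 0.1, because the two tones fall into different bands. Frame-level gating is the coarsest possible stand-in for separating speakers; the patent does not specify how the separation is done.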

Description

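Voice processing method, apparatus, and terminal device

Technical Field

The present invention relates to the field of communications technologies and, in particular, to a voice processing method, apparatus, and terminal device.

Background

With the development of communications technologies, processing of the voice acquired by a mobile phone during a call generally includes noise reduction, gain processing, and the like. Currently, most mobile phones support dual-microphone (MIC) or single-MIC noise reduction and voice gain technologies. However, any sound outside the noise reduction range is attenuated, so the voice of specific persons cannot be selectively enhanced.

Summary

To solve the problems in the prior art, embodiments of the present invention provide a voice processing method, apparatus, and terminal device. The technical solutions are as follows: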
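According to a first aspect, an embodiment of the present invention provides a voice processing method, where the method includes: acquiring an original voice signal;
performing voiceprint analysis processing on the original voice signal to obtain voiceprint information of the original voice signal;
determining, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person; and
when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, performing noise reduction processing on the ambient noise signal other than the preset person's voice signal in the original voice signal, to obtain a first voice signal.
With reference to the first aspect, in a first possible implementation of the embodiments of the present invention, before the determining, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person, the method further includes: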
获取预设人员的语音信号;
对所述预设人员语音信号进行声紋分析处理, 获取所述预设人员语音信号 的声故信息。 结合第一种可能实现方式, 在本发明实施例的第二种可能实现方式中, 根 据所述原始语音信号的声紋信息, 判断所述原始语音信号是否包括预设人员的 语音信号之后, 所述方法还包括:
当根据所述原始语音信号的声紋信息确定所述原始语音信号中包括预设人 员的语音信号时, 根据所述预设人员语音信号的声紋信息, 对所述原始语音信 号进行声紋滤波, 获取仅包含所述预设人员语音信号的第二语音信号。 结合第一种可能实现方式, 在本发明实施例的第三种可能实现方式中, 根 据所述原始语音信号的声紋信息, 判断所述原始语音信号是否包括预设人员的 语音信号之后, 所述方法还包括:
当根据所述原始语音信号的声紋信息确定所述原始语音信号中包括预设人 员的语音信号时, 对所述原始语音信号进行语音增益处理, 获得第三语音信号。 结合第一种可能实现方式, 在本发明实施例的第四种可能实现方式中, 根 据所述原始语音信号的声紋信息, 判断所述原始语音信号是否包括预设人员的 语音信号, 包括:
根据所述原始语音信号的声紋信息和预设人员语音信号的声紋信息进行比 较, 当所述原始语音信号的声紋信息包括所述预设人员语音信号的声紋信息时, 则确定所述原始语音信号中包括预设人员语音信号; 当所述原始语音信号的声 紋信息不包括所述预设人员语音信号的声紋信息时, 则确定所述原始语音信号 中不包括预设人员语音信号。 第二方面, 本发明实施例提供了一种语音处理装置, 所述装置包括: 语音获取模块, 用于获取原始语音信号;
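a voiceprint analysis module, configured to perform voiceprint analysis on the original voice signal to obtain voiceprint information of the original voice signal;
a judging module, configured to determine, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person; and
a first voice signal acquisition module, configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, perform noise reduction on the ambient noise signal in the original voice signal other than the voice signal of the preset person, to obtain a first voice signal.
With reference to the second aspect, in a first possible implementation manner of the embodiments of the present invention, the apparatus further includes:
a preset-person voice signal acquisition module, configured to acquire a voice signal of the preset person; and
a preset-person voice signal analysis module, configured to perform voiceprint analysis on the voice signal of the preset person to obtain voiceprint information of the voice signal of the preset person.
With reference to the first possible implementation manner, in a second possible implementation manner of the embodiments of the present invention, the apparatus further includes:
a voiceprint filtering module, configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, perform voiceprint filtering on the original voice signal according to the voiceprint information of the voice signal of the preset person, to obtain a second voice signal that contains only the voice signal of the preset person.
With reference to the first possible implementation manner, in a third possible implementation manner of the embodiments of the present invention, the apparatus further includes:
a voice gain module, configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, perform voice gain processing on the original voice signal to obtain a third voice signal.
With reference to the first possible implementation manner, in a fourth possible implementation manner of the embodiments of the present invention, the judging module is configured to compare the voiceprint information of the original voice signal with the voiceprint information of the voice signal of the preset person; when the voiceprint information of the original voice signal includes the voiceprint information of the voice signal of the preset person, determine that the original voice signal includes the voice signal of the preset person; and when the voiceprint information of the original voice signal does not include the voiceprint information of the voice signal of the preset person, determine that the original voice signal does not include the voice signal of the preset person.
According to a third aspect, an embodiment of the present invention provides a terminal device, where the terminal device includes:
a receiver, configured to acquire an original voice signal; and
a processor, configured to perform voiceprint analysis on the original voice signal to obtain voiceprint information of the original voice signal; the processor is further configured to determine, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person;
the processor is further configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, perform noise reduction on the ambient noise signal in the original voice signal other than the voice signal of the preset person, to obtain a first voice signal.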
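In the voice processing method, apparatus, and terminal device provided by the embodiments of the present invention, an original voice signal is acquired; voiceprint analysis is performed on the original voice signal to obtain its voiceprint information; whether the original voice signal includes a voice signal of a preset person is determined according to that voiceprint information; and, when it is determined that the original voice signal includes the voice signal of the preset person, noise reduction is performed on the ambient noise signal in the original voice signal other than the voice signal of the preset person, to obtain a first voice signal. By incorporating voiceprint recognition into voice processing, the technical solutions of the embodiments of the present invention improve voice clarity for a targeted group of people, enhance the voice of specific persons, and improve the noise reduction performance of voice calls.
Brief Description of Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
FIG. 1 is a flowchart of a voice processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a voice processing method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a voice processing apparatus according to an embodiment of the present invention; and
FIG. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
Description of Embodiments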
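To make the objectives, technical solutions, and advantages of the present invention clearer, the following further describes the embodiments of the present invention in detail with reference to the accompanying drawings.
In the embodiments of the present invention, a terminal device is a device that provides voice and/or data connectivity to a user, and may be a wireless terminal or a wired terminal. A wireless terminal may be a handheld device with a wireless connection function, another processing device connected to a wireless modem, or a mobile terminal that communicates with one or more core networks through a radio access network. For example, the wireless terminal may be a mobile phone (also called a "cellular" phone) or a computer with a mobile terminal. As another example, the wireless terminal may be a portable, pocket-sized, handheld, computer-built-in, or vehicle-mounted mobile apparatus.
FIG. 1 is a flowchart of a voice processing method according to an embodiment of the present invention. This embodiment is executed by a terminal device. Referring to FIG. 1, the embodiment specifically includes the following steps: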
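101. Acquire an original voice signal.
The original voice signal is the voice signal received by the microphone of the terminal device, before any noise reduction, gain, or other processing has been applied to it.
In another embodiment of the present invention, the original voice signal may instead be a voice signal sent by the peer end of the communication over a voice channel, transmitted through that channel, and received by the radio frequency unit of the terminal device.
102. Perform voiceprint analysis on the original voice signal to obtain voiceprint information of the original voice signal.
103. Determine, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person.
Steps 102 and 103 together can be regarded as performing voiceprint recognition on the original voice signal to determine whether it includes the voice signal of the preset person. Performing voiceprint analysis on a voice signal to obtain voiceprint information, and performing voiceprint recognition based on that information, are well known in the prior art; they are mainly implemented through feature extraction, pattern-matching criteria, and model training, and are not described in detail here.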
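Steps 102 and 103 leave the recognition technique to the prior art. As a rough illustration only, the sketch below uses a deliberately simplified, hypothetical voiceprint: the average share of frame energy falling into a few equal-width DFT bands, compared against an enrolled voiceprint by cosine similarity with an assumed threshold. Production speaker recognition normally uses richer features (for example MFCCs) with trained models; the function names, frame length, band count, threshold, and the synthetic `tone` test signal are all illustrative assumptions rather than anything specified by the patent.

```python
import math
from typing import List, Sequence


def band_energies(frame: Sequence[float], n_bands: int) -> List[float]:
    """Power of one frame in n_bands equal-width bands, via a naive DFT."""
    n = len(frame)
    half = n // 2
    power = []
    for k in range(half):  # positive-frequency bins 0 .. n/2 - 1
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        power.append(re * re + im * im)
    width = half // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


def voiceprint(signal: Sequence[float], frame_len: int = 64, n_bands: int = 8) -> List[float]:
    """Hypothetical voiceprint: mean of the per-frame band-energy distributions."""
    acc = [0.0] * n_bands
    used = 0
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        energies = band_energies(signal[start:start + frame_len], n_bands)
        total = sum(energies)
        if total == 0.0:
            continue  # a silent frame carries no spectral shape
        acc = [a + e / total for a, e in zip(acc, energies)]
        used += 1
    return [a / used for a in acc] if used else acc


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def includes_preset_person(original, enrolled, threshold: float = 0.9) -> bool:
    """Step 103 sketch: does the original signal's voiceprint match the enrolled one?"""
    return cosine(voiceprint(original), voiceprint(enrolled)) >= threshold


def tone(freq_hz: float, amp: float = 1.0, fs: int = 8000, n: int = 512) -> List[float]:
    """Synthetic test signal standing in for real speech."""
    return [amp * math.sin(2 * math.pi * freq_hz * t / fs) for t in range(n)]


if __name__ == "__main__":
    enrolled = tone(1000)
    print(includes_preset_person(tone(1000, amp=0.3), enrolled))  # True
    print(includes_preset_person(tone(3000), enrolled))           # False
```

Because each frame's band energies are normalized before averaging, the toy voiceprint ignores loudness and only captures spectral shape; comparing a whole mixed signal against one enrolled voiceprint is a simplification, since several talkers in one signal would dilute the similarity score.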
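104. When it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, perform noise reduction on the ambient noise signal in the original voice signal other than the voice signal of the preset person, to obtain a first voice signal.
The ambient noise signal may be the voice signals of persons other than the preset person in the original voice signal, and may also include signals from other sound sources, such as background sound during a conversation. The specific composition of the ambient noise signal is not limited in the embodiments of the present invention.
By incorporating voiceprint recognition into voice processing, the method provided by this embodiment of the present invention eliminates interference of the ambient noise signal with the voice signal of the preset person, enhances the voice of a specific person, and improves the noise reduction performance of voice calls.
FIG. 2 is a flowchart of a voice processing method according to an embodiment of the present invention. This embodiment is executed by a terminal device. Referring to FIG. 2, the embodiment specifically includes the following steps: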
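201. The terminal device acquires a voice signal of the preset person.
The user of the terminal device may record a segment of the preset person's voice and save it on the terminal device, or may select the preset person's voice signal from voice signals already stored on the terminal device.
A person skilled in the art will understand that this voice signal should preferably be of sufficient length and contain sufficient sound elements for its voiceprint information to be derived from it.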
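The patent only says the enrollment recording should have "a certain length and sound elements". A hypothetical pre-check along those lines might require a minimum duration and a minimum fraction of frames whose mean power exceeds a silence floor. The 3-second minimum, 20 ms frames at 8 kHz, the energy floor, and the 50% active fraction below are illustrative assumptions, not values from the source.

```python
from typing import Sequence


def enrollment_is_usable(
    samples: Sequence[float],
    sample_rate: int,
    min_seconds: float = 3.0,         # assumed minimum clip length
    frame_len: int = 160,             # 20 ms at 8 kHz
    energy_floor: float = 1e-4,       # assumed silence threshold (mean power)
    min_active_fraction: float = 0.5, # assumed share of non-silent frames
) -> bool:
    """Rough check that an enrollment clip is long enough and not mostly silence."""
    if len(samples) < min_seconds * sample_rate:
        return False  # too short to characterize the speaker
    frames = [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    if not frames:
        return False
    # A frame counts as "active" if its mean power exceeds the silence floor.
    active = sum(1 for f in frames if sum(x * x for x in f) / frame_len > energy_floor)
    return active / len(frames) >= min_active_fraction
```

An energy test like this only rejects clips that are too short or mostly silent; it says nothing about whether the clip is speech from the right person.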
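202. The terminal device performs voiceprint analysis on the voice signal of the preset person to obtain voiceprint information of the voice signal of the preset person.
Voiceprint information is the acoustic spectrum that carries verbal information. It is both speaker-specific and relatively stable, so the voiceprint information obtained by analyzing the preset person's voice signal can be used to uniquely identify a particular person. For the same reason, comparing voiceprint information makes it possible to locate a specific person's voice signal within a voice signal and then process that person's voice signal in a targeted manner.
It should be noted that, when the preset person's voice signal is acquired, a processing mode may also be configured for voice signals containing it; when an acquired original voice signal includes a voice signal matching the preset person's voice signal, the original voice signal is processed according to the configured processing mode.
Preferably, the processing modes may include but are not limited to: performing noise reduction on the voice signals in the original voice signal other than the preset person's voice signal, to obtain a first voice signal; performing voiceprint filtering on the first voice signal, to obtain a second voice signal; and performing voice enhancement on the preset person's voice signal in the second voice signal. Noise reduction, voiceprint filtering, and voice enhancement can each be implemented by adjusting processing parameters.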
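The per-person processing mode described above could be stored alongside the enrolled voiceprint. The sketch below shows one hypothetical way to do that; the `ProcessingProfile` and `ProfileStore` types, the step names, and the `params` dictionary are illustrative assumptions and are not defined by the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical names for the three processing modes listed in the text.
VALID_STEPS = ("noise_reduction", "voiceprint_filtering", "voice_enhancement")


@dataclass
class ProcessingProfile:
    """Hypothetical record pairing a preset person with a processing mode."""
    person: str
    voiceprint: List[float]
    steps: Tuple[str, ...] = VALID_STEPS  # order in which the steps run
    params: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        unknown = [s for s in self.steps if s not in VALID_STEPS]
        if unknown:
            raise ValueError(f"unknown processing steps: {unknown}")
        if len(set(self.steps)) != len(self.steps):
            raise ValueError("each processing step may appear at most once")


class ProfileStore:
    """Keeps profiles keyed by person and returns the mode for a matched person."""

    def __init__(self) -> None:
        self._profiles: Dict[str, ProcessingProfile] = {}

    def enroll(self, profile: ProcessingProfile) -> None:
        self._profiles[profile.person] = profile

    def mode_for(self, matched_person: Optional[str]) -> Tuple[str, ...]:
        # No match (None) or an unknown person means no differentiated processing.
        profile = self._profiles.get(matched_person) if matched_person else None
        return profile.steps if profile else ()
```

For example, after `store.enroll(ProcessingProfile("A", [0.2, 0.8], steps=("noise_reduction", "voice_enhancement")))`, calling `store.mode_for("A")` returns those two steps, while `store.mode_for(None)` returns an empty tuple.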
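203. The terminal device acquires an original voice signal.
The original voice signal may be a voice signal received by the microphone of the terminal device during a call, or a voice signal sent by the user of the peer device.
Preferably, the original voice signal is a voice signal received by the terminal device through the microphone; after processing, it is sent to the peer device through the radio frequency unit.
204. The terminal device performs voiceprint analysis on the original voice signal to obtain voiceprint information of the original voice signal.
Specifically, the voiceprint information obtained by analyzing the acquired original voice signal is used to determine whether the original voice signal includes the preset person's voice signal, and therefore whether subsequent voiceprint filtering is performed on the original voice signal.
205. The terminal device compares the voiceprint information of the original voice signal with the voiceprint information of the preset person's voice signal.
When the voiceprint information of the original voice signal includes the voiceprint information of the preset person's voice signal, it is determined that the original voice signal includes the preset person's voice signal; when the voiceprint information of the original voice signal does not include the voiceprint information of the preset person's voice signal, it is determined that the original voice signal does not include the preset person's voice signal.
A person skilled in the art will understand that comparing the voiceprint information of different voice signals to determine whether they contain the same person's voice is disclosed in the prior art, and is not described in detail here.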
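Step 205 says the original signal's voiceprint "includes" the preset person's voiceprint without defining how. One plausible reading, sketched below under stated assumptions, is to split the original signal into short segments, score each segment's voiceprint against the enrolled voiceprint, and declare a match if any segment reaches a threshold. The per-segment feature vectors are assumed to be computed elsewhere, and the 0.85 threshold is an arbitrary illustrative value.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def compare_voiceprints(
    segment_prints: Sequence[Sequence[float]],
    enrolled_print: Sequence[float],
    threshold: float = 0.85,
) -> Tuple[bool, List[int]]:
    """Return (includes_preset_person, indices of the matching segments)."""
    matching = [
        i for i, seg in enumerate(segment_prints)
        if cosine(seg, enrolled_print) >= threshold
    ]
    return bool(matching), matching
```

With `enrolled = [0.0, 1.0, 0.0]` and segments `[[1.0, 0.0, 0.0], [0.1, 0.95, 0.05], [0.0, 0.0, 1.0]]`, only the middle segment scores above the threshold (about 0.99), so the call returns `(True, [1])`. Returning the matching indices, not just a yes/no, lets later steps treat those segments differently.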
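Further, when the acquired voice includes speech that matches the voiceprint of the preset person, the length of that speech may also be predicted by analyzing its words, tone, and the like, to decide how subsequent speech is processed. If the duration of the speech is known from the prediction, then for that duration the terminal device may leave the acquired speech unprocessed, disable the parameter-adjustment function, or adjust the default parameters to lower the processing intensity, which reduces power consumption in the voice processing stage.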
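The power-saving idea in the preceding paragraph can be sketched as a small controller: once a matching utterance is detected and its duration predicted, the terminal switches to a lighter parameter set until the predicted end time, then reverts. How the duration is predicted from words and tone is not specified in the patent, so here it is simply an input; the `DurationGate` class, the "full" and "light" parameter sets, and the `nr_db` key are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DurationGate:
    """Hypothetical controller that lowers processing intensity for a predicted duration."""
    full_params: Dict[str, float]
    light_params: Dict[str, float]
    hold_until: float = 0.0  # time (s) until which the light parameters apply

    def on_match(self, now: float, predicted_duration: float) -> None:
        # A matching utterance starts at `now` and is predicted to last this long;
        # an overlapping prediction can only extend the hold, never shorten it.
        self.hold_until = max(self.hold_until, now + predicted_duration)

    def params_at(self, now: float) -> Dict[str, float]:
        # Within the predicted duration use the cheaper settings, otherwise the full ones.
        return self.light_params if now < self.hold_until else self.full_params
```

For instance, with `DurationGate({"nr_db": 12.0}, {"nr_db": 0.0})` and `on_match(10.0, 2.5)`, the light settings apply at t = 11.0 s and the full settings return from t = 12.5 s.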
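206. When it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, the terminal device performs noise reduction on the ambient noise signal in the original voice signal other than the voice signal of the preset person, to obtain a first voice signal.
Specifically, once it is determined that the original voice signal includes the voice signal of the preset person, noise reduction can be applied selectively to the signals in the original voice signal other than that voice signal. Because noise reduction may attenuate a voice signal, applying it only to the other signals leaves the preset person's voice signal as unaffected as possible and preserves its original fidelity. The first voice signal obtained after noise reduction further undergoes gain processing, modulation and demodulation, and similar processing; these are existing voice processing methods and are not limited in the embodiments of the present invention.
For example, suppose the original voice signal contains the voice signals of at least persons A, B, and C, and the mobile terminal has pre-stored the voice signal of person A. Voiceprint comparison then shows that A's voice signal should be emphasized, so noise reduction is applied to the voice signals other than A's.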
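Step 206 and its two-parameter variant can be illustrated at the frame level. The sketch below assumes each frame has already been labelled as belonging to the preset person or not (for example by a voiceprint comparison as in step 205) and models noise reduction crudely as a fixed attenuation in decibels: a first, smaller parameter for the preset person's frames and a second, larger one for everything else. Setting the first parameter to 0 dB reproduces the case where only the other signals are reduced. Real noise suppressors usually work per frequency bin rather than per frame; the function name and default values are assumptions.

```python
from typing import List, Sequence


def db_to_amplitude(db: float) -> float:
    """Convert a level in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def targeted_noise_reduction(
    frames: Sequence[Sequence[float]],
    is_preset_person: Sequence[bool],
    first_reduction_db: float = 0.0,    # applied to the preset person's frames
    second_reduction_db: float = 20.0,  # applied to all other frames
) -> List[List[float]]:
    """Attenuate frames; the preset person's frames are reduced less (or not at all)."""
    if len(frames) != len(is_preset_person):
        raise ValueError("need one label per frame")
    if not first_reduction_db < second_reduction_db:
        raise ValueError("the first noise reduction parameter must be smaller than the second")
    out = []
    for frame, is_target in zip(frames, is_preset_person):
        reduction = first_reduction_db if is_target else second_reduction_db
        scale = db_to_amplitude(-reduction)
        out.append([x * scale for x in frame])
    return out
```

In the A/B/C example, frames labelled as person A would pass through unchanged at the default 0 dB, while frames from B, C, or background sources would be scaled by 0.1 (a 20 dB reduction).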
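It should be noted that the noise reduction may also be applied to the original voice signal as a whole: a first noise reduction parameter is applied to the voice signal of the preset person in the original voice signal, and a second noise reduction parameter is applied to the voice signals other than the voice signal of the preset person, where the first noise reduction parameter is smaller than the second noise reduction parameter. Both parameters may be set by technicians during development or by users according to their own needs.
207. The terminal device performs voiceprint filtering on the first voice signal according to the voiceprint information of the voice signal of the preset person, to obtain a second voice signal that contains only the voice signal of the preset person.
Specifically, once it is determined that the original voice signal includes the voice signal of the preset person, the signals other than that voice signal can be selectively filtered out of the noise-reduced first voice signal, to minimize interference of the ambient noise signal with the preset person's voice signal. Performing voiceprint filtering on the first voice signal according to the preset person's voiceprint information therefore removes as much of the ambient noise signal as possible while leaving the preset person's voice signal as unaffected as possible.
For example, suppose the first voice signal contains the noise-reduced voice signals of at least persons A, B, and C, and the mobile terminal has pre-stored the voice signal of person A. Voiceprint filtering can then remove most of B's and C's voice signals and other ambient noise signals, so that A's voice signal stands out.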
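Voiceprint filtering in step 207 is likewise left abstract. A minimal time-domain interpretation, assuming per-frame similarity scores against person A's enrolled voiceprint are already available, keeps frames whose score reaches a threshold and silences the rest, so the output approximately contains only the preset person's speech. Because it gates whole frames, this cannot separate talkers who speak at the same time; that would need source-separation or spectral-masking methods. The function name and the 0.8 threshold are illustrative assumptions.

```python
from typing import List, Sequence


def voiceprint_filter(
    frames: Sequence[Sequence[float]],
    similarity: Sequence[float],
    threshold: float = 0.8,
) -> List[List[float]]:
    """Keep frames whose voiceprint similarity reaches the threshold; zero the rest."""
    if len(frames) != len(similarity):
        raise ValueError("need one similarity score per frame")
    return [
        list(frame) if score >= threshold else [0.0] * len(frame)
        for frame, score in zip(frames, similarity)
    ]
```

A soft variant could scale each frame by its score instead of applying a hard keep/zero decision, which tends to sound less abrupt at frame boundaries.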
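208. The terminal device performs voice enhancement on the second voice signal to obtain a third voice signal.
Specifically, once it is determined that the original voice signal includes the voice signal of the preset person, voice enhancement can be applied selectively to the preset person's voice signal in the second voice signal. Because voice enhancement further improves the quality of the preset person's voice signal, applying it only to that voice signal maximizes the preset person's voice clarity.
For example, suppose the second voice signal contains the voice signal of at least person A together with some residual ambient noise signals. To make A's voice signal stand out, voice enhancement is applied to A's voice signal.
It should be noted that voice enhancement may also be applied to the original voice signal as a whole: a first gain parameter is applied to the voice signal of the preset person in the original voice signal, and a second gain parameter is applied to the voice signals other than the voice signal of the preset person, where the first gain parameter is greater than the second gain parameter.
By raising the amplification gain applied when enhancing the preset person's voice signal, the voice clarity of the preset person is selectively improved, so that the voice of a specific person is enhanced.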
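The two-gain variant of voice enhancement can be sketched in the same frame-labelled style: a larger first gain for the preset person's frames and a smaller second gain for the rest, with the result hard-clipped to the sample range [-1.0, 1.0] to avoid overflow. The gain values and the hard clipping are illustrative assumptions; a real implementation would more likely use a limiter or automatic gain control.

```python
from typing import List, Sequence


def targeted_gain(
    frames: Sequence[Sequence[float]],
    is_preset_person: Sequence[bool],
    first_gain_db: float = 6.0,   # applied to the preset person's frames
    second_gain_db: float = 0.0,  # applied to all other frames
) -> List[List[float]]:
    """Amplify the preset person's frames more than the others, then hard-clip."""
    if len(frames) != len(is_preset_person):
        raise ValueError("need one label per frame")
    if not first_gain_db > second_gain_db:
        raise ValueError("the first gain parameter must be greater than the second")
    out = []
    for frame, is_target in zip(frames, is_preset_person):
        gain = 10.0 ** ((first_gain_db if is_target else second_gain_db) / 20.0)
        out.append([max(-1.0, min(1.0, x * gain)) for x in frame])
    return out
```

With `first_gain_db=20.0` (a factor of 10), a target frame `[0.05, 0.2]` becomes `[0.5, 1.0]`, the second sample being clipped from 2.0, while a non-target frame passes through unchanged at the default 0 dB.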
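Step 208 may be followed by outputting the third voice signal. When the method is applied at the sending end of a call, the third voice signal may be transmitted to the peer end through the voice channel; when the method is applied at the receiving end of a call, the third voice signal may be played through the speaker.
In addition, this embodiment is described using only the example in which noise reduction is performed on the original voice signal to obtain the first voice signal, voiceprint filtering is then performed on the first voice signal to obtain the second voice signal, and voice enhancement is then performed on the second voice signal to obtain the third voice signal. The embodiments of the present invention may also use any of the following approaches: (1) performing any one of noise reduction, voiceprint filtering, or voice enhancement on the original voice signal to obtain a processed voice signal; (2) performing any two of noise reduction, voiceprint filtering, and voice enhancement on the original voice signal in sequence, in either order, to obtain a processed voice signal; (3) performing noise reduction, voiceprint filtering, and voice enhancement on the original voice signal, in any order.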
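The three variants listed above (any one step, any two in sequence, or all three in any order) amount to composing a chosen subset of processing steps in a chosen order. The sketch below shows that composition; `build_pipeline` and the step names are hypothetical, and `TOY_STEPS` are placeholder operations that only make the ordering visible, standing in for real noise reduction, voiceprint filtering, and voice enhancement.

```python
from typing import Callable, Dict, List, Sequence

Signal = List[float]
Step = Callable[[Signal], Signal]


def build_pipeline(order: Sequence[str], steps: Dict[str, Step]) -> Step:
    """Compose between one and all of the available steps, each at most once, in order."""
    if not 1 <= len(order) <= len(steps):
        raise ValueError("choose between one and all of the available steps")
    if len(set(order)) != len(order):
        raise ValueError("each step may be used at most once")
    chain = [steps[name] for name in order]  # KeyError for an unknown step name

    def run(signal: Signal) -> Signal:
        for step in chain:
            signal = step(signal)
        return signal

    return run


# Toy stand-ins, only to make the ordering visible; not real signal processing.
TOY_STEPS: Dict[str, Step] = {
    "noise_reduction": lambda s: [x * 0.5 for x in s],
    "voiceprint_filtering": lambda s: [x if abs(x) >= 0.1 else 0.0 for x in s],
    "voice_enhancement": lambda s: [x * 2.0 for x in s],
}
```

Order matters even with these toy steps: on `[0.15, 0.3]`, noise reduction followed by filtering gives `[0.0, 0.15]` because the halved first sample falls below the filter threshold, whereas filtering first gives `[0.075, 0.15]`.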
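The technical solutions provided by the embodiments of the present invention can be applied at the sending end of a call. If voiceprint recognition of the original voice signal captured by the local microphone shows that it contains a voice signal matching the voiceprint information of the preset person's voice signal, enhancement processing such as voiceprint filtering and voice enhancement is applied to the preset person's voice signal, so that the preset person's voice is clear and easy to recognize in the voice received by the peer end.
The technical solutions provided by the embodiments of the present invention can also be applied at the receiving end of a call. If voiceprint recognition of the original voice signal received from the peer end shows that it contains a voice signal matching the voiceprint information of the preset person's voice signal, enhancement processing such as voiceprint filtering and voice enhancement is applied to the preset person's voice signal, so that the preset person's voice is clear and easy to recognize in the voice received at the local end.
Further, when the acquired original voice signal does not include a voice signal matching the voiceprint information of the preset person's voice signal, functions such as noise reduction, voiceprint filtering, and voice enhancement can be turned off, so that no differentiated noise reduction, voiceprint filtering, voice enhancement, or similar processing is applied to the acquired original voice signal, which reduces the power consumption of the terminal device.
By incorporating voiceprint recognition into voice processing, the technical solutions of the embodiments of the present invention improve voice clarity for a targeted group of people, enhance the voice of specific persons, and improve the noise reduction performance of voice calls.
FIG. 3 is a schematic structural diagram of a voice processing apparatus according to an embodiment of the present invention. Referring to FIG. 3, the apparatus includes: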
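a voice acquisition module 301, configured to acquire an original voice signal;
a voiceprint analysis module 302, configured to perform voiceprint analysis on the original voice signal to obtain voiceprint information of the original voice signal;
a judging module 303, configured to determine, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person; and
a first voice signal acquisition module 304, configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, perform noise reduction on the ambient noise signal in the original voice signal other than the voice signal of the preset person, to obtain a first voice signal.
Optionally, the apparatus further includes:
a preset-person voice signal acquisition module, configured to acquire a voice signal of the preset person; and
a preset-person voice signal analysis module, configured to perform voiceprint analysis on the voice signal of the preset person to obtain voiceprint information of the voice signal of the preset person.
Optionally, the apparatus further includes:
a voiceprint filtering module, configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, perform voiceprint filtering on the original voice signal according to the voiceprint information of the voice signal of the preset person, to obtain a second voice signal that contains only the voice signal of the preset person.
Optionally, the apparatus further includes:
a voice gain module, configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, perform voice gain processing on the original voice signal to obtain a third voice signal.
Optionally, the judging module 303 is configured to compare the voiceprint information of the original voice signal with the voiceprint information of the voice signal of the preset person; when the voiceprint information of the original voice signal includes the voiceprint information of the voice signal of the preset person, determine that the original voice signal includes the voice signal of the preset person; and when the voiceprint information of the original voice signal does not include the voiceprint information of the voice signal of the preset person, determine that the original voice signal does not include the voice signal of the preset person.
It should be noted that the voice processing apparatus provided in the foregoing embodiment is described, when performing voice processing, using only the above division into functional modules as an example. In practical applications, the above functions may be allocated to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to implement all or some of the functions described above. In addition, the voice processing apparatus provided in the foregoing embodiment and the voice processing method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
A person of ordinary skill in the art may understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.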
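FIG. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present invention. Referring to FIG. 4, the terminal device includes:
a receiver 401, configured to acquire an original voice signal; and
a processor 402, configured to perform voiceprint analysis on the original voice signal to obtain voiceprint information of the original voice signal;
the processor 402 is further configured to determine, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person;
the processor 402 is further configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, perform noise reduction on the ambient noise signal in the original voice signal other than the voice signal of the preset person, to obtain a first voice signal.
Optionally, the receiver 401 is further configured to acquire a voice signal of the preset person; and
the processor 402 is further configured to perform voiceprint analysis on the voice signal of the preset person to obtain voiceprint information of the voice signal of the preset person.
Optionally, the processor 402 is further configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, perform voiceprint filtering on the original voice signal according to the voiceprint information of the voice signal of the preset person, to obtain a second voice signal that contains only the voice signal of the preset person.
Optionally, the processor 402 is further configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, perform voice gain processing on the original voice signal to obtain a third voice signal.
Optionally, the processor 402 is further configured to compare the voiceprint information of the original voice signal with the voiceprint information of the voice signal of the preset person; when the voiceprint information of the original voice signal includes the voiceprint information of the voice signal of the preset person, determine that the original voice signal includes the voice signal of the preset person; and when the voiceprint information of the original voice signal does not include the voiceprint information of the voice signal of the preset person, determine that the original voice signal does not include the voice signal of the preset person.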
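Specifically, the voice processing device further includes a radio frequency circuit, an audio circuit, and a power supply circuit.
The radio frequency circuit is configured to establish communication between the mobile phone and a wireless network, and to receive data from and send data to the wireless network;
the audio circuit is configured to collect sound and convert the collected sound into sound data, so that the mobile phone sends the sound data to the wireless network through the radio frequency circuit, and/or to restore sound data received by the mobile phone from the wireless network through the radio frequency circuit into sound and play the sound to the user;
the power supply circuit is configured to supply power to the circuits and components of the mobile phone to ensure its normal operation.
The terminal device may be a mobile phone, a human-computer interaction terminal, an e-book reader, or another terminal device with a voice recognition function. When the terminal device is a mobile phone, the mobile phone further includes a casing, a circuit board, a microphone, and a speaker, which provide the basic functions of the mobile phone. The casing, the circuit board, the microphone, and the speaker are described separately below.
The circuit board is disposed inside the casing.
The microphone is configured to collect sound and convert the collected sound into sound data, so that the mobile phone sends the sound data to the wireless network through the radio frequency circuit.
The speaker is configured to restore sound data received by the mobile phone from the wireless network through the radio frequency circuit into sound and play the sound to the user.
The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.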

Claims

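1. A voice processing method, comprising:
acquiring an original voice signal;
performing voiceprint analysis on the original voice signal to obtain voiceprint information of the original voice signal;
determining, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person; and
when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, performing noise reduction on an ambient noise signal in the original voice signal other than the voice signal of the preset person, to obtain a first voice signal.
2. The method according to claim 1, wherein before the determining, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person, the method further comprises:
acquiring a voice signal of the preset person; and
performing voiceprint analysis on the voice signal of the preset person to obtain voiceprint information of the voice signal of the preset person.
3. The method according to claim 2, wherein after the determining, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person, the method further comprises:
when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, performing voiceprint filtering on the original voice signal according to the voiceprint information of the voice signal of the preset person, to obtain a second voice signal that contains only the voice signal of the preset person.
4. The method according to claim 2, wherein after the determining, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person, the method further comprises:
when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, performing voice gain processing on the original voice signal to obtain a third voice signal.
5. The method according to claim 2, wherein the determining, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person comprises:
comparing the voiceprint information of the original voice signal with the voiceprint information of the voice signal of the preset person; when the voiceprint information of the original voice signal includes the voiceprint information of the voice signal of the preset person, determining that the original voice signal includes the voice signal of the preset person; and when the voiceprint information of the original voice signal does not include the voiceprint information of the voice signal of the preset person, determining that the original voice signal does not include the voice signal of the preset person.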
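6. A voice processing apparatus, comprising:
a voice acquisition module, configured to acquire an original voice signal;
a voiceprint analysis module, configured to perform voiceprint analysis on the original voice signal to obtain voiceprint information of the original voice signal;
a judging module, configured to determine, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person; and
a first voice signal acquisition module, configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, perform noise reduction on an ambient noise signal in the original voice signal other than the voice signal of the preset person, to obtain a first voice signal.
7. The apparatus according to claim 6, further comprising:
a preset-person voice signal acquisition module, configured to acquire a voice signal of the preset person; and
a preset-person voice signal analysis module, configured to perform voiceprint analysis on the voice signal of the preset person to obtain voiceprint information of the voice signal of the preset person.
8. The apparatus according to claim 7, further comprising:
a voiceprint filtering module, configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, perform voiceprint filtering on the original voice signal according to the voiceprint information of the voice signal of the preset person, to obtain a second voice signal that contains only the voice signal of the preset person.
9. The apparatus according to claim 7, further comprising:
a voice gain module, configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, perform voice gain processing on the original voice signal to obtain a third voice signal.
10. The apparatus according to claim 7, wherein the judging module is configured to compare the voiceprint information of the original voice signal with the voiceprint information of the voice signal of the preset person; when the voiceprint information of the original voice signal includes the voiceprint information of the voice signal of the preset person, determine that the original voice signal includes the voice signal of the preset person; and when the voiceprint information of the original voice signal does not include the voiceprint information of the voice signal of the preset person, determine that the original voice signal does not include the voice signal of the preset person.
11. A terminal device, comprising:
a receiver, configured to acquire an original voice signal; and
a processor, configured to perform voiceprint analysis on the original voice signal to obtain voiceprint information of the original voice signal;
wherein the processor is further configured to determine, according to the voiceprint information of the original voice signal, whether the original voice signal includes a voice signal of a preset person; and
the processor is further configured to: when it is determined, according to the voiceprint information of the original voice signal, that the original voice signal includes the voice signal of the preset person, perform noise reduction on an ambient noise signal in the original voice signal other than the voice signal of the preset person, to obtain a first voice signal.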
PCT/CN2014/071621 2013-01-30 2014-01-28 Voice processing method, apparatus, and terminal device WO2014117722A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310036167.9 2013-01-30
CN201310036167.9A CN103971696A (zh) 2013-01-30 2013-01-30 Voice processing method, apparatus, and terminal device

Publications (1)

Publication Number Publication Date
WO2014117722A1 (zh) 2014-08-07

Family

ID=51241112

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/071621 WO2014117722A1 (zh) 2013-01-30 2014-01-28 Voice processing method, apparatus, and terminal device

Country Status (2)

Country Link
CN (1) CN103971696A (zh)
WO (1) WO2014117722A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108597500A (zh) * 2018-03-30 2018-09-28 四川斐讯信息技术有限公司 Smart wearable device and voice recognition method based on the smart wearable device

Families Citing this family (19)

Publication number Priority date Publication date Assignee Title
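CN105374364B (zh) * 2014-08-25 2019-08-27 联想(北京)有限公司 Signal processing method and electronic device
CN104811559B (zh) * 2015-05-05 2018-11-20 上海青橙实业有限公司 Noise reduction method, communication method, and mobile terminal
CN106486130B (zh) * 2015-08-25 2020-03-31 百度在线网络技术(北京)有限公司 Noise cancellation and speech recognition method and apparatus
CN105448301B (zh) * 2015-11-30 2019-09-24 惠州Tcl移动通信有限公司 Audio processing method and system based on voiceprint recognition
CN105719659A (zh) * 2016-02-03 2016-06-29 努比亚技术有限公司 Method and apparatus for separating recording files based on voiceprint recognition
CN105979084A (zh) * 2016-04-29 2016-09-28 维沃移动通信有限公司 Voice call processing method and communication terminal
CN106816155B (zh) * 2016-12-23 2020-04-24 维沃移动通信有限公司 Method and apparatus for improving the signal-to-noise ratio of voice transmission
CN106920559B (zh) * 2017-03-02 2020-10-30 奇酷互联网络科技(深圳)有限公司 Call audio optimization method and apparatus, and call terminal
CN107172256B (zh) * 2017-07-27 2020-05-05 Oppo广东移动通信有限公司 Adaptive adjustment method and apparatus for headset calls, mobile terminal, and storage medium
CN107979790A (zh) * 2017-11-28 2018-05-01 上海与德科技有限公司 Call noise reduction method, apparatus, device, and medium
CN108520751A (zh) * 2018-03-30 2018-09-11 四川斐讯信息技术有限公司 Intelligent voice recognition device and intelligent voice recognition method
CN109065066B (zh) * 2018-09-29 2020-03-31 广东小天才科技有限公司 Call control method, apparatus, and device
CN109087661A (zh) * 2018-10-23 2018-12-25 南昌努比亚技术有限公司 Voice processing method, apparatus, system, and readable storage medium
CN109272996B (zh) * 2018-11-09 2021-11-30 广州长嘉电子有限公司 Noise reduction method and system
CN110265038B (zh) * 2019-06-28 2021-10-22 联想(北京)有限公司 Processing method and electronic device
CN112188019B (zh) * 2020-09-30 2021-10-22 联想(北京)有限公司 Processing method and electronic device
WO2022253003A1 (zh) * 2021-05-31 2022-12-08 华为技术有限公司 Speech enhancement method and related device
CN115482830B (zh) * 2021-05-31 2023-08-04 华为技术有限公司 Speech enhancement method and related device
CN113724692B (zh) * 2021-10-08 2023-07-14 广东电力信息科技有限公司 Method for audio acquisition and anti-interference processing in telephone scenarios based on voiceprint features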

Citations (5)

Publication number Priority date Publication date Assignee Title
JP3512398B2 (ja) * 2001-09-25 2004-03-29 独立行政法人電子航法研究所 Speech processing device
US20090157399A1 (en) * 2007-12-18 2009-06-18 Electronics And Telecommunications Research Institute Apparatus and method for evaluating performance of speech recognition
CN101472017A (zh) * 2007-12-27 2009-07-01 华为技术有限公司 Method for implementing conference calls and network element device
CN102270451A (zh) * 2011-08-18 2011-12-07 安徽科大讯飞信息科技股份有限公司 Speaker recognition method and system
CN102694891A (zh) * 2011-03-21 2012-09-26 鸿富锦精密工业(深圳)有限公司 Call noise removal system and method

Also Published As

Publication number Publication date
CN103971696A (zh) 2014-08-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14745804

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14745804

Country of ref document: EP

Kind code of ref document: A1