CN110970020A - Method for extracting effective voice signal by using voiceprint - Google Patents

Method for extracting effective voice signal by using voiceprint

Info

Publication number
CN110970020A
CN110970020A CN201811149356.6A CN201811149356A CN110970020A CN 110970020 A CN110970020 A CN 110970020A CN 201811149356 A CN201811149356 A CN 201811149356A CN 110970020 A CN110970020 A CN 110970020A
Authority
CN
China
Prior art keywords
voice
voiceprint
signal
recognition
awakening
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811149356.6A
Other languages
Chinese (zh)
Inventor
何云鹏
高君效
张来
刘兵
余杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chipintelli Technology Co Ltd
Original Assignee
Chipintelli Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chipintelli Technology Co Ltd
Priority to CN201811149356.6A
Publication of CN110970020A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/14 Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Abstract

A method for extracting an effective voice signal by using a voiceprint, comprising: Step 1, while the device has not been woken, first recognizing the wake-up word, and analyzing and recording the voiceprint features of the wake-up word; Step 2, after the device enters the awake state, collecting ambient sound and performing voiceprint recognition processing on the collected human voice signals, wherein the recognition processing comprises: retaining target signals whose recognized voiceprint information matches the voiceprint features of the wake-up word, and suppressing non-target signals whose voiceprint information does not match; the voice signal after voiceprint recognition processing then passes to the next stage for further recognition. By using the voiceprint features of the wake-up word to distinguish the controlling user's sound source from the other sound sources around the device, the method allows the controlling user's voice signal to be picked out accurately from a complex surrounding sound environment when control command words are recognized, which improves the recognition accuracy of voice control commands in complex sound environments.

Description

Method for extracting effective voice signal by using voiceprint
Technical Field
The invention belongs to the field of artificial intelligence and relates to speech recognition technology, in particular to a method for extracting an effective voice signal by using a voiceprint.
Background
In recent years, intelligent speech recognition technology has been applied widely and in depth in fields such as artificial intelligence, smart hardware, wearable devices, and autonomous driving, freeing users' hands and eyes in practice. However, there is still a gap between current technology and the ideal speech recognition that consumers expect. In particular, the complexity and noise of the application environment easily cause devices to misoperate. Current speech recognition devices use the following control method: the user first speaks the wake-up word to wake the device, then speaks a voice control command word or sentence to the device, and the device performs the corresponding function after recognizing the speech. In practice, the device must extract the user's voice signal by means of various noise reduction techniques before recognizing it and determining the corresponding function. Because the device's environment is often complex, for example a living room or a car, other people may be speaking while the user is controlling the device. Conventional voice noise reduction suppresses environmental noise according to the characteristics of the sound, but it suppresses other human voices poorly. The controlling user's effective voice is mixed with the voices of surrounding people and is difficult to extract accurately, so speech recognition performs poorly and the user experience suffers.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention discloses a method for extracting an effective voice signal by using a voiceprint.
The method of the invention for extracting an effective voice signal by using a voiceprint comprises the following steps:
step 1: while the device has not been woken, first recognize the wake-up word, and analyze and record the voiceprint features of the wake-up word; the device then enters the awake state;
step 2: after the device enters the awake state, collect ambient sound and perform voiceprint recognition processing on the collected human voice signals, wherein the recognition processing comprises:
retaining target signals whose recognized voiceprint information matches the voiceprint features of the wake-up word, and suppressing non-target signals whose voiceprint information does not match;
the voice signal after voiceprint recognition processing then passes to the next stage for further recognition.
Preferably, in the recognition processing, the sound signal is divided into a plurality of sub-signals according to the spatial direction of the sound source, and voiceprint recognition is performed on each sub-signal.
Further, the device is provided with a plurality of microphones, and sound source identification for each sub-signal is performed as follows: sound sources at different azimuth angles are captured by directional pickup using beamforming with the plurality of microphones.
Preferably, the suppression is digital attenuation of the non-target signal.
By using the voiceprint features of the wake-up word to distinguish the controlling user's sound source from the other sound sources around the device, the invention allows the controlling user's voice signal to be picked out accurately from a complex surrounding sound environment when control command words are recognized. This improves the recognition accuracy of voice control commands in complex sound environments without increasing hardware cost, and the method is effective, convenient, and easy to use.
Drawings
Fig. 1 is a flow chart illustrating an embodiment of the method for extracting an effective speech signal using voiceprint according to the present invention.
Detailed Description
The following provides a more detailed description of the present invention.
The method of the invention for extracting an effective voice signal by using a voiceprint comprises the following steps:
step 1: while the device has not been woken, first recognize the wake-up word, and analyze and record the voiceprint features of the wake-up word; the device then enters the awake state;
step 2: after the device enters the awake state, collect ambient sound and perform voiceprint recognition processing on the collected human voice signals, wherein the recognition processing comprises:
retaining target signals whose recognized voiceprint information matches the voiceprint features of the wake-up word, and suppressing non-target signals whose voiceprint information does not match;
the voice signal after voiceprint recognition processing then passes to the next stage for further recognition.
The basic principle of speech recognition is as follows: a microphone captures the sound signal of a command word and converts it into an electrical signal; the electrical signal is decoded and compared by computation against a stored data model; the sound signal is thereby recognized, and the command corresponding to it is invoked to perform the corresponding operation on the device.
The wake-up word is a special command word that activates the speech recognition module of a speech recognition device. Before being woken by the wake-up word, the device is in its normal standby state, and the speech recognition module generally responds to no command word other than the wake-up word.
After the controlling user speaks the wake-up word, the device detects the wake-up word, enters the awake state, and computes and records the voiceprint of the wake-up word.
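The standby-to-awake transition described above can be sketched as a small state machine. This is only an illustrative sketch: the names `VoiceDevice`, `is_wake_word`, and `extract_voiceprint` are hypothetical, and the two callables stand in for a wake-word detector and a voiceprint extractor that the description does not specify.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, List, Optional, Sequence


class DeviceState(Enum):
    STANDBY = auto()  # not yet woken: only the wake-up word is acted on
    AWAKE = auto()    # woken: command words are passed on for recognition


@dataclass
class VoiceDevice:
    # Hypothetical stand-ins for the real detector and voiceprint extractor.
    is_wake_word: Callable[[Sequence[float]], bool]
    extract_voiceprint: Callable[[Sequence[float]], List[float]]
    state: DeviceState = DeviceState.STANDBY
    wake_voiceprint: Optional[List[float]] = None

    def on_audio(self, frame: Sequence[float]) -> None:
        """Step 1: in standby, react only to the wake-up word and record its voiceprint."""
        if self.state is DeviceState.STANDBY and self.is_wake_word(frame):
            self.wake_voiceprint = self.extract_voiceprint(frame)
            self.state = DeviceState.AWAKE
```

The recorded `wake_voiceprint` is what later processing would compare against; audio that arrives in standby and is not the wake-up word leaves the device unchanged.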
Once the device is awake, in a complex sound environment it is likely to receive the controlling user's voice, other people's voices, and environmental noise at the same time. Therefore, after the controlling user's sound source has been determined, other sound sources and environmental noise that do not match the voiceprint of the wake-up word are suppressed. The suppression can use digital attenuation. Specifically, the electrical signal of the residual sound, that is, what remains after the controlling user's sound source has been extracted and which contains the other sound sources and environmental noise, is attenuated directly by an attenuation algorithm until its energy differs from its pre-attenuation energy by orders of magnitude, so that it cannot interfere with the controlling user's sound source.
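A minimal sketch of the retain-or-suppress decision follows. It assumes that voiceprints are fixed-length feature vectors compared by cosine similarity, and it uses an illustrative threshold of 0.8 and an amplitude gain of 0.01; none of these choices comes from the description, which only requires that non-matching signals be digitally attenuated by orders of magnitude in energy.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def keep_or_suppress(
    samples: Sequence[float],
    voiceprint: Sequence[float],
    wake_voiceprint: Sequence[float],
    threshold: float = 0.8,          # assumed match threshold
    attenuation_gain: float = 0.01,  # assumed gain: energy drops by a factor of 10^4
) -> List[float]:
    """Retain a signal whose voiceprint matches the wake-up word; attenuate it otherwise."""
    if cosine_similarity(voiceprint, wake_voiceprint) >= threshold:
        return list(samples)
    return [s * attenuation_gain for s in samples]


def energy(samples: Sequence[float]) -> float:
    """Sum of squared sample values."""
    return sum(s * s for s in samples)
```

Because energy scales with the square of amplitude, an amplitude gain of 0.01 lowers the energy of a non-target signal by four orders of magnitude.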
In practice, sound can be captured once over all spatial directions and processed with the method above. Alternatively, considering the complexity of the environment, a device with two or more microphones can partition the spatial directions by angle and traverse the partitions, recognizing each one separately. That is, the sound signal is divided according to the spatial direction of the sound source into a plurality of sub-signals, each associated with a range of direction angles, and voiceprint recognition is performed on each sub-signal. Preferably, the direction from which the wake-up word was first spoken, or the direction of the controlling user's most recent utterance, is checked first. If the voiceprint matches there, subsequent speech recognition proceeds; if not, the remaining directions are checked in order of distance from that direction, from nearest to farthest. This improves both accuracy and response speed.
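One reading of the preferred traversal order is sketched below: sectors are visited starting with the one closest to the last known direction of the controlling user, then outward by angular distance. The function names, the representation of a sector by its centre angle in degrees, and the `matches_wake_voiceprint` callback are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest angle between two directions, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def sector_search_order(
    sector_centers_deg: Sequence[float], start_direction_deg: float
) -> List[float]:
    """Order sectors from nearest to farthest relative to the starting direction."""
    return sorted(
        sector_centers_deg,
        key=lambda center: angular_distance(center, start_direction_deg),
    )


def find_user_sector(
    sector_centers_deg: Sequence[float],
    start_direction_deg: float,
    matches_wake_voiceprint: Callable[[float], bool],  # hypothetical per-sector check
) -> Optional[float]:
    """Return the first sector whose sub-signal matches the wake-up voiceprint, if any."""
    for center in sector_search_order(sector_centers_deg, start_direction_deg):
        if matches_wake_voiceprint(center):
            return center
    return None
```

If the user has not moved, the first sector checked already matches and no further sectors need to be examined, which is where the gain in response speed would come from.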
When the device has two or more microphones, the microphone array can be used for directional pickup. Directional pickup extracts the target signal from the mixed signal according to the direction of the sound source: only sound arriving from a specific direction is picked up, while noise and interfering signals from other directions are rejected or masked, which enhances the target speech. Sound sources at different azimuth angles are captured by directional pickup using beamforming across the microphones, in which the signals picked up by the individual microphones are weighted and combined to obtain the audio signal for a given direction.
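The description does not name a particular beamformer. As one common instance of weighting and combining microphone signals, a delay-and-sum beamformer can be sketched as follows, assuming a uniform linear array, a far-field source, whole-sample delays, and a speed of sound of 343 m/s. All function names and parameters here are illustrative.

```python
import math
from typing import List, Optional, Sequence

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air


def arrival_delays_samples(
    num_mics: int, spacing_m: float, angle_deg: float, sample_rate_hz: float
) -> List[int]:
    """Relative arrival delay, in whole samples, at each microphone of a uniform
    linear array. The angle is measured from the array axis, which points from
    microphone 0 towards the last microphone."""
    raw = [
        -(m * spacing_m * math.cos(math.radians(angle_deg)))
        / SPEED_OF_SOUND_M_S * sample_rate_hz
        for m in range(num_mics)
    ]
    earliest = min(raw)
    return [round(t - earliest) for t in raw]


def delay_and_sum(
    channels: Sequence[Sequence[float]],
    arrival_delays: Sequence[int],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Compute y[n] = sum over m of w_m * x_m[n + d_m], which time-aligns sound
    from the steered direction before the weighted sum."""
    if weights is None:
        weights = [1.0 / len(channels)] * len(channels)
    usable = min(len(ch) - d for ch, d in zip(channels, arrival_delays))
    return [
        sum(w * ch[n + d] for w, ch, d in zip(weights, channels, arrival_delays))
        for n in range(usable)
    ]
```

Sound from the steered direction adds coherently across the channels, while sound from other directions is misaligned and partially cancels. Repeating the sum with the delays for each azimuth sector yields one sub-signal per direction.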
The foregoing describes preferred embodiments of the invention. Unless they are clearly contradictory or one presupposes another, the preferred embodiments may be combined in any way. The specific parameters in the examples and embodiments serve only to illustrate clearly the inventors' verification process and are not intended to limit the scope of patent protection of the invention, which is defined by the claims. Equivalent structural changes made on the basis of the description of the invention likewise fall within the scope of protection of the invention.

Claims (4)

1. A method for extracting an effective voice signal by using a voiceprint, comprising the following steps:
step 1, while the device has not been woken, first recognizing the wake-up word, and analyzing and recording the voiceprint features of the wake-up word; the device then enters the awake state;
step 2, after the device enters the awake state, collecting ambient sound and performing voiceprint recognition processing on the collected human voice signals, wherein the recognition processing comprises:
retaining target signals whose recognized voiceprint information matches the voiceprint features of the wake-up word, and suppressing non-target signals whose voiceprint information does not match;
the voice signal after voiceprint recognition processing then passes to the next stage for further recognition.
2. The method for extracting an effective voice signal by using a voiceprint according to claim 1, wherein in the recognition processing the sound signal is divided into a plurality of sub-signals according to the spatial direction of the sound source, and voiceprint recognition is performed on each sub-signal.
3. The method for extracting an effective voice signal by using a voiceprint according to claim 2, wherein the device has a plurality of microphones, and sound source identification for each sub-signal is performed by directional pickup, capturing sound sources at different azimuth angles by beamforming with the plurality of microphones.
4. The method for extracting an effective voice signal by using a voiceprint according to claim 1, wherein the suppression is digital attenuation of the non-target signals.
CN201811149356.6A 2018-09-29 2018-09-29 Method for extracting effective voice signal by using voiceprint Pending CN110970020A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811149356.6A CN110970020A (en) 2018-09-29 2018-09-29 Method for extracting effective voice signal by using voiceprint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811149356.6A CN110970020A (en) 2018-09-29 2018-09-29 Method for extracting effective voice signal by using voiceprint

Publications (1)

Publication Number Publication Date
CN110970020A (en) 2020-04-07

Family

ID=70028074

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811149356.6A Pending CN110970020A (en) 2018-09-29 2018-09-29 Method for extracting effective voice signal by using voiceprint

Country Status (1)

Country Link
CN (1) CN110970020A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160035349A1 (en) * 2014-07-29 2016-02-04 Samsung Electronics Co., Ltd. Electronic apparatus and method of speech recognition thereof
CN105575395A (en) * 2014-10-14 2016-05-11 中兴通讯股份有限公司 Voice wake-up method and apparatus, terminal, and processing method thereof
CN108447471A (en) * 2017-02-15 2018-08-24 腾讯科技(深圳)有限公司 Audio recognition method and speech recognition equipment
CN108159702A (en) * 2017-12-06 2018-06-15 广东欧珀移动通信有限公司 Based on multi-person speech game processing method and device
CN108062949A (en) * 2017-12-11 2018-05-22 广州朗国电子科技有限公司 The method and device of voice control treadmill

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112017649A (en) * 2020-09-02 2020-12-01 上海仙视电子科技有限公司 Audio processing method and device, electronic equipment and readable storage medium
CN112770224A (en) * 2020-12-30 2021-05-07 上海移远通信技术股份有限公司 In-vehicle sound source acquisition system and method
CN112770224B (en) * 2020-12-30 2022-07-05 上海移远通信技术股份有限公司 In-vehicle sound source acquisition system and method
CN113921016A (en) * 2021-10-15 2022-01-11 阿波罗智联(北京)科技有限公司 Voice processing method, device, electronic equipment and storage medium
WO2024051199A1 (en) * 2022-09-09 2024-03-14 青岛海尔空调器有限总公司 Method and apparatus for controlling voice control device, and device for controlling voice control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200407