CN110970020A - Method for extracting effective voice signal by using voiceprint - Google Patents
- Publication number
- CN110970020A (application CN201811149356.6A)
- Authority
- CN
- China
- Prior art keywords
- voice
- voiceprint
- signal
- recognition
- awakening
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/14—Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Abstract
A method for extracting an effective voice signal by using voiceprint, comprising: step 1, while the device is not awake, first recognizing the wake-up word, and analyzing and recording the voiceprint features of the wake-up word; step 2, after the device enters the awake state, collecting ambient sound and performing voiceprint recognition processing on the collected human voice signals, wherein the processing comprises retaining target signals whose recognized voiceprint matches the voiceprint features of the wake-up word and suppressing non-target signals whose voiceprint does not match; the voice signal after voiceprint recognition processing then passes to the next stage for further recognition. By using the voiceprint features of the wake-up word to distinguish the controlling user's sound source from the other sound sources reaching the device, the method can accurately pick out the controlling user's voice signal from a complex surrounding sound environment when control command words are recognized, which improves the recognition accuracy of voice control commands in complex sound environments.
Description
Technical Field
The invention belongs to the field of artificial intelligence, relates to voice recognition technology, and in particular relates to a method for extracting an effective voice signal by using voiceprint.
Background
In recent years, intelligent voice recognition technology has been applied widely and in depth in artificial intelligence, smart hardware, wearable devices, autonomous driving and similar fields, freeing users from having to keep their hands and eyes on a device in order to operate it. However, a gap remains between current technology and the speech recognition that consumers expect; in particular, the complexity and noise of the environments in which voice control is used easily cause devices to operate incorrectly. Current voice recognition devices use the following control method: the user first speaks the wake-up word to wake the device, then speaks a voice control command word or sentence to the device, and the device performs the corresponding function after recognizing the speech. In practice, the device must extract the user's voice signal by means of various noise reduction techniques before it can recognize the speech and determine the corresponding function. The environment of the device is often complex. In a living room or a car, for example, other people may be speaking at the same time as the user is controlling the device. Traditional voice noise reduction suppresses ambient noise according to the characteristics of the sound, but it is poor at suppressing other human voices. The controlling user's effective voice is mixed with the voices of the surrounding people and is difficult to extract accurately, so speech recognition performs poorly and the user experience suffers.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention discloses a method for extracting an effective voice signal by utilizing voiceprint.
The invention relates to a method for extracting an effective voice signal by using voiceprint, which comprises the following steps:
step 1: while the device is not awake, first recognizing the wake-up word, and analyzing and recording the voiceprint features of the wake-up word; the device then enters the awake state;
step 2: after the device enters the awake state, collecting ambient sound and performing voiceprint recognition processing on the collected human voice signals, wherein the recognition processing comprises:
retaining the target signals whose recognized voiceprint matches the voiceprint features of the wake-up word, and suppressing the non-target signals whose voiceprint does not match the voiceprint features of the wake-up word;
the voice signal after voiceprint recognition processing then passes to the next stage for further recognition.
Preferably, in the recognition processing, the sound signal is divided into a plurality of sub-signals according to the spatial direction of the sound source, and voiceprint recognition is performed on each sub-signal.
Further, the device is provided with a plurality of microphones, and sound source recognition is performed on each sub-signal as follows: sound sources at different azimuth angles are captured by directional pickup, using beamforming across the plurality of microphones.
Preferably, the suppression is digital attenuation of the non-target signals.
By using the voiceprint features of the wake-up word to distinguish the controlling user's sound source from the other sound sources reaching the device, the invention can accurately pick out the controlling user's voice signal from a complex surrounding sound environment when control command words are recognized. This improves the recognition accuracy of voice control commands in complex sound environments without increasing hardware cost, and the method is effective, convenient and easy to use.
Drawings
Fig. 1 is a flow chart illustrating an embodiment of the method for extracting an effective speech signal using voiceprint according to the present invention.
Detailed Description
The following provides a more detailed description of the present invention.
The invention relates to a method for extracting an effective voice signal by using voiceprint, which comprises the following steps:
step 1: while the device is not awake, first recognizing the wake-up word, and analyzing and recording the voiceprint features of the wake-up word; the device then enters the awake state;
step 2: after the device enters the awake state, collecting ambient sound and performing voiceprint recognition processing on the collected human voice signals, wherein the recognition processing comprises:
retaining the target signals whose recognized voiceprint matches the voiceprint features of the wake-up word, and suppressing the non-target signals whose voiceprint does not match the voiceprint features of the wake-up word;
the voice signal after voiceprint recognition processing then passes to the next stage for further recognition.
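The retain-or-suppress decision in step 2 can be sketched as follows. This is a minimal illustration only: the patent does not specify how voiceprints are represented or compared, so the fixed-length embedding vectors, the cosine-similarity measure, the `threshold` value of 0.8 and the function names are all assumptions introduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length voiceprint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_by_voiceprint(wake_print, segments, threshold=0.8):
    """Partition (voiceprint, audio) pairs into retained and suppressed audio.

    wake_print: voiceprint recorded from the wake-up word (hypothetical vector).
    threshold:  illustrative similarity cutoff, not a value from the patent.
    """
    retained, suppressed = [], []
    for voiceprint, audio in segments:
        if cosine_similarity(wake_print, voiceprint) >= threshold:
            retained.append(audio)      # target signal: matches the wake-up word
        else:
            suppressed.append(audio)    # non-target signal: does not match
    return retained, suppressed
```

Only the retained audio would be passed on to the next recognition stage; in a real system the voiceprints would come from a speaker-verification model rather than hand-written vectors.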
The basic principle of speech recognition is as follows: a microphone picks up the sound signal of a command word and converts it into an electrical signal; the electrical signal is decoded and compared computationally against a stored data model; once the sound signal has been recognized, the command corresponding to it is invoked to perform the corresponding operation on the device.
The wake-up word is a special command word that activates the voice recognition module in a voice recognition device. Before being woken by the wake-up word, the device is in its normal standby state, and the voice recognition module usually responds to no command word other than the wake-up word.
After the controlling user speaks the wake-up word, the device detects the wake-up word, enters the awake state, and computes and records the voiceprint of the wake-up word.
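The standby and awake behaviour described above amounts to a small two-state machine, sketched here under stated assumptions. The class name `VoiceDevice`, the example wake-up word, and the use of plain equality to compare voiceprints are hypothetical stand-ins; the patent does not describe an implementation.

```python
from enum import Enum


class State(Enum):
    STANDBY = "standby"
    AWAKE = "awake"


class VoiceDevice:
    """Toy model of the wake-up flow; voiceprints are opaque hashable labels."""

    def __init__(self, wake_word="hello device"):
        self.wake_word = wake_word        # hypothetical wake-up word
        self.state = State.STANDBY
        self.wake_voiceprint = None

    def on_utterance(self, text, voiceprint):
        """Handle one recognized utterance and report what was done with it."""
        if self.state is State.STANDBY:
            if text == self.wake_word:
                # Step 1: record the speaker's voiceprint, then wake up.
                self.wake_voiceprint = voiceprint
                self.state = State.AWAKE
                return "woken"
            return "ignored"              # standby: only the wake-up word counts
        # Step 2: keep speech from the wake-up speaker, suppress everyone else.
        if voiceprint == self.wake_voiceprint:
            return "forwarded"
        return "suppressed"
```

For example, after speaker "A" says the wake-up word, a command from "A" is forwarded to recognition while the same command from "B" is suppressed.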
Once the device is awake, in a complex sound environment it is likely to receive the controlling user's voice, the voices of other people and ambient noise all at the same time. Therefore, after the controlling user's sound source has been determined, the other sound sources and the ambient noise, which do not match the voiceprint of the wake-up word, are suppressed. They can be suppressed by digital attenuation, specifically: the electrical signals that remain once the controlling user's sound source has been extracted, which contain the other sound sources and the ambient noise, are directly attenuated by an attenuation algorithm to an energy that is orders of magnitude below the energy before attenuation, so that they cannot interfere with the controlling user's sound source.
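As a rough illustration of digital attenuation, the sketch below scales a block of samples by a fixed gain. The 40 dB default is an assumed example value (an amplitude factor of 0.01, hence an energy reduction of four orders of magnitude); the patent does not name a specific attenuation algorithm or level, and the function names are hypothetical.

```python
def attenuate(samples, attenuation_db=40.0):
    """Scale samples so that their energy drops by attenuation_db decibels.

    attenuation_db is an illustrative value, not one taken from the patent.
    """
    gain = 10.0 ** (-attenuation_db / 20.0)   # amplitude gain for the given dB
    return [s * gain for s in samples]


def energy(samples):
    """Sum of squared sample values."""
    return sum(s * s for s in samples)


def mix_with_suppression(target, non_target, attenuation_db=40.0):
    """Recombine the retained target with the heavily attenuated remainder."""
    quiet = attenuate(non_target, attenuation_db)
    return [t + q for t, q in zip(target, quiet)]
```

Because the gain is applied to amplitude, the energy ratio is the square of the gain: 40 dB corresponds to a factor of 10^-4 in energy.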
In actual operation, the sound signal can be collected once for all spatial directions and processed with the method. Alternatively, in view of the complexity of the environment, a device with two or more microphones can partition the spatial directions by angle and traverse the partitions, performing recognition on each one; that is, the sound signal is divided, according to the spatial direction of the sound source, into a plurality of sub-signals associated with direction angles, and voiceprint recognition is performed on each sub-signal. Preferably, recognition starts with the direction from which the wake-up word was first spoken, or the direction from which the controlling user last spoke. If the voiceprint matches there, subsequent speech recognition proceeds; if it does not, the remaining directions are checked in order of their distance from the controlling user's last speaking direction, from nearest to farthest. This gives higher accuracy and a faster response.
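The near-to-far traversal of direction partitions can be sketched as a simple ordering by angular distance. The sector centres in degrees, the function names and the `matches_voiceprint` callback are assumptions made for illustration; the patent does not fix a partition size or a data representation.

```python
def angular_distance(a_deg, b_deg):
    """Smallest angle in degrees between two directions on a circle."""
    diff = abs(a_deg - b_deg) % 360
    return min(diff, 360 - diff)


def sector_search_order(sector_centers_deg, last_direction_deg):
    """Order sectors from nearest to farthest from the last known direction."""
    return sorted(
        sector_centers_deg,
        key=lambda center: angular_distance(center, last_direction_deg),
    )


def find_user_sector(sector_centers_deg, last_direction_deg, matches_voiceprint):
    """Return the first sector whose sub-signal matches the wake-up voiceprint.

    matches_voiceprint: hypothetical callback taking a sector centre and
    returning True when that sector's sub-signal matches.
    """
    for center in sector_search_order(sector_centers_deg, last_direction_deg):
        if matches_voiceprint(center):
            return center
    return None
```

Starting from the last known direction means that a user who has not moved is found on the first check, which is where the gain in responsiveness would come from.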
When the device has two or more microphones, the microphone array can be used for directional pickup. Directional pickup selects the target signal from a mixed signal according to the direction of the sound source: only sound arriving from a specific direction is picked up, while noise and interfering signals from other directions are not picked up or are shielded, which enhances the target voice. Sound sources at different azimuth angles are captured by directional pickup using beamforming across the microphones, in which the signals picked up by the individual microphones are weighted and combined to obtain the audio signal from a given direction.
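One common way to realise the weighted synthesis described above is delay-and-sum beamforming, sketched below. The patent mentions beamforming only in general terms, so the choice of delay-and-sum, the integer sample delays and the uniform default weights are assumptions for illustration.

```python
def delay_and_sum(channels, delays, weights=None):
    """Weighted delay-and-sum over equal-length microphone channels.

    channels: list of sample lists, one per microphone.
    delays:   integer number of samples by which the target wavefront reaches
              each microphone later than the reference microphone; each
              channel is advanced by its delay so the target lines up.
    weights:  per-microphone weights (uniform averaging if omitted).
    """
    n_mics = len(channels)
    if weights is None:
        weights = [1.0 / n_mics] * n_mics
    length = len(channels[0])
    output = [0.0] * length
    for samples, delay, weight in zip(channels, delays, weights):
        for n in range(length):
            src = n + delay
            if 0 <= src < length:         # samples shifted past the end are dropped
                output[n] += weight * samples[src]
    return output
```

With delays that match the target direction, the target adds coherently while sound from other directions is smeared across samples; steering to a different azimuth only requires a different set of delays.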
The foregoing describes preferred embodiments of the present invention. Unless they are obviously contradictory, or one presupposes another, the preferred embodiments may be combined in any way. The specific parameters in the examples and embodiments serve only to illustrate the inventor's verification of the invention clearly and are not intended to limit the scope of patent protection, which is defined by the claims; equivalent structural changes made on the basis of the content of this description also fall within the scope of protection of the present invention.
Claims (4)
1. A method for extracting an effective voice signal by using voiceprint, comprising the steps of:
step 1, while the device is not awake, first recognizing the wake-up word, and analyzing and recording the voiceprint features of the wake-up word; the device then enters the awake state;
step 2, after the device enters the awake state, collecting ambient sound and performing voiceprint recognition processing on the collected human voice signals, wherein the recognition processing comprises:
retaining the target signals whose recognized voiceprint matches the voiceprint features of the wake-up word, and suppressing the non-target signals whose voiceprint does not match the voiceprint features of the wake-up word;
the voice signal after voiceprint recognition processing then passes to the next stage for further recognition.
2. The method for extracting an effective voice signal by using voiceprint according to claim 1, wherein, in the recognition processing, the sound signal is divided into a plurality of sub-signals according to the spatial direction of the sound source, and voiceprint recognition is performed on each sub-signal.
3. The method for extracting an effective voice signal by using voiceprint according to claim 2, wherein the device has a plurality of microphones, and sound source recognition is performed on each sub-signal by capturing sound sources at different azimuth angles through directional pickup, using beamforming across the plurality of microphones.
4. The method for extracting an effective voice signal by using voiceprint according to claim 1, wherein the suppression is digital attenuation of the non-target signals.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811149356.6A CN110970020A (en) | 2018-09-29 | 2018-09-29 | Method for extracting effective voice signal by using voiceprint |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110970020A (en) | 2020-04-07 |
Family
ID=70028074
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811149356.6A Pending CN110970020A (en) | 2018-09-29 | 2018-09-29 | Method for extracting effective voice signal by using voiceprint |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110970020A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160035349A1 (en) * | 2014-07-29 | 2016-02-04 | Samsung Electronics Co., Ltd. | Electronic apparatus and method of speech recognition thereof |
CN105575395A (en) * | 2014-10-14 | 2016-05-11 | 中兴通讯股份有限公司 | Voice wake-up method and apparatus, terminal, and processing method thereof |
CN108062949A (en) * | 2017-12-11 | 2018-05-22 | 广州朗国电子科技有限公司 | The method and device of voice control treadmill |
CN108159702A (en) * | 2017-12-06 | 2018-06-15 | 广东欧珀移动通信有限公司 | Based on multi-person speech game processing method and device |
CN108447471A (en) * | 2017-02-15 | 2018-08-24 | 腾讯科技(深圳)有限公司 | Audio recognition method and speech recognition equipment |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112017649A (en) * | 2020-09-02 | 2020-12-01 | 上海仙视电子科技有限公司 | Audio processing method and device, electronic equipment and readable storage medium |
CN112770224A (en) * | 2020-12-30 | 2021-05-07 | 上海移远通信技术股份有限公司 | In-vehicle sound source acquisition system and method |
CN112770224B (en) * | 2020-12-30 | 2022-07-05 | 上海移远通信技术股份有限公司 | In-vehicle sound source acquisition system and method |
CN113921016A (en) * | 2021-10-15 | 2022-01-11 | 阿波罗智联(北京)科技有限公司 | Voice processing method, device, electronic equipment and storage medium |
WO2024051199A1 (en) * | 2022-09-09 | 2024-03-14 | 青岛海尔空调器有限总公司 | Method and apparatus for controlling voice control device, and device for controlling voice control |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110970020A (en) | Method for extracting effective voice signal by using voiceprint | |
CN108597496B (en) | Voice generation method and device based on generation type countermeasure network | |
CN106251874B (en) | A kind of voice gate inhibition and quiet environment monitoring method and system | |
CN110310623B (en) | Sample generation method, model training method, device, medium, and electronic apparatus | |
CN102298443B (en) | Smart home voice control system combined with video channel and control method thereof | |
EP3923273B1 (en) | Voice recognition method and device, storage medium, and air conditioner | |
WO2016150001A1 (en) | Speech recognition method, device and computer storage medium | |
CN108597505B (en) | Voice recognition method and device and terminal equipment | |
CN110232933B (en) | Audio detection method and device, storage medium and electronic equipment | |
CN106599866A (en) | Multidimensional user identity identification method | |
CN110570873B (en) | Voiceprint wake-up method and device, computer equipment and storage medium | |
CN109147763B (en) | Audio and video keyword identification method and device based on neural network and inverse entropy weighting | |
CN109272991B (en) | Voice interaction method, device, equipment and computer-readable storage medium | |
CN108922541A (en) | Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model | |
CN109887496A (en) | Orientation confrontation audio generation method and system under a kind of black box scene | |
CN108091340B (en) | Voiceprint recognition method, voiceprint recognition system, and computer-readable storage medium | |
CN110689887B (en) | Audio verification method and device, storage medium and electronic equipment | |
CN109215634A (en) | A kind of method and its system of more word voice control on-off systems | |
CN111326152A (en) | Voice control method and device | |
CN111489763A (en) | Adaptive method for speaker recognition in complex environment based on GMM model | |
CN109065026B (en) | Recording control method and device | |
CN110946554A (en) | Cough type identification method, device and system | |
CN112420056A (en) | Speaker identity authentication method and system based on variational self-encoder and unmanned aerial vehicle | |
CN112420063A (en) | Voice enhancement method and device | |
CN111192569B (en) | Double-microphone voice feature extraction method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200407 |