CN110970020A

CN110970020A - Method for extracting effective voice signal by using voiceprint

Info

Publication number: CN110970020A
Application number: CN201811149356.6A
Authority: CN
Inventors: 何云鹏 (He Yunpeng); 高君效 (Gao Junxiao); 张来 (Zhang Lai); 刘兵 (Liu Bing); 余杰 (Yu Jie)
Original assignee: Chipintelli Technology Co Ltd
Current assignee: Chipintelli Technology Co Ltd
Priority date: 2018-09-29
Filing date: 2018-09-29
Publication date: 2020-04-07

Abstract

Step 1: while the device is not yet awake, the wake-up word is first recognized, and the voiceprint features of the wake-up word are analyzed and recorded. Step 2: after the device enters the awake state, ambient sound is collected and voiceprint recognition processing is performed on the collected human voice signals. The recognition processing retains target signals whose recognized voiceprint information matches the voiceprint features of the wake-up word and suppresses non-target signals whose voiceprint information does not match them. The voice signal that has undergone voiceprint recognition processing then passes to the next stage for further recognition. By using the voiceprint features of the wake-up word to distinguish the controlling user's sound source from other sound sources, the method can accurately pick out the controlling user's voice signal from a complex surrounding sound environment when recognizing control command words, thereby improving the recognition accuracy of voice control commands in complex sound environments.

Description

Method for extracting effective voice signal by using voiceprint

Technical Field

The invention belongs to the field of artificial intelligence, relates to speech recognition technology, and in particular relates to a method for extracting an effective voice signal by using voiceprints.

Background

In recent years, intelligent speech recognition technology has been widely and deeply applied in fields such as artificial intelligence, smart hardware, wearable devices, and autonomous driving, genuinely freeing users' hands and eyes. However, current technology still falls some way short of the ideal speech recognition that consumers expect; in particular, the complexity and noise of speech application environments easily cause devices to misoperate. Current speech recognition devices use a control method in which the user first speaks a wake-up word to wake the device, then speaks a voice control command word or sentence to it, and the device performs the corresponding function after recognizing the speech. In practice, the device must extract the user's voice signal through various noise reduction techniques and then recognize it to determine the corresponding function. Because devices often operate in complex environments, such as a living room or a car, other people may be speaking while the user is controlling the device. Traditional voice noise reduction suppresses ambient noise according to the characteristics of the sound, but it performs poorly at suppressing other human voices. The controlling user's effective speech therefore remains mixed with the voices of surrounding people and is difficult to extract accurately, which degrades speech recognition and the user experience.

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention discloses a method for extracting an effective voice signal by using voiceprints.

The method of the invention for extracting an effective voice signal by using voiceprints comprises the following steps:

step 1: while the device is not awake, first recognizing the wake-up word, and analyzing and recording the voiceprint features of the wake-up word; the device then enters the awake state;

step 2: after the device enters the awake state, collecting ambient sound and performing voiceprint recognition processing on the collected human voice signals, wherein the recognition processing comprises:

retaining target signals whose recognized voiceprint information matches the voiceprint features of the wake-up word, and suppressing non-target signals whose voiceprint information does not match the voiceprint features of the wake-up word;

the voice signal that has undergone voiceprint recognition processing then passes to the next stage for further recognition.
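Step 2 can be sketched as a keep-or-attenuate gate. This is a minimal illustration, not the patented implementation: it assumes an external speaker-embedding model (not shown) has already turned the wake-word utterance and each candidate signal into fixed-length vectors, uses cosine similarity as the "matches the voiceprint" test, and picks a hypothetical threshold `MATCH_THRESHOLD` and attenuation gain `SUPPRESS_GAIN`, none of which the patent specifies.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical values: the patent does not say how a voiceprint "match" is
# decided or how strongly non-target signals are attenuated.
MATCH_THRESHOLD = 0.7
SUPPRESS_GAIN = 0.1


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two voiceprint embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def gate_by_voiceprint(
    wake_print: Sequence[float],
    segments: Sequence[Tuple[Sequence[float], Sequence[float]]],
) -> List[List[float]]:
    """Step 2: keep segments whose voiceprint matches the wake-word
    voiceprint recorded in step 1, and attenuate all other segments.

    Each segment is a (samples, embedding) pair; the embedding is assumed to
    come from an external speaker-embedding model (not shown).
    """
    processed = []
    for samples, embedding in segments:
        is_target = cosine_similarity(wake_print, embedding) >= MATCH_THRESHOLD
        gain = 1.0 if is_target else SUPPRESS_GAIN
        processed.append([gain * s for s in samples])
    return processed
```

The returned list is what would be handed to command-word recognition. A fixed threshold is the simplest decision rule; a real system would tune it on data or use a calibrated scoring back end.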

Preferably, in the recognition processing, the sound signal is divided into a plurality of sub-signals according to the spatial direction of the sound source, and voiceprint recognition is performed on each sub-signal.

Further, the device is provided with a plurality of microphones, and sound source recognition is performed on each sub-signal as follows: sound sources at different azimuth angles are collected by directional sound pickup based on beamforming with the plurality of microphones.

Preferably, the suppression is a digital attenuation of the non-target signal.

By using the voiceprint features of the wake-up word to distinguish the controlling user's sound source from the other sound sources picked up by the device, the invention can accurately pick out the controlling user's voice signal from a complex surrounding sound environment when recognizing control command words. This improves the recognition accuracy of voice control commands in complex sound environments without increasing hardware cost, and the method is effective, convenient, and easy to use.

Drawings

Fig. 1 is a flowchart of an embodiment of the method for extracting an effective voice signal by using voiceprints according to the present invention.

Detailed Description

The following provides a more detailed description of the present invention.

The basic principle of speech recognition is as follows: the sound signal of a command word is picked up by a microphone and converted into an electrical signal; the electrical signal is then decoded and scored against stored data models; through this computation the sound signal is recognized, and the command corresponding to it is invoked to perform the corresponding operation on the device.
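The "score against stored models, then invoke the matching command" idea can be illustrated with a toy nearest-template lookup. This is only a stand-in: practical recognizers use acoustic models and a decoder rather than one template vector per command, and the feature vectors, command names, and `ACTIONS` table below are hypothetical.

```python
import math
from typing import Dict, Sequence

# Hypothetical command -> device action table.
ACTIONS = {"light_on": "turn light on", "light_off": "turn light off"}


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize_command(
    features: Sequence[float], models: Dict[str, Sequence[float]]
) -> str:
    """Return the command whose stored template is closest to the input."""
    return min(models, key=lambda cmd: euclidean(features, models[cmd]))


def execute(features: Sequence[float], models: Dict[str, Sequence[float]]) -> str:
    """Recognize the command, then look up the device operation to perform."""
    return ACTIONS[recognize_command(features, models)]
```

For example, with templates `{"light_on": [1.0, 0.0], "light_off": [0.0, 1.0]}`, the input `[0.9, 0.2]` is closest to `light_on`, so `execute` returns `"turn light on"`.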

The wake-up word is a special command word that activates the speech recognition module of a speech recognition device. Before it is woken by the wake-up word, the device remains in a normal standby state, and its speech recognition module usually responds to no command word other than the wake-up word.
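The standby/awake behaviour described above amounts to a two-state gate. A minimal sketch, assuming words arrive already recognized as strings; the class name `WakeGate` is hypothetical, and returning to standby (for example after a timeout) is not modelled.

```python
from typing import Optional


class WakeGate:
    """Two-state gate: in standby only the wake word is acted on."""

    def __init__(self, wake_word: str) -> None:
        self.wake_word = wake_word
        self.awake = False  # the device starts in standby

    def on_word(self, word: str) -> Optional[str]:
        """Feed one recognized word; return a command to execute, or None."""
        if not self.awake:
            if word == self.wake_word:
                self.awake = True  # wake word: enter the awake state
            return None  # in standby, every other word is ignored
        return word  # awake: pass command words through for execution
```

In the method of this invention, the moment `awake` flips to `True` is also the moment the wake-word voiceprint would be computed and recorded.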

After the controlling user speaks the wake-up word, the device detects it, enters the awake state, and computes and records the voiceprint of the wake-up word.
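The patent does not say which voiceprint features are computed. One common way to obtain a fixed-length voiceprint is to average per-frame speaker features over the utterance and normalize the result. The sketch below assumes the frame features come from some front end not shown here; `enroll_voiceprint` is a hypothetical name.

```python
import math
from typing import List, Sequence


def enroll_voiceprint(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-frame feature vectors of the wake-word utterance and
    L2-normalize the mean to obtain one fixed-length voiceprint vector."""
    if not frame_features:
        raise ValueError("no frames in the wake-word segment")
    dim = len(frame_features[0])
    count = len(frame_features)
    mean = [sum(frame[i] for frame in frame_features) / count for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean))
    # Normalizing makes later cosine comparisons insensitive to loudness.
    return [v / norm for v in mean] if norm > 0.0 else mean
```

For frames `[[3.0, 0.0], [3.0, 8.0]]` the mean is `[3.0, 4.0]`, which normalizes to `[0.6, 0.8]`.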

Once the device is awake, in a relatively complex acoustic environment it is likely to receive the controlling user's voice, other people's voices, and ambient noise at the same time. Therefore, after the controlling user's sound source has been determined, other sound sources and ambient noise that do not match the voiceprint of the wake-up word are suppressed. The suppression can use digital attenuation: specifically, after the controlling user's sound source has been extracted, an attenuation algorithm directly attenuates the electrical signal of the remaining sound, which contains the other sound sources and ambient noise, until its energy is more than an order of magnitude below its energy before attenuation, so that it does not interfere with the controlling user's sound source.
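Digital attenuation and the "more than an order of magnitude" criterion can be checked directly on sample energies. A minimal sketch, assuming a fixed amplitude gain: the gain of 0.1 is a hypothetical choice, and because energy scales with the square of the gain, it lowers the residual energy by a factor of 100 (20 dB), comfortably past the factor of 10 stated above.

```python
from typing import List, Sequence


def energy(x: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(s * s for s in x)


def attenuate(residual: Sequence[float], gain: float = 0.1) -> List[float]:
    """Scale the non-target residual by a fixed amplitude gain.

    Energy scales by gain**2, so gain=0.1 gives a 100x (20 dB) energy drop.
    """
    return [gain * s for s in residual]


def attenuated_enough(
    before: Sequence[float], after: Sequence[float], min_ratio: float = 10.0
) -> bool:
    """True if the energy dropped by at least min_ratio (one order of magnitude)."""
    e_after = energy(after)
    return e_after == 0.0 or energy(before) / e_after >= min_ratio
```

A gain of 0.5, by contrast, only reduces energy by a factor of 4, so `attenuated_enough` would reject it.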

In practice, the sound signal can be collected once over all spatial directions and processed with this method. Alternatively, given the complexity of the environment, a device with two or more microphones can partition the spatial directions by angle and traverse the partitions, performing recognition on each one separately; that is, the sound signal is divided, according to the spatial direction of the sound source, into a plurality of sub-signals associated with spatial direction angles, and voiceprint recognition is performed on each sub-signal. Preferably, the direction from which the wake-up word was initially spoken, or the direction from which the controlling user last spoke, is checked first, and if the voiceprint there matches, subsequent speech recognition proceeds. If it does not match, the remaining directions are checked one by one in order of their distance from the controlling user's last speaking direction, from nearest to farthest. This improves both accuracy and responsiveness.
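The preferred search order (start at the user's last direction, then move outward from near to far) can be sketched as sorting sector centres by circular angular distance. Reading "near to far" as angular distance is an interpretation, and the sector layout and the `matches` callback, which stands for the voiceprint check on one sector's sub-signal, are hypothetical.

```python
from typing import Callable, List, Optional


def circular_distance(a: float, b: float) -> float:
    """Smallest angle in degrees between two directions."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def search_order(sectors: List[float], last_direction: float) -> List[float]:
    """Sector centres sorted from nearest to farthest from last_direction."""
    return sorted(sectors, key=lambda s: circular_distance(s, last_direction))


def find_user_sector(
    sectors: List[float],
    last_direction: float,
    matches: Callable[[float], bool],
) -> Optional[float]:
    """Return the first sector, in search order, whose sub-signal passes
    the voiceprint check, or None if no sector matches."""
    for sector in search_order(sectors, last_direction):
        if matches(sector):
            return sector
    return None
```

With six 60-degree sectors centred at 0, 60, ..., 300 and a last direction of 50 degrees, the search visits 60, 0, 120, 300, 180, 240 in that order.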

When the device has two or more microphones, a microphone array can be used for directional sound pickup. Directional pickup extracts the target signal from a mixture according to the direction of its sound source: only sound arriving from a specific direction is picked up, while noise and interference from other directions are rejected or masked, thereby enhancing the target speech. Specifically, sound sources at different azimuth angles are collected by directional pickup based on beamforming with multiple microphones, in which the signals picked up by the individual microphones are weighted and combined to obtain the audio signal from a given direction.
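The "weight and combine the microphone signals" step can be illustrated with delay-and-sum beamforming, the simplest fixed beamformer. This sketch assumes a uniform linear array, far-field sources, whole-sample delays, and equal weights; the patent does not name a specific beamforming algorithm, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def steering_delays(
    num_mics: int, spacing_m: float, angle_deg: float, fs: int
) -> List[int]:
    """Whole-sample arrival delays of a plane wave from angle_deg
    (0 deg = broadside) at each microphone of a uniform linear array."""
    tau = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    raw = [round(m * tau * fs) for m in range(num_mics)]
    base = min(raw)
    return [r - base for r in raw]  # relative to the earliest microphone


def delay_and_sum(
    channels: Sequence[Sequence[float]], delays: Sequence[int]
) -> List[float]:
    """Advance each channel by its delay so the target direction lines up,
    then average the channels (equal weights of 1/M)."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    m = len(channels)
    return [sum(ch[i + d] for ch, d in zip(channels, delays)) / m for i in range(n)]
```

Steering toward the true direction makes the target add coherently, while sound from other directions stays misaligned and is partly averaged away. Fractional delays are usually handled with interpolation or frequency-domain phase shifts, and adaptive beamformers such as MVDR compute data-dependent weights instead of equal ones.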

The foregoing describes preferred embodiments of the present invention. Unless clearly contradictory, or unless one preferred embodiment is a prerequisite of another, the preferred embodiments may be combined in any way. The specific parameters in the embodiments and examples are given only to clearly illustrate the inventors' verification of the invention and are not intended to limit the scope of patent protection of the present invention, which is defined by the claims; equivalent structural changes made using the contents of the description of the present invention are likewise included within the scope of protection of the present invention.

Claims

1. A method for extracting a valid speech signal using voiceprints, comprising the steps of:

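step 1, while the device is not awake, first recognizing a wake-up word, and analyzing and recording the voiceprint features of the wake-up word; the device then enters an awake state;

step 2, after the device enters the awake state, collecting ambient sound and performing voiceprint recognition processing on the collected human voice signals, wherein the recognition processing comprises:

retaining target signals whose recognized voiceprint information matches the voiceprint features of the wake-up word, and suppressing non-target signals whose voiceprint information does not match the voiceprint features of the wake-up word; and the voice signal after the voiceprint recognition processing enters the next step for further recognition.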

2. The method according to claim 1, wherein in the recognition processing the sound signal is divided into a plurality of sub-signals according to the spatial direction of the sound source, and voiceprint recognition is performed on each sub-signal.

3. The method according to claim 2, wherein the device has a plurality of microphones, and sound source recognition is performed on each sub-signal by directional sound pickup, in which sound sources at different azimuth angles are collected by beamforming with the plurality of microphones.

4. The method for extracting a valid speech signal using voiceprints according to claim 1 wherein said suppressing is a digital attenuation of non-target signals.