CN112233656A

CN112233656A - Artificial intelligence voice wake-up method

Info

Publication number: CN112233656A
Application number: CN202011075074.3A
Authority: CN
Inventors: 程松林
Original assignee: Anhui Fastcall Information Technology Co ltd
Current assignee: Anhui Fastcall Information Technology Co ltd
Priority date: 2020-10-09
Filing date: 2020-10-09
Publication date: 2021-01-15

Abstract

The invention relates to voice wake-up, and in particular to an artificial intelligence voice wake-up method. The method comprises: acquiring voice data; determining energy features corresponding to the voice data by using a voice detection model, and determining text data corresponding to the voice data according to the energy features; judging whether the voice data contains a wake-up keyword; and, if the voice data contains the wake-up keyword, performing a wake-up decision by using a wake-up decision model, and otherwise having the wake-up decision model output an instruction to maintain the current state. The technical scheme provided by the invention can effectively overcome two defects of the prior art: the low accuracy of voice wake-up recognition and the inability to flexibly adjust wake-up words.

Description

Artificial intelligence voice wake-up method

Technical Field

The invention relates to voice wake-up, and in particular to an artificial intelligence voice wake-up method.

Background

Voice wake-up (wake-word) technology is widely used in smart homes and voice interaction systems. However, limited recognition accuracy and a heavy computation load degrade the user experience in practice and raise the hardware requirements of the device. For example, if the false wake-up rate in an application exceeds a threshold, the frequent false triggers annoy the user; on the other hand, if the computation required for voice wake-up exceeds the capability of some low-end chips, the technology cannot be used in many products.

In the existing related art, voice wake-up is implemented with keyword spotting: a deep neural network model is designed, a compact decoding network is built, and the wake-up function is realized in combination with keyword detection techniques.

However, voice wake-up based on keyword spotting has a large number of model parameters. For each different wake-up word, the filler words must be redesigned, and the corresponding decoding parameters and keyword detection technique must be readjusted. It is therefore difficult for a single uniform algorithm to keep the wake-up performance of every wake-up word at a stable level.

Disclosure of Invention

(I) Technical problem to be solved

Aiming at the defects in the prior art, the invention provides an artificial intelligence voice wake-up method that can effectively overcome the low accuracy of voice wake-up recognition and the inability to flexibly adjust wake-up words in the prior art.

(II) Technical scheme

In order to achieve the above purpose, the invention is realized by the following technical scheme:

An artificial intelligence voice wake-up method comprises the following steps:

S1, acquiring voice data;

S2, determining energy features corresponding to the voice data by using a voice detection model, and determining text data corresponding to the voice data according to the energy features;

S3, judging whether the voice data contains a wake-up keyword;

S4, if the voice data contains the wake-up keyword, performing a wake-up decision by using a wake-up decision model; otherwise, the wake-up decision model outputs an instruction to maintain the current state.
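The control flow of steps S1 to S4 can be sketched as follows. This is only an illustrative outline: the class `WakePipeline`, its three callables, and the string constants are hypothetical names introduced here, and the models themselves are not specified by the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

WAKE = "wake"
MAINTAIN = "maintain"  # stands for the "maintain current state" instruction


@dataclass
class WakePipeline:
    # Hypothetical stand-ins for the three models named in S2 to S4.
    transcribe: Callable[[Sequence[float]], str]              # voice detection model (S2)
    spot_keyword: Callable[[Sequence[float]], Optional[str]]  # keyword comparison model (S3)
    decide: Callable[[str, str], str]                         # wake-up decision model (S4)

    def run(self, voice_data: Sequence[float]) -> str:
        """voice_data is the audio acquired in S1."""
        text = self.transcribe(voice_data)        # S2: voice data -> text data
        keyword = self.spot_keyword(voice_data)   # S3: is a wake-up keyword present?
        if keyword is None:
            # S4, "otherwise" branch. Assumption: short-circuit here instead of
            # calling the decision model just to emit the maintain instruction.
            return MAINTAIN
        return self.decide(keyword, text)         # S4: decision from both inputs
```

Passing the models in as callables keeps the sketch independent of any particular network architecture, which matches the stated goal of swapping wake-up keywords without redesigning the whole system.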

Preferably, determining the energy features corresponding to the voice data by using the voice detection model comprises:

preprocessing the voice data, and determining the energy value of each filter bank in the voice data;

and processing the energy value of each filter bank in the voice data by using the trained voice detection model.
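The text does not say how the filter bank energies are computed. A minimal sketch, assuming a conventional front end (pre-emphasis, Hamming-windowed frames, power spectrum, triangular mel filters), is given below; the function names and every parameter value (16 kHz sampling, 25 ms frames, 10 ms hop, 40 filters, 0.97 pre-emphasis) are assumptions, not values taken from the patent.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular mel filters, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):      # rising edge of the triangle
            bank[i, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            bank[i, k] = (right - k) / (right - center)
    return bank


def filter_bank_energies(signal, sample_rate=16000, frame_len=400, hop=160,
                         n_fft=512, n_filters=40):
    """Return per-frame filter bank energies, shape (n_frames, n_filters)."""
    signal = np.asarray(signal, dtype=float)
    # Preprocessing (assumed): first-order pre-emphasis.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    return power @ mel_filter_bank(n_filters, n_fft, sample_rate).T
```

Each row of the result holds the energy value of every filter bank for one frame, which is the quantity the trained voice detection model then processes.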

Preferably, processing the energy value of each filter bank in the voice data by using the trained voice detection model comprises:

determining the energy value of each filter bank in a given frame of voice data and the corresponding smoothing parameter;

acquiring the smoothed energy value of each filter bank in the previous frame of voice data;

and determining the smoothed energy value of each filter bank in the given frame according to the filter bank energy value of that frame, the smoothed filter bank energy value of the previous frame, and the smoothing parameter.
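The text names the three inputs of the smoothing step but gives no formula. One common recursion that uses exactly those inputs is a first-order exponential smoother, M[t] = (1 - s) * M[t-1] + s * E[t], where E is the filter bank energy, M the smoothed energy, and s the smoothing parameter. The sketch below assumes that form; the function name, the default value of `s`, and the choice to initialize with the first frame are all assumptions.

```python
import numpy as np


def smooth_filter_bank_energies(energies, s=0.025):
    """Exponentially smooth energies of shape (n_frames, n_filters).

    `s` may be a scalar or an array of shape (n_filters,), for example one
    learned smoothing parameter per filter bank.
    """
    energies = np.asarray(energies, dtype=float)
    smoothed = np.empty_like(energies)
    smoothed[0] = energies[0]  # assumption: the first frame initializes the state
    for t in range(1, len(energies)):
        # Current frame's energy, previous frame's smoothed energy, and s.
        smoothed[t] = (1.0 - s) * smoothed[t - 1] + s * energies[t]
    return smoothed
```

A small `s` makes the smoothed value track the slowly varying background level, while a large `s` makes it follow the current frame closely.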

Preferably, the training of the voice detection model comprises:

acquiring training voice data and initializing a voice detection model;

and training the voice detection model by using the training voice data, and determining the trained voice detection model and the corresponding smoothing parameters.

Preferably, judging whether the voice data contains a wake-up keyword comprises:

performing feature analysis on multiple consecutive frames of voice data, and caching the audio features;

comparing each frame of cached audio features with the wake-up keyword by using a keyword comparison model, and determining the degree of association between that frame of audio features and the wake-up keyword;

and determining the confidence that the voice data contains the wake-up keyword according to the degree of association between each frame of audio features and the wake-up keyword.
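The caching and scoring steps above can be sketched as a sliding window over recent frames. The text does not say how the per-frame degrees of association are combined into a confidence; the sketch assumes a simple mean over the cached frames. The class `KeywordConfidence`, the callable `score_frame` (standing in for the keyword comparison model), and the window size are hypothetical.

```python
from collections import deque


class KeywordConfidence:
    def __init__(self, score_frame, window=30):
        # score_frame: hypothetical keyword comparison model that maps one
        # frame of audio features to a degree of association in [0, 1].
        self.score_frame = score_frame
        self.cache = deque(maxlen=window)  # audio features of recent frames

    def push(self, frame_features):
        """Cache one frame and return the confidence over the cached window."""
        self.cache.append(frame_features)
        scores = [self.score_frame(f) for f in self.cache]
        # Assumption: confidence is the mean degree of association.
        return sum(scores) / len(scores)
```

A caller would compare the returned confidence against a threshold to decide whether the wake-up keyword is considered present; the threshold is application specific and is not given in the text.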

Preferably, the keyword comparison model is trained first, and then the wake-up decision model is trained.

Preferably, the keyword comparison model is trained by using a first voice data training sample set, and the first voice data training sample set contains the wake-up keyword.

Preferably, the wake-up decision model is trained by using a second voice data training sample set and wake-up keyword text data, and the background noise in the second voice data training sample set is greater than that in the first voice data training sample set.
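The text does not describe how the two training sample sets are produced. One plausible way to obtain a second set with more background noise than the first is to mix the same speech with noise at a lower signal-to-noise ratio. The sketch below assumes that approach; the function `mix_at_snr` and the SNR values in the usage note are hypothetical.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the result has the requested SNR in dB."""
    speech = np.asarray(speech, dtype=float)
    # Loop or trim the noise so that it matches the speech length.
    noise = np.resize(np.asarray(noise, dtype=float), speech.shape)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)  # must be non-zero
    scale = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise
```

For example, the first set could be mixed at 20 dB and the second at 5 dB, so that the wake-up decision model is trained under noisier conditions than the keyword comparison model.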

Preferably, performing the wake-up decision by using the wake-up decision model comprises:

inputting the wake-up keyword detected by the keyword comparison model and the text data obtained in S2 into the trained wake-up decision model, which makes the wake-up decision by jointly considering the detection result of the keyword comparison model and the text data.

Preferably, if the wake-up decision model determines that both the detection result of the keyword comparison model and the text data contain the wake-up keyword, the wake-up decision model outputs a wake-up instruction; otherwise, the wake-up decision model outputs an instruction to maintain the current state.
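The both-must-agree rule can be illustrated with a plain function. In the text the wake-up decision model is a trained model; the rule-based function below is only a stand-in that reproduces the stated output condition, and its name, arguments, and return strings are hypothetical.

```python
from typing import Optional


def wake_decision(detected_keyword: Optional[str], text: str, wake_keyword: str) -> str:
    """Return "wake" only if the detector and the text both contain the keyword."""
    detector_hit = detected_keyword == wake_keyword  # keyword comparison model result
    text_hit = wake_keyword in text                  # text data from the voice detection model
    return "wake" if detector_hit and text_hit else "maintain"
```

Requiring agreement between the acoustic detector and the text lowers false wake-ups, at the cost of missing a wake-up whenever either path fails.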

(III) Advantageous effects

Compared with the prior art, the artificial intelligence voice wake-up method provided by the invention effectively improves the accuracy of voice wake-up recognition by cross-checking the voice data, the text data converted from the voice data, and the wake-up keyword. A voice detection model is constructed to convert the voice data into text data, a keyword comparison model is constructed to judge whether the voice data contains the wake-up keyword, and a wake-up decision model is constructed to make the wake-up decision. Because these functions are implemented as artificial intelligence models, the wake-up keyword can be adjusted flexibly.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

FIG. 1 is a schematic flow chart of voice wake-up according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in FIG. 1, an artificial intelligence voice wake-up method comprises: acquiring voice data; determining energy features corresponding to the voice data by using a voice detection model; and determining text data corresponding to the voice data according to the energy features.

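In the technical scheme of the application, the training of the voice detection model comprises the following steps:

acquiring training voice data and initializing a voice detection model;

training the voice detection model by using the training voice data, and determining the trained voice detection model and the corresponding smoothing parameters.

Determining the energy features corresponding to the voice data by using the voice detection model comprises the following steps:

preprocessing the voice data, and determining the energy value of each filter bank in the voice data;

processing the energy value of each filter bank in the voice data by using the trained voice detection model.

Processing the energy value of each filter bank in the voice data by using the trained voice detection model comprises the following steps:

determining the energy value of each filter bank in a given frame of voice data and the corresponding smoothing parameter;

acquiring the smoothed energy value of each filter bank in the previous frame of voice data;

determining the smoothed energy value of each filter bank in the given frame according to the filter bank energy value of that frame, the smoothed filter bank energy value of the previous frame, and the smoothing parameter.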

It is then judged whether the voice data contains a wake-up keyword. If the voice data contains the wake-up keyword, the wake-up decision model is used to make a wake-up decision; otherwise, the wake-up decision model outputs an instruction to maintain the current state.

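Judging whether the voice data contains a wake-up keyword comprises the following steps:

performing feature analysis on multiple consecutive frames of voice data, and caching the audio features;

comparing each frame of cached audio features with the wake-up keyword by using a keyword comparison model, and determining the degree of association between that frame of audio features and the wake-up keyword;

determining the confidence that the voice data contains the wake-up keyword according to the degree of association between each frame of audio features and the wake-up keyword.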

In the technical scheme of the application, the keyword comparison model is trained first, and then the wake-up decision model is trained.

The keyword comparison model is trained by using a first voice data training sample set, and the first voice data training sample set contains the wake-up keyword. The wake-up decision model is trained by using a second voice data training sample set and wake-up keyword text data, and the background noise in the second voice data training sample set is greater than that in the first voice data training sample set.

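Performing the wake-up decision by using the wake-up decision model comprises the following step:

inputting the wake-up keyword detected by the keyword comparison model and the text data corresponding to the voice data into the trained wake-up decision model, which makes the wake-up decision by jointly considering the detection result of the keyword comparison model and the text data.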

If the wake-up decision model determines that both the detection result of the keyword comparison model and the text data contain the wake-up keyword, the wake-up decision model outputs a wake-up instruction; otherwise, the wake-up decision model outputs an instruction to maintain the current state.

In the technical scheme of the application, comparing the voice data and the text data converted from it against the wake-up keyword effectively improves the accuracy of voice wake-up recognition. Because the voice detection model, the keyword comparison model, and the wake-up decision model can each be trained independently, the wake-up keyword can be adjusted flexibly.

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims

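1. An artificial intelligence voice wake-up method, characterized in that the method comprises the following steps:

S1, acquiring voice data;

S2, determining energy features corresponding to the voice data by using a voice detection model, and determining text data corresponding to the voice data according to the energy features;

S3, judging whether the voice data contains a wake-up keyword;

S4, if the voice data contains the wake-up keyword, performing a wake-up decision by using a wake-up decision model; otherwise, the wake-up decision model outputs an instruction to maintain the current state.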

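2. The artificial intelligence voice wake-up method according to claim 1, characterized in that: determining the energy features corresponding to the voice data by using the voice detection model comprises the following steps:

preprocessing the voice data, and determining the energy value of each filter bank in the voice data;

processing the energy value of each filter bank in the voice data by using the trained voice detection model.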

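3. The artificial intelligence voice wake-up method according to claim 2, characterized in that: processing the energy value of each filter bank in the voice data by using the trained voice detection model comprises:

determining the energy value of each filter bank in a given frame of voice data and the corresponding smoothing parameter;

acquiring the smoothed energy value of each filter bank in the previous frame of voice data;

determining the smoothed energy value of each filter bank in the given frame according to the filter bank energy value of that frame, the smoothed filter bank energy value of the previous frame, and the smoothing parameter.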

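4. The artificial intelligence voice wake-up method according to any one of claims 1-3, characterized in that: the training of the voice detection model comprises the following steps:

acquiring training voice data and initializing a voice detection model;

training the voice detection model by using the training voice data, and determining the trained voice detection model and the corresponding smoothing parameters.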

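5. The artificial intelligence voice wake-up method according to claim 1, characterized in that: judging whether the voice data contains a wake-up keyword comprises:

performing feature analysis on multiple consecutive frames of voice data, and caching the audio features;

comparing each frame of cached audio features with the wake-up keyword by using a keyword comparison model, and determining the degree of association between that frame of audio features and the wake-up keyword;

determining the confidence that the voice data contains the wake-up keyword according to the degree of association between each frame of audio features and the wake-up keyword.

6. The artificial intelligence voice wake-up method of claim 5, wherein: the keyword comparison model is trained first, and then the wake-up decision model is trained.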

7. The artificial intelligence voice wake-up method of claim 6, wherein: the keyword comparison model is trained by using a first voice data training sample set, and the first voice data training sample set contains awakening keywords.

8. The artificial intelligence voice wake-up method of claim 7, wherein: the wake-up decision model is trained by using a second voice data training sample set and wake-up keyword text data, and the background noise in the second voice data training sample set is greater than that in the first voice data training sample set.

9. The artificial intelligence voice wake-up method of claim 8, wherein: performing the wake-up decision by using the wake-up decision model comprises the following step:

inputting the wake-up keyword detected by the keyword comparison model and the text data obtained in S2 into the trained wake-up decision model, which makes the wake-up decision by jointly considering the detection result of the keyword comparison model and the text data.

10. The artificial intelligence voice wake-up method of claim 9, wherein: if the wake-up decision model determines that both the detection result of the keyword comparison model and the text data contain the wake-up keyword, the wake-up decision model outputs a wake-up instruction; otherwise, the wake-up decision model outputs an instruction to maintain the current state.