CN105702253A

CN105702253A - Voice awakening method and device

Info

Publication number: CN105702253A
Application number: CN201610009102.9A
Authority: CN
Inventors: 朱辉; 田伟; 李鹏
Original assignee: Beijing Yunzhisheng Information Technology Co Ltd
Current assignee: Beijing Yunzhisheng Information Technology Co Ltd
Priority date: 2016-01-07
Filing date: 2016-01-07
Publication date: 2016-06-22

Abstract

The present invention discloses a voice awakening method and device for improving the accuracy of waking a terminal device by voice. The method comprises: when the terminal device receives first voice data input by a user that contains a preset wake word, matching the first voice data against a preset language model to obtain a confidence of the first voice data; determining whether the confidence is less than a preset confidence threshold; when the confidence is less than the preset confidence threshold, executing a preset operation; and when the confidence is greater than or equal to the preset confidence threshold, waking a voice control function of the terminal device. With the technical solution of the present invention, when a user fails to wake the terminal device by voice, the terminal device can execute the preset operation to raise the confidence of the first voice data, thereby improving the accuracy of waking the terminal device by voice and improving the user experience.

Description

Voice awakening method and device

Technical field

The present invention relates to the field of voice processing technology, and in particular to a voice awakening method and device.

Background

Speech recognition technology has made significant progress in recent years and has entered many fields, such as industry, household appliances, and smart homes. Voice wake-up is one form of speech recognition technology: it allows a device to be woken up and operated by voice, without direct contact with the hardware. At present, most devices are woken up or operated through physical buttons, which makes for a poor user experience. Since voice is the most natural way for people to communicate, starting a device in this contactless way, by voice, is undoubtedly more user-friendly.

Summary of the invention

Embodiments of the present invention provide a voice awakening method and device for improving the accuracy of waking a terminal device by voice.

A voice awakening method comprises the following steps:

When a terminal device receives first voice data input by a user that contains a preset wake word, matching the first voice data against a preset language model to obtain a confidence of the first voice data;

Determining whether the confidence is less than a preset confidence threshold;

When the confidence is less than the preset confidence threshold, executing a preset operation;

When the confidence is greater than or equal to the preset confidence threshold, waking the voice control function of the terminal device.
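The four steps above can be sketched as a single decision function. This is a minimal illustration only; the function name `voice_wake` and the callables passed to it (a scorer that stands in for matching against the preset language model and is assumed to return a confidence between 0 and 1, a preset-operation hook, and a wake hook) are hypothetical placeholders for components the text does not specify.

```python
from typing import Callable


def voice_wake(
    voice_data: bytes,
    match: Callable[[bytes], float],
    threshold: float,
    preset_operation: Callable[[], None],
    wake: Callable[[], None],
) -> bool:
    """Return True if the voice control function was woken.

    `match` is a hypothetical stand-in for matching the voice data
    against the preset language model; it returns a confidence in [0, 1].
    """
    confidence = match(voice_data)   # obtain the confidence
    if confidence < threshold:       # compare with the preset threshold
        preset_operation()           # too low: execute the preset operation
        return False
    wake()                           # confidence >= threshold: wake
    return True
```

A confidence exactly equal to the threshold wakes the device, matching the "greater than or equal to" condition.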

The embodiments of the present invention may have the following beneficial effects:

In the above technical solution, the confidence of the first voice data containing the preset wake word is determined; the preset operation is executed when the confidence is less than the preset confidence threshold, and the voice control function of the terminal device is woken when the confidence is greater than or equal to the preset confidence threshold. Thus, when the user fails to wake the terminal device by voice, the terminal device can raise the confidence of the first voice data by executing the preset operation, which improves the accuracy of waking the terminal device by voice and improves the user experience.

In one embodiment, after the preset operation is executed, the method further includes:

Outputting first information, the first information being used to prompt the user to input the first voice data again, until the confidence of the received first voice data is greater than or equal to the preset confidence threshold.

In this embodiment, the user can be prompted to input the voice data again after the preset operation is executed, so that the confidence of the re-input voice data can reach the preset confidence threshold, which improves the accuracy of waking the terminal device by voice and improves the user experience.

In one embodiment, executing the preset operation includes:

Determining whether the terminal device is currently outputting second voice data;

When the terminal device is currently outputting the second voice data, turning down the volume of the second voice data.

In this embodiment, when the terminal device is currently outputting voice data, the volume of that voice data can be turned down so that the confidence of the voice data input by the user can reach the preset confidence threshold, which improves the accuracy of waking the terminal device by voice and improves the user experience.

In one embodiment, executing the preset operation includes: outputting second information, the second information being used to prompt the user to raise the volume of the first voice data.

In this embodiment, prompting the user to raise the volume of the input voice data allows the confidence of the voice data input by the user to reach the preset confidence threshold, which improves the accuracy of waking the terminal device by voice and improves the user experience.

In one embodiment, executing the preset operation includes: lowering the preset confidence threshold.

In this embodiment, lowering the preset confidence threshold makes it easier for the confidence of the voice data input by the user to reach the threshold, which improves the accuracy of waking the terminal device by voice and improves the user experience.

A voice awakening device includes:

A matching module, configured to, when a terminal device receives first voice data input by a user that contains a preset wake word, match the first voice data against a preset language model to obtain a confidence of the first voice data;

A judging module, configured to determine whether the confidence is less than a preset confidence threshold;

An execution module, configured to execute a preset operation when the confidence is less than the preset confidence threshold;

A wake module, configured to wake the voice control function of the terminal device when the confidence is greater than or equal to the preset confidence threshold.

In one embodiment, the device further includes:

An output module, configured to output first information after the preset operation is executed, the first information being used to prompt the user to input the first voice data again, until the confidence of the received first voice data is greater than or equal to the preset confidence threshold.

In one embodiment, the execution module includes:

A judging submodule, configured to determine whether the terminal device is currently outputting second voice data;

A volume-reduction submodule, configured to turn down the volume of the second voice data when the terminal device is currently outputting the second voice data.

In one embodiment, the execution module includes:

An output submodule, configured to output second information, the second information being used to prompt the user to raise the volume of the first voice data.

In one embodiment, the execution module includes:

A threshold-lowering submodule, configured to lower the preset confidence threshold.

Other features and advantages of the present invention will be set forth in the following description, and will in part be apparent from the description or be understood by practicing the invention. The objectives and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the accompanying drawings.

The technical solution of the present invention is described in further detail below with reference to the accompanying drawings and embodiments.

Brief description of the drawings

The accompanying drawings are provided for a further understanding of the present invention and constitute a part of the description. Together with the embodiments of the invention, they serve to explain the invention and do not limit it. In the drawings:

Fig. 1 is a flow chart of a voice awakening method according to an embodiment of the present invention;

Fig. 2 is a flow chart of step S13 in a voice awakening method according to an embodiment of the present invention;

Fig. 3 is a block diagram of a voice awakening device according to an embodiment of the present invention;

Fig. 4 is a block diagram of a voice awakening device according to an embodiment of the present invention;

Fig. 5 is a block diagram of the execution module in a voice awakening device according to an embodiment of the present invention;

Fig. 6 is a block diagram of the execution module in a voice awakening device according to an embodiment of the present invention;

Fig. 7 is a block diagram of the execution module in a voice awakening device according to an embodiment of the present invention.

Detailed description of the invention

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here are only used to illustrate and explain the present invention and are not intended to limit it.

Fig. 1 is a flow chart of a voice awakening method according to an embodiment of the present invention. The voice awakening method is applied to a terminal device, which may be any device with a voice control function, such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant. As shown in Fig. 1, the method includes the following steps S11-S14:

Step S11: When the terminal device receives first voice data input by a user that contains a preset wake word, match the first voice data against the preset language model to obtain the confidence of the first voice data.

The preset wake word is a word, preset by the user, that relates to the voice control function of the terminal device. For example, if the voice control function of the terminal device includes controlling a smart home, the preset wake words may include smart-home-related words such as air conditioner, TV, and curtain. As another example, if the voice control function includes connecting to a cloud server and searching for network information through the cloud server, the preset wake words may include network-service-related words such as search, query, weather, and train ticket.

When performing this step, the terminal device may first recognize the voice data input by the user to determine whether it contains a preset wake word. If it does, steps S11-S14 continue to be executed. If it does not, the user evidently has no intention of waking the voice control function of the terminal device, and the terminal device gives no feedback on the voice data.

The preset language model may be a general-purpose language model.
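The wake-word gate described above can be sketched as follows. This is an illustration under stated assumptions: it operates on a transcript string produced by some recognizer (not specified in the text), and the word list `PRESET_WAKE_WORDS` and both function names are hypothetical, taken loosely from the examples given for step S11. English words are used for readability; for Chinese text, which has no spaces between words, a plain substring test would be used instead of the word-boundary regex.

```python
import re

# Hypothetical preset wake words, built from the examples in the text.
PRESET_WAKE_WORDS = ("air conditioner", "tv", "curtain", "search",
                     "query", "weather", "train ticket")


def contains_wake_word(transcript: str, wake_words=PRESET_WAKE_WORDS) -> bool:
    """Return True if the recognized text contains any preset wake word.

    Matching is case-insensitive and respects word boundaries, so a short
    wake word such as "tv" does not match inside an unrelated longer word.
    """
    text = transcript.lower()
    return any(re.search(r"\b" + re.escape(w) + r"\b", text) for w in wake_words)


def handle_utterance(transcript: str) -> str:
    """Decide whether to continue with steps S11-S14 or stay silent."""
    if not contains_wake_word(transcript):
        return "ignored"   # no wake intent: give no feedback
    return "continue"      # go on to confidence scoring (S11-S14)
```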

Step S12: Determine whether the confidence is less than the preset confidence threshold.

Step S13: When the confidence is less than the preset confidence threshold, execute the preset operation.

Step S14: When the confidence is greater than or equal to the preset confidence threshold, wake the voice control function of the terminal device.

In one embodiment, the confidence of the first voice data may be determined from at least one of the following features of the first voice data:

(1) Speaking rate, i.e., the duration per word.

(2) N-best features.

(3) Position, i.e., the position of each word in the sentence: beginning, middle, or end.

(4) Word length, i.e., the number of characters in each word.

(5) Duration, i.e., the number of frames each word lasts.

(6) Number of competing words, i.e., the number of arcs between two adjacent nodes of the confusion network, meaning how many words compete within a given time span.

(7) N-gram language model score of the word.

(8) Posterior-probability difference between competing words, i.e., the difference between the two highest posterior probabilities among the competing words between two adjacent nodes of the confusion network.

(9) Sentence length.
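Several of the listed features can be read directly off a confusion network. The sketch below assumes a hypothetical representation (a list of `Slot` objects, each holding the competing words between two adjacent nodes with their posteriors, plus the slot's length in frames) and an assumed 10 ms frame shift. It covers features (1), (3), (4), (5), (6), (8), and (9); the N-best features (2) and the n-gram score (7) need the recognizer's N-best list and the language model, so they are omitted here.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Competing words between two adjacent nodes of a confusion network."""
    arcs: dict   # competing word -> posterior probability
    frames: int  # number of frames the slot spans


def word_features(slots):
    """Per-word features for the top-scoring word in each slot."""
    n = len(slots)
    feats = []
    for i, slot in enumerate(slots):
        posts = sorted(slot.arcs.values(), reverse=True)
        best = max(slot.arcs, key=slot.arcs.get)
        second = posts[1] if len(posts) > 1 else 0.0
        feats.append({
            "word": best,
            "position": "start" if i == 0 else ("end" if i == n - 1 else "middle"),  # (3)
            "word_length": len(best),            # (4) characters in the word
            "duration_frames": slot.frames,      # (5)
            "competing_words": len(slot.arcs),   # (6) arcs between adjacent nodes
            "posterior_gap": posts[0] - second,  # (8) top-two posterior difference
        })
    return feats


def sentence_features(slots, frame_ms=10.0):
    """Sentence-level features; frame_ms is an assumed frame shift."""
    n = len(slots)
    total_ms = sum(s.frames for s in slots) * frame_ms
    return {"ms_per_word": total_ms / n,  # (1) speaking rate
            "sentence_length": n}         # (9)
```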

From the above features of the first voice data, the confidence of the first voice data can be determined either by a classification method based on the predictive features or by a method based on posterior probability. Since both methods are prior art, they are not described further here.

In the above embodiment, the confidence takes a value in the range 0 to 1. Since the confidence is used to assess the reliability of the speech recognition result, a higher confidence indicates a more accurate recognition result. The preset confidence threshold also takes a value in the range 0 to 1.
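The text leaves both prior-art methods unspecified. As one possible instance of each, the sketch below uses logistic regression for the feature-based classifier and the mean posterior of the best word per slot for the posterior-based method; both choices are swapped in for illustration, and the weights and bias are placeholders that would normally be learned offline. Both functions return values in the 0 to 1 range, as the confidence must.

```python
import math


def classifier_confidence(features, weights, bias):
    """Feature-based confidence from a logistic model over numeric features.

    `weights` maps feature names to (hypothetical) learned coefficients;
    features without a weight are ignored. The sigmoid keeps the result
    strictly between 0 and 1.
    """
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def posterior_confidence(best_posteriors):
    """Posterior-based confidence: mean posterior of the best word in each slot."""
    return sum(best_posteriors) / len(best_posteriors)
```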

In one embodiment, after step S13, the method further includes the following step:

Outputting first information, the first information being used to prompt the user to input the first voice data again, until the confidence of the received first voice data is greater than or equal to the preset confidence threshold.

The terminal device may output the first information by voice, for example by saying "Please input the voice content again". When the user inputs the first voice data again, the terminal device determines its confidence again under the conditions produced by the preset operation, and repeats this until the confidence of the first voice data is greater than or equal to the preset confidence threshold.

In this embodiment, the user can be prompted to input voice data again after the preset operation is executed, so that the confidence of the re-input voice data can reach the preset confidence threshold, thereby improving the success rate of waking the terminal device by voice.
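The prompt-and-retry behaviour can be sketched as a loop over successive utterances. The function and parameter names are hypothetical, and `max_attempts` is an assumed safeguard that the text does not mention: as written, the method simply repeats until the confidence is high enough.

```python
from typing import Callable, Iterable


def wake_with_retries(
    utterances: Iterable[bytes],
    match: Callable[[bytes], float],
    threshold: float,
    preset_operation: Callable[[], None],
    prompt: Callable[[str], None],
    max_attempts: int = 3,
) -> bool:
    """Re-prompt the user until an utterance reaches the threshold.

    `max_attempts` is a hypothetical cap so the loop cannot run forever.
    """
    for attempt, voice_data in enumerate(utterances, start=1):
        if match(voice_data) >= threshold:
            return True                                  # wake the voice control function
        preset_operation()                               # step S13
        if attempt >= max_attempts:
            break
        prompt("Please input the voice content again")   # the first information
    return False
```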

In step S13, the terminal device may execute different preset operations depending on the situation. The specific operations executed by the terminal device are described below through several embodiments.

In one embodiment, as shown in Fig. 2, step S13 includes the following steps S21-S23:

Step S21: Determine whether the terminal device is currently outputting second voice data. If it is, execute step S22; if it is not, execute step S23.

Step S22: Turn down the volume of the second voice data.

The volume may be characterized by a decibel value. The terminal device can determine the decibel values of the sound in the first voice data and in the second voice data.

The volume may be turned down by a predetermined amount. For example, if the predetermined amount is 25 dB and the terminal device is playing music whose level has been determined to be 60 dB, the music is turned down by 25 dB, to 35 dB. Alternatively, the amount of reduction may be based on the difference between the decibel value of the second voice data and that of the first voice data. For example, if the terminal device is playing music (i.e., the second voice data) at a level determined to be 60 dB and the first voice data input by the user is at 40 dB, the music may be turned down to below 40 dB so that the first voice data is louder than the music. This increases the recognition accuracy for the first voice data and raises its confidence.
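The two ways of choosing the reduction can be sketched as small helpers. The function names are hypothetical, and `margin_db` (how far below the user's voice the playback should end up) is an assumed parameter; the text only requires the music to end up below the 40 dB voice level.

```python
def duck_fixed(playback_db: float, step_db: float = 25.0) -> float:
    """Turn playback down by a predetermined amount, never below 0 dB."""
    return max(0.0, playback_db - step_db)


def duck_below_user(playback_db: float, user_db: float, margin_db: float = 5.0) -> float:
    """Turn playback down until it sits margin_db below the user's voice level.

    Playback that is already quieter than that target is left unchanged.
    """
    return min(playback_db, user_db - margin_db)
```

With the figures from the example, `duck_fixed(60.0)` gives 35 dB, and `duck_below_user(60.0, 40.0)` also gives 35 dB, which is below the user's 40 dB.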

Step S23: Output prompt information, which prompts the user to raise the volume of the first voice data.

The terminal device may output this prompt by voice; for example, it may say "Your voice is too quiet, please speak up".

In this embodiment, when the terminal device is currently outputting voice data, the volume of that voice data can be turned down, and when the terminal device is not currently outputting second voice data, the user is prompted to raise the volume of their speech. In either case, the confidence of the voice data input by the user can then reach the preset confidence threshold, which improves the accuracy of waking the terminal device by voice and improves the user experience.
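Steps S21-S23 can be combined into one dispatch function. This is a sketch under assumptions: the `Terminal` record, its `playback_db` field (None when no second voice data is being output), the 5 dB `margin_db`, and the function name are all hypothetical, and the ducking uses the difference-based variant described for step S22.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Terminal:
    # Level of the second voice data being output, or None if nothing is playing.
    playback_db: Optional[float] = None


def perform_preset_operation(
    terminal: Terminal,
    user_db: float,
    prompt: Callable[[str], None],
    margin_db: float = 5.0,
) -> str:
    """S21: check for playback; S22: duck it; S23: otherwise ask the user to speak up."""
    if terminal.playback_db is not None:                      # S21
        terminal.playback_db = min(terminal.playback_db,      # S22
                                   user_db - margin_db)
        return "ducked"
    prompt("Your voice is too quiet, please speak up")        # S23
    return "prompted"
```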

In one embodiment, when step S13 is performed, the terminal device may directly output the prompt information, regardless of whether it is currently outputting voice data, to prompt the user to raise the volume of the first voice data.

In one embodiment, step S13 may also be implemented as the following step: lowering the preset confidence threshold.

In this embodiment, lowering the preset confidence threshold makes it easier for the confidence of the voice data input by the user to reach the threshold. This is especially useful when the terminal device is currently outputting second voice data: the second voice data interferes with the first voice data input by the user and makes it hard to recognize successfully, so lowering the preset confidence threshold raises the rate at which the terminal device successfully recognizes the first voice data, thereby improving the accuracy of waking the terminal device by voice and improving the user experience.
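A minimal sketch of the threshold-lowering operation follows. The step size of 0.1 and the floor of 0.3 are assumed values, not taken from the text; the floor matters because, without it, repeated failures would drive the threshold toward 0, at which point almost any utterance containing the wake word would wake the device.

```python
def lower_threshold(threshold: float, step: float = 0.1, floor: float = 0.3) -> float:
    """Lower the preset confidence threshold by `step`, never below `floor`.

    `step` and `floor` are hypothetical defaults. Rounding keeps
    floating-point error from accumulating over repeated calls.
    """
    return max(floor, round(threshold - step, 6))


# Threshold after each of four successive failed attempts, starting from 0.6.
history = [0.6]
for _ in range(4):
    history.append(lower_threshold(history[-1]))
```

Starting from 0.6, `history` becomes `[0.6, 0.5, 0.4, 0.3, 0.3]`: the threshold stops at the floor.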

Fig. 3 is a block diagram of a voice awakening device according to an embodiment of the present invention. As shown in Fig. 3, the device includes:

A matching module 31, configured to, when the terminal device receives first voice data input by a user that contains a preset wake word, match the first voice data against the preset language model to obtain the confidence of the first voice data;

A judging module 32, configured to determine whether the confidence is less than the preset confidence threshold;

An execution module 33, configured to execute the preset operation when the confidence is less than the preset confidence threshold;

A wake module 34, configured to wake the voice control function of the terminal device when the confidence is greater than or equal to the preset confidence threshold.

In one embodiment, as shown in Fig. 4, the device further includes:

An output module 35, configured to output first information after the preset operation is executed, the first information being used to prompt the user to input the first voice data again, until the confidence of the received first voice data is greater than or equal to the preset confidence threshold.

In one embodiment, as shown in Fig. 5, the execution module 33 includes:

A judging submodule 331, configured to determine whether the terminal device is currently outputting second voice data;

A volume-reduction submodule 332, configured to turn down the volume of the second voice data when the terminal device is currently outputting the second voice data.

In one embodiment, as shown in Fig. 6, the execution module 33 includes:

An output submodule 333, configured to output second information, the second information being used to prompt the user to raise the volume of the first voice data.

In one embodiment, as shown in Fig. 7, the execution module 33 includes:

A threshold-lowering submodule 334, configured to lower the preset confidence threshold.
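The module structure of Figs. 3 to 7 can be mirrored as a small set of classes. This is an illustrative sketch only: the class names, the pluggable scorer standing in for the preset language model, and the step/floor values in the threshold-lowering submodule are hypothetical. The execution module simply runs whichever submodules it is configured with; here only submodule 334 is wired in.

```python
from typing import Callable, List


class MatchingModule:                      # 31
    def __init__(self, scorer: Callable[[bytes], float]):
        self.scorer = scorer               # stand-in for the preset language model

    def confidence(self, voice_data: bytes) -> float:
        return self.scorer(voice_data)


class JudgeModule:                         # 32
    def __init__(self, threshold: float):
        self.threshold = threshold

    def is_below(self, confidence: float) -> bool:
        return confidence < self.threshold


class ReduceSubmodule:                     # 334: lowers the threshold held by module 32
    def __init__(self, judge: JudgeModule, step: float = 0.1, floor: float = 0.3):
        self.judge, self.step, self.floor = judge, step, floor

    def __call__(self) -> None:
        lowered = round(self.judge.threshold - self.step, 6)
        self.judge.threshold = max(self.floor, lowered)


class ExecutionModule:                     # 33: runs its configured submodules
    def __init__(self, submodules: List[Callable[[], None]]):
        self.submodules = submodules

    def execute(self) -> None:
        for sub in self.submodules:
            sub()


class WakeModule:                          # 34
    def __init__(self):
        self.awake = False

    def wake(self) -> None:
        self.awake = True


class VoiceWakeDevice:
    def __init__(self, matching: MatchingModule, judge: JudgeModule,
                 execution: ExecutionModule, wake: WakeModule):
        self.matching, self.judge = matching, judge
        self.execution, self.wake = execution, wake

    def on_voice(self, voice_data: bytes) -> bool:
        """Return True if the voice control function was woken."""
        c = self.matching.confidence(voice_data)
        if self.judge.is_below(c):
            self.execution.execute()
            return False
        self.wake.wake()
        return True
```

With a threshold of 0.6 and an utterance scoring 0.55, the first attempt fails and lowers the threshold to 0.5, so the same utterance succeeds on the second attempt.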

The above device determines the confidence of the first voice data containing the preset wake word, executes the preset operation when the confidence is less than the preset confidence threshold, and wakes the voice control function of the terminal device when the confidence is greater than or equal to the preset confidence threshold. Thus, when the user fails to wake the terminal device by voice, the terminal device can raise the confidence of the first voice data by executing the preset operation, which improves the accuracy of waking the terminal device by voice and improves the user experience.

Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.

The present invention is described with reference to flow charts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the invention. It should be understood that each flow and/or block in the flow charts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if such modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to encompass them as well.

Claims

1. A voice awakening method, characterized by comprising:

2. The method according to claim 1, characterized in that, after the preset operation is executed, the method further includes:

3. The method according to claim 1, characterized in that executing the preset operation includes:

4. The method according to claim 1 or 3, characterized in that executing the preset operation includes:

5. The method according to claim 1, characterized in that executing the preset operation includes:

lowering the preset confidence threshold.

6. A voice awakening device, characterized by comprising:

7. The device according to claim 6, characterized in that the device further includes:

8. The device according to claim 6, characterized in that the execution module includes:

9. The device according to claim 6 or 8, characterized in that the execution module includes:

10. The device according to claim 6, characterized in that the execution module includes: