CN109545207A - Voice wake-up method and device

Info

Publication number: CN109545207A
Application number: CN201811369606.7A
Authority: CN
Inventors: 林亚男
Original assignee: Guangdong Genius Technology Co Ltd
Current assignee: Guangdong Genius Technology Co Ltd
Priority date: 2018-11-16
Filing date: 2018-11-16
Publication date: 2019-03-29

Abstract

The invention belongs to the field of voice wake-up and discloses a voice wake-up method and device. The voice wake-up method comprises: performing speech feature extraction on the acquired current input speech; determining, according to the extracted speech features and a keyword detection model constructed in advance, whether a wake-up word is present in the current input speech, the keywords in the keyword detection model including at least a preset wake-up word; when a wake-up word is determined to be present in the current input speech, further judging whether an instruction word is present within a preset time threshold, the keywords in the keyword detection model including at least a preset instruction word; judging the current voice wake-up mode according to whether an instruction word is present within the preset time threshold; and giving corresponding feedback according to the voice wake-up mode. The invention combines the advantages of the one-utterance direct wake-up mode and the ordinary wake-up mode into a new wake-up mode, so that the user's interaction with the voice system is more natural.

Description

Voice wake-up method and device

Technical field

The invention belongs to the field of voice wake-up, and in particular relates to a voice wake-up method and device.

Background art

With the development of voice technology, many smart devices can interact with users by voice. The voice interaction system of a smart device recognizes the user's speech and carries out the user's instructions. In traditional voice interaction, the user usually has to activate the voice function manually, for example by pressing a record button, before voice interaction can begin. To let the user enter voice interaction more smoothly, the voice wake-up function was designed, imitating the way one person calls out to another to start a conversation.

At present, the main existing voice wake-up mode is as follows: before interacting with a smart device by voice, the user must first say a wake-up word, which may be preset for the smart device. The wake-up module of the voice interaction system monitors the speech, extracts speech features, and determines whether the extracted features match the speech features of the preset wake-up word. If they match, it wakes up the recognition module, which performs speech recognition and semantic parsing on the voice instruction that follows. For example, a user wants to use the voice interaction system of a TV to switch the TV to the sports channel. The user must first say the wake-up word, for example "Hello TV". After the wake-up module detects the wake-up word, it activates the recognition module, which starts listening for a voice instruction. The user then says "watch the sports channel"; the recognition module recognizes the voice instruction and switches to the sports channel accordingly. Once the instruction has been recognized, the recognition module shuts down and stops working. If the user wants to issue another instruction, the wake-up word must be said again to wake up the recognition module.

In this existing mode, the user must perform voice wake-up before every instruction, saying the wake-up word first and the instruction second. As a result, after the voice interaction system completes one instruction, it has to perform keyword detection all over again, which wastes system resources. For the user, having to say the wake-up word before every instruction is cumbersome, and the user experience is poor.

To address this defect, Google proposed the One-shot (single-utterance control) mode, but it also has shortcomings with respect to wake-up. The weakness of the One-shot wake-up mode is that the voice system gives no timely feedback: after the user finishes a sentence, the intent may not have been recognized, and the whole sentence must be said again. The weakness of the ordinary wake-up mode is that every instruction must be preceded by a wake-up word, but its advantage is timely feedback.

Summary of the invention

The object of the invention is to provide a voice wake-up method and device that combine the advantages of the one-utterance direct wake-up mode and the ordinary wake-up mode into a new wake-up mode, so that the user's interaction with the voice system is more natural.

The technical solution provided by the invention is as follows:

The invention provides a voice wake-up method, comprising:

performing speech feature extraction on the acquired current input speech;

determining, according to the extracted speech features and a keyword detection model constructed in advance, whether a wake-up word is present in the current input speech, the keywords in the keyword detection model including at least a preset wake-up word;

when a wake-up word is determined to be present in the current input speech, further judging whether an instruction word is present within a preset time threshold;

judging the current voice wake-up mode according to whether an instruction word is present within the preset time threshold;

giving corresponding feedback according to the voice wake-up mode.

Preferably, the step of "judging the current voice wake-up mode according to whether an instruction word is present within the preset time threshold" comprises:

when an instruction word is judged to be present within the preset time threshold, judging that the current voice wake-up mode is the one-utterance direct wake-up mode.

Preferably, the step of "giving corresponding feedback according to the voice wake-up mode" comprises:

when the current voice wake-up mode is judged to be the one-utterance direct wake-up mode, recognizing the instruction word and executing the instruction.

Preferably, when no instruction word is judged to be present within the preset time threshold, the current voice wake-up mode is judged to be the ordinary wake-up mode.

Preferably, when the current voice wake-up mode is judged to be the ordinary wake-up mode, wake-up word feedback is given after a first preset time.

The invention also discloses a voice wake-up device that applies the above voice wake-up method and comprises:

a speech feature extraction unit, configured to perform speech feature extraction on the acquired current input speech;

a wake-up word detection unit, configured to determine, according to the extracted speech features and a keyword detection model constructed in advance, whether a wake-up word is present in the current input speech, the keywords in the keyword detection model including at least a preset wake-up word;

an instruction word judging unit, configured to further judge, when a wake-up word is determined to be present in the current input speech, whether an instruction word is present within a preset time threshold;

a voice wake-up mode judging unit, configured to judge the current voice wake-up mode according to whether an instruction word is present within the preset time threshold;

a feedback unit, configured to give corresponding feedback according to the voice wake-up mode.

Preferably, the voice wake-up mode judging unit is specifically configured to judge that the current voice wake-up mode is the one-utterance direct wake-up mode when an instruction word is judged to be present within the preset time threshold.

Preferably, the feedback unit is specifically configured to recognize the instruction word and execute the instruction when the current voice wake-up mode is judged to be the one-utterance direct wake-up mode.

Preferably, the voice wake-up mode judging unit is specifically configured to judge that the current voice wake-up mode is the ordinary wake-up mode when no instruction word is judged to be present within the preset time threshold.

Preferably, the feedback unit is specifically configured to give wake-up word feedback after a first preset time when the current voice wake-up mode is judged to be the ordinary wake-up mode.

Compared with the prior art, the voice wake-up method and device provided by the invention have the following advantages:

1. The invention combines the one-utterance direct wake-up mode with the ordinary wake-up mode, so that the user's interaction with the voice system is more natural;

2. The invention gives rapid feedback when the ordinary wake-up mode is used, and is not disturbed by the feedback sound when the one-utterance direct wake-up mode is used, which avoids the problems that arise when the two are combined.

Brief description of the drawings

Preferred embodiments are described below in a clear and understandable manner with reference to the drawings, to further explain the above characteristics, technical features, advantages and implementation of the voice wake-up method and device.

Fig. 1 is a flow diagram of a voice wake-up method of the invention;

Fig. 2 is a flow diagram of another voice wake-up method of the invention;

Fig. 3 is a flow diagram of a further voice wake-up method of the invention;

Fig. 4 is a complete work flow chart of a voice wake-up method of the invention;

Fig. 5 is a schematic structural block diagram of a voice wake-up device of the invention.

Explanation of reference numerals:

100: speech feature extraction unit; 200: wake-up word detection unit; 300: instruction word judging unit; 400: voice wake-up mode judging unit; 500: feedback unit.

Detailed description of the embodiments

To explain the embodiments of the invention or the technical solutions of the prior art more clearly, specific embodiments of the invention are described below with reference to the drawings. The drawings described below are obviously only some embodiments of the invention; those of ordinary skill in the art can derive other drawings and other embodiments from them without creative effort.

To keep the figures simple, each figure shows only the parts relevant to the invention schematically; they do not represent the actual structure of a product. In addition, so that the figures are easy to understand, where several components in a figure have the same structure or function, only one of them is drawn or labeled. Herein, "one" does not only mean "only this one"; it can also mean "more than one".

According to an embodiment provided by the invention, as shown in Fig. 1, a voice wake-up method comprises:

S1: performing speech feature extraction on the acquired current input speech;

Specifically, the smart device with the voice interaction function monitors whether there is speech input. In this step, existing acoustic model evaluation may be used to extract features from the current input speech. The speech features may be spectral or cepstral coefficients.
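The framing and spectral feature computation in step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the 25 ms frame length, the Hamming window and the function names are assumptions, while the 10 ms frame step follows the description's statement that one feature vector is extracted every 10 milliseconds. A naive DFT is used only to keep the sketch free of third-party libraries.

```python
import cmath
import math


def frame_signal(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Split samples into overlapping frames; a new frame starts every hop_ms."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop_len)]


def log_spectrum(frame):
    """Log-magnitude spectrum of one frame (naive O(n^2) DFT, illustration only)."""
    n = len(frame)
    # Hamming window reduces spectral leakage at the frame edges (assumes n > 1).
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    features = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        features.append(math.log(abs(acc) + 1e-10))
    return features


def extract_features(samples, sample_rate):
    """Return one spectral feature vector per 10 ms frame step."""
    return [log_spectrum(frame) for frame in frame_signal(samples, sample_rate)]
```

A production system would more likely use cepstral coefficients computed with an FFT library; the structure (frame, window, transform, log) stays the same.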

S2: determining, according to the extracted speech features and a keyword detection model constructed in advance, whether a wake-up word is present in the current input speech, the keywords in the keyword detection model including at least a preset wake-up word;

In this embodiment of the invention, a keyword detection model must be built before detecting whether a wake-up word is present in the input speech. The keyword detection model is constructed as follows:

In general, a user who wants to use the voice interaction function says a preset keyword, which may be a wake-up word or an instruction word. A wake-up word is a phrase used to wake up the speech recognizer. Wake-up words are usually chosen from phrases rich in voiced initials, for example phrases containing Chinese characters whose pronunciation begins with an initial such as m, n, l or r. Voiced initials involve vocal cord vibration, so they are easier to distinguish from ambient noise and are more robust to noise. For example, the wake-up word may be set to "hello" or " ".
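The preference for voiced initials can be illustrated with a toy heuristic that scores a candidate wake-up word by the share of its pinyin syllables beginning with m, n, l or r. The scoring function and its name are assumptions made for illustration; the description only states the preference and does not give a selection procedure.

```python
# Voiced initials named in the description; the list there is open-ended ("such as").
VOICED_INITIALS = ("m", "n", "l", "r")


def voiced_initial_ratio(pinyin_syllables):
    """Fraction of syllables whose initial is one of the voiced initials."""
    if not pinyin_syllables:
        return 0.0
    voiced = sum(1 for syllable in pinyin_syllables
                 if syllable.lower().startswith(VOICED_INITIALS))
    return voiced / len(pinyin_syllables)
```

For instance, the two syllables "ni hao" give a ratio of 0.5, since only "ni" starts with a voiced initial. A higher ratio suggests a candidate that, by the reasoning above, may stand out better against ambient noise.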

S3: when a wake-up word is determined to be present in the current input speech, further judging whether an instruction word is present within a preset time threshold, the keywords in the keyword detection model including at least a preset instruction word;

An instruction word is a phrase used to instruct the smart device to perform a corresponding operation. Its distinguishing feature is that it reflects a function specific to the smart device: for example, "navigate to" is highly relevant to devices with a navigation function (such as cars), and "play" is usually highly relevant to devices with multimedia functions (such as TVs and mobile phones). An instruction word directly reflects the user's intention. The speech features may be spectral or cepstral coefficients and the like, and one frame of speech feature vector can be extracted from the input speech signal every 10 milliseconds.

When an instruction word is detected in the current input speech, it does not necessarily mean that the user is giving a voice instruction; the current input speech may merely happen to contain the instruction word without the user intending it. For example, a user says "Huludao City navigation channel", whose pronunciation contains something similar to "navigate to", but the user does not actually intend to navigate to a destination. Semantic parsing of the current input speech can therefore be carried out using prior-art methods, for example a method based on template matching or a method based on sequence labeling; the specific processing is not described in detail here.
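A template-matching check of the kind mentioned above might look like the following sketch. The templates, the English phrasing and the function name are hypothetical; the description names template matching only as one prior-art option and gives no concrete templates. The point shown is that an utterance counts as an instruction only when it fits a whole-sentence template, not merely when it contains the instruction word as a substring.

```python
import re

# Hypothetical templates: the whole utterance must fit the pattern.
TEMPLATES = {
    "navigate to": re.compile(r"^(?:please\s+)?navigate to (?P<destination>.+)$"),
    "play": re.compile(r"^(?:please\s+)?play (?P<title>.+)$"),
}


def parse_instruction(text):
    """Return (instruction_word, slots) if a template matches, else None."""
    normalized = text.strip().lower()
    for word, pattern in TEMPLATES.items():
        match = pattern.match(normalized)
        if match:
            return word, match.groupdict()
    return None
```

With these templates, "Navigate to the airport" is parsed as an instruction, while "the display is broken" is rejected even though it contains the letters of "play", which mirrors the accidental match described in the paragraph above.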

Specifically, in this embodiment the preset time threshold can be set according to the voice interaction system and is not specifically limited. It may typically be set to 200 ms: when no instruction word is detected within 200 ms, the user has issued no instruction and has only woken up the smart device, so the ordinary wake-up mode is entered and the user is given wake-up feedback, for example a voice message such as "I".
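The threshold test can be sketched as a small decision function over time-stamped keyword detections. The `KeywordHit` structure, the field names and the choice to measure the threshold from the end of the wake-up word to the start of the instruction word are assumptions; the description does not say which time points the 200 ms refers to.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class WakeMode(Enum):
    ONE_UTTERANCE_DIRECT = "one-utterance direct"
    ORDINARY = "ordinary"


@dataclass
class KeywordHit:
    word: str
    kind: str       # "wake" or "instruction"
    start_ms: int
    end_ms: int


def decide_wake_mode(hits: List[KeywordHit], threshold_ms: int = 200) -> Optional[WakeMode]:
    """Return the wake-up mode, or None when no wake-up word was detected."""
    wake = next((h for h in hits if h.kind == "wake"), None)
    if wake is None:
        return None
    for hit in hits:
        # Assumption: the gap is measured from wake-word end to instruction start.
        gap = hit.start_ms - wake.end_ms
        if hit.kind == "instruction" and 0 <= gap <= threshold_ms:
            return WakeMode.ONE_UTTERANCE_DIRECT
    return WakeMode.ORDINARY
```

An instruction word that starts 100 ms after the wake-up word selects the one-utterance direct mode, while one that starts 400 ms later falls outside the default threshold and leaves the device in the ordinary mode.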

S4: judging the current voice wake-up mode according to whether an instruction word is present within the preset time threshold;

S5: giving corresponding feedback according to the voice wake-up mode.

Specifically, this embodiment of the invention determines which wake-up mode is currently in use by judging whether the user has issued an instruction word within the preset time threshold, and gives the feedback that corresponds to that mode. Using the different wake-up modes in combination makes the user's interaction with the voice system more natural.

The voice wake-up method of this embodiment can be applied to smart devices with a voice interaction function, for example TVs, mobile phones, computers and smart refrigerators.

According to another embodiment provided by the invention, as shown in Fig. 2, a voice wake-up method comprises:

S41: when an instruction word is judged to be present within the preset time threshold, judging that the current voice wake-up mode is the one-utterance direct wake-up mode;

S51: when the current voice wake-up mode is judged to be the one-utterance direct wake-up mode, recognizing the instruction word and executing the instruction.

Specifically, this embodiment of the invention determines which wake-up mode is currently in use by judging whether the user has issued an instruction word within the preset time threshold; when the current wake-up mode is judged to be the one-utterance direct wake-up mode, the instruction word is recognized and the instruction is executed.

Specifically, the one-utterance direct wake-up mode is also called the One-shot (single-utterance control) mode. One-shot uses an integrated "wake-up word + speech semantic recognition" approach, so that there is zero interval, zero delay and a seamless transition between the wake-up word and the voice command. It abandons the traditional question-and-answer form, greatly reduces the number of steps needed for voice control, and streamlines information feedback, making operation simple.

A major feature of One-shot is that wake-up recognition and semantic understanding are integrated, which ensures the unity and continuity of the voice interaction so that control is completed in a single utterance. The One-shot function integrates "wake-up word + speech semantic recognition", enabling an interaction such as:

User: Hello Xiaochi, I want to go to the airport.

Device: Starting navigation to the airport for you.

The invention uses the different wake-up modes in combination, combining the one-utterance direct wake-up mode with the ordinary wake-up mode so that the user's interaction with the voice system is more natural: feedback is rapid when the ordinary wake-up mode is used, and the one-utterance direct wake-up mode is not disturbed by the feedback sound, which avoids the problems that arise when the two are combined.

According to another embodiment provided by the invention, as shown in Fig. 3, a voice wake-up method comprises:

S42: when no instruction word is judged to be present within the preset time threshold, judging that the current voice wake-up mode is the ordinary wake-up mode;

S52: when the current voice wake-up mode is judged to be the ordinary wake-up mode, giving wake-up word feedback after a first preset time.

Specifically, in this embodiment the first preset time can be set according to the voice interaction system and is not specifically limited. It may typically be set to 400 ms: when the current voice wake-up mode is judged to be the ordinary wake-up mode, wake-up word feedback is given after 400 ms, for example by issuing a voice message such as "I".
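How the two timers relate in the ordinary wake-up mode can be sketched as a timeline. This is only one possible reading: it assumes that both the 200 ms threshold and the 400 ms first preset time are measured from the end of the wake-up word, which the description does not specify. The function name and event labels are hypothetical.

```python
def ordinary_wake_timeline(wake_end_ms, threshold_ms=200, first_preset_ms=400, prompt="I"):
    """List (time_ms, event) pairs for an ordinary wake-up, sorted by time.

    Assumption: both timers start when the wake-up word ends.
    """
    events = [
        (wake_end_ms, "wake-up word detected"),
        (wake_end_ms + threshold_ms, "no instruction word: ordinary wake-up mode"),
        (wake_end_ms + first_preset_ms, f"device says: {prompt}"),
    ]
    return sorted(events)
```

Under this reading, a wake-up word that ends at 1000 ms leads to the mode decision at 1200 ms and the spoken feedback at 1400 ms. If the first preset time were instead measured from the mode decision, the feedback would come 200 ms later.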

The ordinary wake-up mode takes a question-and-answer form: the user says the wake-up word, the device must respond that it is standing by, and only then can the interaction begin, for example:

User: Hello Xiaochi (wake-up word)!

Device: Is there anything I can help you with? (device feedback indicating that it is ready to receive information)

User: I want to go to the airport.

Device: Starting navigation to the airport for you.

Fig. 4 is a complete work flow chart of a voice wake-up method of the invention. As shown in Fig. 4, the complete work flow comprises:

401: performing speech feature extraction on the acquired current input speech;

402: determining, according to the extracted speech features and a keyword detection model constructed in advance, whether a wake-up word is present in the current input speech, the keywords in the keyword detection model including at least a preset wake-up word;

403: when a wake-up word is determined to be present in the current input speech, further judging whether an instruction word is present within a preset time threshold;

404: when an instruction word is judged to be present within the preset time threshold, judging that the current voice wake-up mode is the one-utterance direct wake-up mode, recognizing the instruction word and executing the instruction;

405: when no instruction word is judged to be present within the preset time threshold, judging that the current voice wake-up mode is the ordinary wake-up mode, and giving wake-up word feedback after the first preset time.
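Steps 402 to 405 can be sketched as a small streaming state machine driven by keyword detections and clock ticks. The class, the state names, the "listening" state after feedback and the handling of instruction words that arrive between the threshold and the feedback are all assumptions made for illustration; feature extraction and keyword detection (steps 401 and 402) are assumed to happen upstream and to deliver time-stamped keywords.

```python
class WakeStateMachine:
    IDLE, WAITING, LISTENING = "idle", "waiting", "listening"

    def __init__(self, wake_words, instruction_words,
                 threshold_ms=200, first_preset_ms=400):
        self.wake_words = set(wake_words)
        self.instruction_words = set(instruction_words)
        self.threshold_ms = threshold_ms
        self.first_preset_ms = first_preset_ms
        self.state = self.IDLE
        self.wake_time_ms = None

    def on_keyword(self, word, t_ms):
        """Feed one detected keyword; return an action string or None."""
        if self.state == self.IDLE:
            if word in self.wake_words:            # step 402: wake-up word found
                self.state = self.WAITING
                self.wake_time_ms = t_ms
            return None
        if word not in self.instruction_words:
            return None
        if self.state == self.WAITING:
            # Step 404: instruction word inside the threshold, execute directly.
            if t_ms - self.wake_time_ms <= self.threshold_ms:
                self.state = self.IDLE
                return f"execute:{word}"
            return None  # assumption: ignored until the wake-up feedback is given
        # LISTENING: ordinary mode, the instruction follows the device feedback.
        self.state = self.IDLE
        return f"execute:{word}"

    def on_tick(self, t_ms):
        """Advance the clock; step 405 gives feedback after the first preset time."""
        if self.state == self.WAITING and t_ms - self.wake_time_ms >= self.first_preset_ms:
            self.state = self.LISTENING
            return "wake_feedback"
        return None
```

Feeding the wake-up word and then an instruction word 100 ms later yields an immediate `execute:` action; feeding only the wake-up word and letting the clock run yields `wake_feedback` at 400 ms, after which the next instruction word is executed.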

According to an embodiment provided by the invention, as shown in Fig. 5, a voice wake-up device applies the above voice wake-up method and comprises:

a speech feature extraction unit 100, configured to perform speech feature extraction on the acquired current input speech;

Specifically, the speech feature extraction unit 100 monitors whether there is speech input. Existing acoustic model evaluation may be used to extract features from the current input speech. The speech features may be spectral or cepstral coefficients.

a wake-up word detection unit 200, configured to determine, according to the extracted speech features and a keyword detection model constructed in advance, whether a wake-up word is present in the current input speech, the keywords in the keyword detection model including at least a preset wake-up word;

Specifically, in this embodiment of the invention, a keyword detection model must be built before detecting whether a wake-up word is present in the input speech; it is constructed in the same way as described above for the method embodiment.

an instruction word judging unit 300, configured to further judge, when a wake-up word is determined to be present in the current input speech, whether an instruction word is present within a preset time threshold;

Specifically, an instruction word is a phrase used to instruct the smart device to perform a corresponding operation. Its distinguishing feature is that it reflects a function specific to the smart device: for example, "navigate to" is highly relevant to devices with a navigation function (such as cars), and "play" is usually highly relevant to devices with multimedia functions (such as TVs and mobile phones). An instruction word directly reflects the user's intention. The speech features may be spectral or cepstral coefficients and the like, and one frame of speech feature vector can be extracted from the input speech signal every 10 milliseconds.

a voice wake-up mode judging unit 400, configured to judge the current voice wake-up mode according to whether an instruction word is present within the preset time threshold;

a feedback unit 500, configured to give corresponding feedback according to the voice wake-up mode.

The voice wake-up device of this embodiment may be a smart device with a voice interaction function, for example a TV, mobile phone, computer or smart refrigerator.
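The division into units 100 to 500 can be sketched as a class that wires five pluggable components together. The class name, the callable signatures and the stand-in components in the usage note are hypothetical; the description defines the units by function only and leaves their interfaces open.

```python
class VoiceWakeDevice:
    """Composes the five units of Fig. 5 as injected callables (interfaces assumed)."""

    def __init__(self, extract_features, detect_wake_word,
                 find_instruction_word, give_feedback):
        self.extract_features = extract_features            # unit 100
        self.detect_wake_word = detect_wake_word            # unit 200
        self.find_instruction_word = find_instruction_word  # unit 300
        self.give_feedback = give_feedback                  # unit 500

    def judge_mode(self, instruction):
        """Unit 400: an instruction word within the threshold means direct mode."""
        return "one-utterance direct" if instruction is not None else "ordinary"

    def process(self, audio):
        features = self.extract_features(audio)
        if not self.detect_wake_word(features):
            return None  # no wake-up word: the device stays asleep
        instruction = self.find_instruction_word(features)
        mode = self.judge_mode(instruction)
        return self.give_feedback(mode, instruction)


# Stand-in components that treat a list of words as the "audio" (illustration only).
device = VoiceWakeDevice(
    extract_features=lambda audio: [w.lower() for w in audio],
    detect_wake_word=lambda feats: "hello" in feats,
    find_instruction_word=lambda feats: "play" if "play" in feats else None,
    give_feedback=lambda mode, word: f"execute {word}" if word else "I",
)
```

Injecting the units as callables keeps each one replaceable, which matches the later remark in the description that the units may or may not be physically separate.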

According to another embodiment provided by the invention, a voice wake-up device applies the above voice wake-up method, wherein:

the voice wake-up mode judging unit 400 is specifically configured to judge that the current voice wake-up mode is the one-utterance direct wake-up mode when an instruction word is judged to be present within the preset time threshold;

the feedback unit 500 is specifically configured to recognize the instruction word and execute the instruction when the current voice wake-up mode is judged to be the one-utterance direct wake-up mode. For example:

User: Hello Xiaochi, I want to go to the airport.

Device: Starting navigation to the airport for you.

The voice wake-up mode judging unit 400 is specifically configured to judge that the current voice wake-up mode is the ordinary wake-up mode when no instruction word is judged to be present within the preset time threshold;

the feedback unit 500 is specifically configured to give wake-up word feedback after the first preset time when the current voice wake-up mode is judged to be the ordinary wake-up mode. For example:

User: Hello Xiaochi (wake-up word)!

Device: Is there anything I can help you with? (device feedback indicating that it is ready to receive information)

User: I want to go to the airport.

Device: Starting navigation to the airport for you.

Specifically, the invention determines which wake-up mode is currently in use by judging whether the user has issued an instruction word within the preset time threshold, and gives the feedback that corresponds to that mode. Using the different wake-up modes in combination makes the user's interaction with the voice system more natural.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

From the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the essence of the above technical solutions, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

It should be noted that the above embodiments can be freely combined as needed. The above are only preferred embodiments of the invention. It should be pointed out that those of ordinary skill in the art can make several improvements and refinements without departing from the principle of the invention, and these improvements and refinements shall also fall within the scope of protection of the invention.

Claims

1. A voice wake-up method, characterized by comprising:

performing speech feature extraction on the acquired current input speech;

determining, according to the extracted speech features and a keyword detection model constructed in advance, whether a wake-up word is present in the current input speech, the keywords in the keyword detection model including at least a preset wake-up word;

when a wake-up word is determined to be present in the current input speech, further judging whether an instruction word is present within a preset time threshold, the keywords in the keyword detection model including at least a preset instruction word;

judging the current voice wake-up mode according to whether an instruction word is present within the preset time threshold;

giving corresponding feedback according to the voice wake-up mode.

2. The voice wake-up method according to claim 1, characterized in that the step of "judging the current voice wake-up mode according to whether an instruction word is present within the preset time threshold" comprises:

when an instruction word is judged to be present within the preset time threshold, judging that the current voice wake-up mode is the one-utterance direct wake-up mode.

3. The voice wake-up method according to claim 2, characterized in that the step of "giving corresponding feedback according to the voice wake-up mode" comprises:

when the current voice wake-up mode is judged to be the one-utterance direct wake-up mode, recognizing the instruction word and executing the instruction.

4. The voice wake-up method according to claim 1, characterized in that the step of "judging the current voice wake-up mode according to whether an instruction word is present within the preset time threshold" comprises:

when no instruction word is judged to be present within the preset time threshold, judging that the current voice wake-up mode is the ordinary wake-up mode.

5. The voice wake-up method according to claim 4, characterized in that the step of "giving corresponding feedback according to the voice wake-up mode" comprises:

when the current voice wake-up mode is judged to be the ordinary wake-up mode, giving wake-up word feedback after a first preset time.

6. A voice wake-up device, characterized in that it applies the voice wake-up method according to any one of claims 1 to 5 and comprises:

a speech feature extraction unit, configured to perform speech feature extraction on the acquired current input speech;

a wake-up word detection unit, configured to determine, according to the extracted speech features and a keyword detection model constructed in advance, whether a wake-up word is present in the current input speech, the keywords in the keyword detection model including at least a preset wake-up word;

an instruction word judging unit, configured to further judge, when a wake-up word is judged to be present in the current input speech, whether an instruction word is present within a preset time threshold;

a voice wake-up mode judging unit, configured to judge the current voice wake-up mode according to whether an instruction word is present within the preset time threshold;

a feedback unit, configured to give corresponding feedback according to the voice wake-up mode.

7. The voice wake-up device according to claim 6, characterized in that the voice wake-up mode judging unit is specifically configured to judge that the current voice wake-up mode is the one-utterance direct wake-up mode when an instruction word is judged to be present within the preset time threshold.

8. The voice wake-up device according to claim 7, characterized in that the feedback unit is specifically configured to recognize the instruction word and execute the instruction when the current voice wake-up mode is judged to be the one-utterance direct wake-up mode.

9. The voice wake-up device according to claim 6, characterized in that the voice wake-up mode judging unit is specifically configured to judge that the current voice wake-up mode is the ordinary wake-up mode when no instruction word is judged to be present within the preset time threshold.

10. The voice wake-up device according to claim 9, characterized in that the feedback unit is specifically configured to give wake-up word feedback after a first preset time when the current voice wake-up mode is judged to be the ordinary wake-up mode.