CN109545207A - Voice wake-up method and device - Google Patents

Info

Publication number: CN109545207A
Authority: CN (China)
Application number: CN201811369606.7A
Priority: CN201811369606.7A
Other languages: Chinese (zh)
Inventor: 林亚男
Current Assignee: Guangdong Genius Technology Co Ltd
Original Assignee: Guangdong Genius Technology Co Ltd
Legal status: Pending (as listed by Google Patents; an assumption, not a legal conclusion)
Prior art keywords: wake, mode, voice, word, preset time

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting (under G10L15/08)
    • G10L2015/223 Execution procedure of a spoken command (under G10L15/22)
    • G10L2015/225 Feedback of the input speech (under G10L15/22)

Abstract

The invention belongs to the field of voice wake-up and discloses a voice wake-up method and device. The method comprises: performing speech feature extraction on the acquired current input speech; determining, according to the extracted speech features and a pre-built keyword detection model, whether a wake-up word is present in the current input speech, the keywords in the keyword detection model including at least a preset wake-up word; when a wake-up word is determined to be present in the current input speech, further judging whether an instruction word is present within a preset time threshold, the keywords in the keyword detection model also including at least a preset instruction word; determining the current voice wake-up mode according to whether an instruction word is present within the preset time threshold; and giving corresponding feedback according to the voice wake-up mode. By combining the advantages of the one-shot wake-up mode and the ordinary wake-up mode, the invention forms a new wake-up mode that makes the user's interaction with the voice system more natural.

Description

Voice wake-up method and device
Technical field
The invention belongs to the field of voice wake-up, and in particular relates to a voice wake-up method and device.
Background art
With the development of speech technology, many smart devices can interact with users by voice. The voice interaction system of a smart device recognizes the user's speech and carries out the user's instruction. In traditional voice interaction, the user usually has to activate voice input manually, for example by pressing a record button, before voice interaction can begin. To let the user switch into voice interaction more smoothly, the voice wake-up function was designed, imitating the way one person calls out to another to start a conversation.
At present, the existing voice wake-up mode is mainly as follows: before interacting with a smart device by voice, the user must first say a wake-up word, which can be preset for the smart device. The wake-up module of the voice interaction system monitors the speech, extracts speech features, and determines whether the extracted features match the speech features of the preset wake-up word. If they match, the wake-up module wakes the recognition module, which performs speech recognition and semantic parsing on the voice instruction that follows. For example, a user wants to use a TV's voice interaction system to switch the TV to the sports channel. The user first says the wake-up word, for example "Hello TV". After the wake-up module detects the wake-up word, it activates the recognition module, which starts listening for a voice instruction. The user then says "watch the sports channel"; the recognition module recognizes the instruction and switches to the sports channel accordingly. Once the instruction has been recognized, the recognition module shuts down and stops working; if the user wants to issue another instruction, the wake-up word must be said again to wake the recognition module.
In the existing voice wake-up mode described above, the user must perform voice wake-up before every instruction, first saying the wake-up word and then speaking the instruction. As a result, after completing one instruction the voice interaction system has to perform keyword detection again, which wastes system resources. For the user, having to say the wake-up word before every instruction makes voice wake-up cumbersome and the user experience poor.
To address this defect, Google proposed the One-shot mode (wake-up and command in a single utterance), but it too has a shortcoming in wake-up. The weakness of the One-shot wake-up mode is that the voice system gives no timely feedback: after the user finishes a sentence, the intent may not have been recognized, and the whole sentence has to be said again. The weakness of the ordinary wake-up mode is that a wake-up word must be said before every instruction, but its advantage is timely feedback.
Summary of the invention
The object of the present invention is to provide a voice wake-up method and device that combine the advantages of the one-shot wake-up mode and the ordinary wake-up mode into a new wake-up mode, making the user's interaction with the voice system more natural.
The technical solution provided by the invention is as follows:
The present invention provides a voice wake-up method, comprising:
performing speech feature extraction on the acquired current input speech;
determining, according to the extracted speech features and a pre-built keyword detection model, whether a wake-up word is present in the current input speech, the keywords in the keyword detection model including at least a preset wake-up word;
when a wake-up word is determined to be present in the current input speech, further judging whether an instruction word is present within a preset time threshold;
determining the current voice wake-up mode according to whether an instruction word is present within the preset time threshold;
giving corresponding feedback according to the voice wake-up mode.
Preferably, the step "determining the current voice wake-up mode according to whether an instruction word is present within the preset time threshold" comprises:
when an instruction word is judged to be present within the preset time threshold, determining that the current voice wake-up mode is the one-shot wake-up mode.
Preferably, the step "giving corresponding feedback according to the voice wake-up mode" comprises:
when the current voice wake-up mode is determined to be the one-shot wake-up mode, recognizing the instruction word and executing the instruction.
Preferably, the step "determining the current voice wake-up mode according to whether an instruction word is present within the preset time threshold" comprises:
when no instruction word is judged to be present within the preset time threshold, determining that the current voice wake-up mode is the ordinary wake-up mode.
Preferably, the step "giving corresponding feedback according to the voice wake-up mode" comprises:
when the current voice wake-up mode is determined to be the ordinary wake-up mode, giving wake-up word feedback after a first preset time.
The invention also discloses a voice wake-up device that applies the above voice wake-up method and comprises:
a speech feature extraction unit, configured to perform speech feature extraction on the acquired current input speech;
a wake-up word detection unit, configured to determine, according to the extracted speech features and a pre-built keyword detection model, whether a wake-up word is present in the current input speech, the keywords in the keyword detection model including at least a preset wake-up word;
an instruction word judging unit, configured to further judge, when a wake-up word is determined to be present in the current input speech, whether an instruction word is present within a preset time threshold;
a voice wake-up mode judging unit, configured to determine the current voice wake-up mode according to whether an instruction word is present within the preset time threshold;
a feedback unit, configured to give corresponding feedback according to the voice wake-up mode.
Preferably, the voice wake-up mode judging unit is specifically configured to determine that the current voice wake-up mode is the one-shot wake-up mode when an instruction word is judged to be present within the preset time threshold.
Preferably, the feedback unit is specifically configured to recognize the instruction word and execute the instruction when the current voice wake-up mode is determined to be the one-shot wake-up mode.
Preferably, the voice wake-up mode judging unit is specifically configured to determine that the current voice wake-up mode is the ordinary wake-up mode when no instruction word is judged to be present within the preset time threshold.
Preferably, the feedback unit is specifically configured to give wake-up word feedback after a first preset time when the current voice wake-up mode is determined to be the ordinary wake-up mode.
Compared with the prior art, the voice wake-up method and device provided by the invention have the following beneficial effects:
1. The invention combines the one-shot wake-up mode with the ordinary wake-up mode, making the user's interaction with the voice system more natural;
2. Feedback is prompt when the ordinary wake-up mode is used, and the one-shot wake-up mode is not disturbed by the feedback sound, which avoids the problems that arise when the two modes are combined.
Brief description of the drawings
The above characteristics, technical features, advantages and implementations of the voice wake-up method and device are further described below in a clear and easily understood manner, with reference to the drawings and preferred embodiments.
Fig. 1 is a flow diagram of a voice wake-up method of the present invention;
Fig. 2 is a flow diagram of another voice wake-up method of the present invention;
Fig. 3 is a flow diagram of yet another voice wake-up method of the present invention;
Fig. 4 is a complete workflow diagram of a voice wake-up method of the present invention;
Fig. 5 is a schematic structural block diagram of a voice wake-up device of the present invention.
Description of reference numerals:
100, speech feature extraction unit; 200, wake-up word detection unit; 300, instruction word judging unit; 400, voice wake-up mode judging unit; 500, feedback unit.
Detailed description of the embodiments
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, specific embodiments of the invention are described below with reference to the drawings. Evidently, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings and other embodiments from them without creative effort.
To keep the figures concise, each figure shows only the parts relevant to the invention schematically; they do not represent the actual structure of a product. In addition, to keep the figures concise and easy to understand, where several components in a figure have the same structure or function, only one of them is drawn or labelled. Herein, "a" or "one" means not only "exactly one" but may also mean "more than one".
According to one embodiment provided by the invention, as shown in Fig. 1, a voice wake-up method comprises:
S1. Perform speech feature extraction on the acquired current input speech;
Specifically, a smart device with a voice interaction function monitors whether there is voice input. In this step, an existing acoustic model can be used to extract features from the current input speech. The speech features may be spectral or cepstral coefficients.
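As an illustration of step S1, the following is a minimal sketch of spectral feature extraction. It is not the acoustic model referred to above: it simply cuts the signal into frames, one new frame every 10 ms (the frame rate mentioned later in this description), and computes a log power spectrum per frame. The 25 ms frame length, the Hamming window and all function names are assumptions made for the example.

```python
import cmath
import math
from typing import List


def frame_signal(samples: List[float], sample_rate: int,
                 frame_ms: float = 25.0, hop_ms: float = 10.0) -> List[List[float]]:
    """Split samples into overlapping frames, starting a new frame every hop_ms."""
    frame_len = int(sample_rate * frame_ms / 1000)  # assumed 25 ms analysis window
    hop_len = int(sample_rate * hop_ms / 1000)      # 10 ms step between frames
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop_len)]


def log_power_spectrum(frame: List[float]) -> List[float]:
    """Naive DFT of one Hamming-windowed frame; returns the log power of each bin."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(math.log(abs(acc) ** 2 + 1e-10))  # small floor avoids log(0)
    return spectrum


def extract_features(samples: List[float], sample_rate: int) -> List[List[float]]:
    """One feature vector (log power spectrum) per 10 ms frame."""
    return [log_power_spectrum(frame) for frame in frame_signal(samples, sample_rate)]
```

A production system would normally use an FFT and derive cepstral coefficients from the spectrum; the naive DFT is used here only to keep the sketch free of dependencies.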
S2. Determine, according to the extracted speech features and a pre-built keyword detection model, whether a wake-up word is present in the current input speech, the keywords in the keyword detection model including at least a preset wake-up word;
In this embodiment of the invention, a keyword detection model must be built before detecting whether a wake-up word is present in the input speech. The keyword detection model is built as follows:
In general, a user who wants to use the voice interaction function can say a preset keyword, which may be a wake-up word or an instruction word. A wake-up word is a phrase used to wake the speech recognizer. Wake-up words are usually chosen from phrases rich in voiced initials, for example phrases whose Chinese characters begin with initials such as m, n, l or r, because the vocal-fold vibration of voiced initials distinguishes them better from ambient noise and gives good noise robustness. For example, the wake-up word can be set to "Hello" or "".
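The preference for voiced initials can be turned into a simple screening heuristic for candidate wake-up words. The sketch below is only an illustration of that idea and not part of the described method: it assumes the caller supplies each candidate as a list of pinyin syllables, and the function name and the candidates in the usage note are hypothetical.

```python
from typing import Sequence

# Voiced initials named in the description above.
VOICED_INITIALS = ("m", "n", "l", "r")


def voiced_initial_ratio(pinyin_syllables: Sequence[str]) -> float:
    """Fraction of syllables whose initial is one of the voiced initials m, n, l, r."""
    if not pinyin_syllables:
        return 0.0
    voiced = sum(1 for syllable in pinyin_syllables
                 if syllable.lower().startswith(VOICED_INITIALS))
    return voiced / len(pinyin_syllables)
```

For instance, `voiced_initial_ratio(["ni", "hao"])` gives 0.5, while a hypothetical four-syllable candidate `["ni", "hao", "le", "le"]` gives 0.75 and would be preferred under this heuristic.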
S3. When a wake-up word is determined to be present in the current input speech, further judge whether an instruction word is present within a preset time threshold, the keywords in the keyword detection model including at least a preset instruction word;
An instruction word is a phrase that directs the smart device to perform a corresponding operation. Instruction words characteristically reflect functions specific to the smart device: for example, "navigate to" is highly relevant to devices with a navigation function (such as cars), and "play" is usually highly relevant to devices with multimedia functions (such as TVs and mobile phones). An instruction word can directly reflect the user's intent. The speech features may be spectral or cepstral coefficients, and one frame of speech feature vectors can be extracted from the input speech signal every 10 milliseconds.
When an instruction word is detected in the current input speech, it does not necessarily mean that the user is giving a voice instruction; the input speech may merely happen to contain the instruction word without the user intending it. For example, the user says "Huludao waterway", whose pronunciation in Chinese contains a sound sequence similar to "navigate to", although the user has no real intention of navigating to a destination. Semantic parsing of the current input speech can use methods from the prior art, for example template matching or sequence labelling; the specific processing is not described in detail here.
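To make the template-matching option concrete, here is a minimal sketch that checks whether a transcript really has the shape of an instruction rather than merely containing an instruction word. It assumes a recognizer has already produced a text transcript; the English templates, intent names and the `parse_instruction` function are hypothetical and chosen only for illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical templates: an instruction word at the start, followed by a slot.
TEMPLATES = {
    "navigate": re.compile(r"^(?:please\s+)?navigate to (?P<slot>.+)$", re.IGNORECASE),
    "play": re.compile(r"^(?:please\s+)?play (?P<slot>.+)$", re.IGNORECASE),
}


def parse_instruction(transcript: str) -> Optional[Dict[str, str]]:
    """Return the matched intent and slot, or None if no template fits."""
    text = transcript.strip().rstrip(".!?")
    for intent, pattern in TEMPLATES.items():
        match = pattern.match(text)
        if match:
            return {"intent": intent, "slot": match.group("slot").strip()}
    return None
```

With these templates, "Navigate to the airport" is parsed as a navigation instruction, whereas a sentence that only contains the words "navigate to" in passing, such as "the boat must navigate to port carefully", matches no template and is rejected.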
Specifically, in this embodiment the preset time threshold can be configured for the particular voice interaction system and is not specifically limited; for example, it can generally be set to 200 ms. That is, when no instruction word is detected within 200 ms, the user has issued no instruction and has only woken the smart device; the device then enters the ordinary wake-up mode and gives the user wake-up feedback, such as the voice message "I".
S4. Determine the current voice wake-up mode according to whether an instruction word is present within the preset time threshold;
S5. Give corresponding feedback according to the voice wake-up mode.
Specifically, this embodiment determines which wake-up mode is currently in use by judging whether the user issues an instruction word within the preset time threshold, and gives corresponding feedback for each wake-up mode. Using the different wake-up modes in combination makes the user's interaction with the voice system more natural.
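Steps S4 and S5 amount to a small decision rule, sketched below. The sketch assumes that the preset time threshold is measured from the end of the wake-up word to the moment an instruction word is detected, which the description does not state explicitly; the names `WakeMode`, `decide_wake_mode` and `feedback_for` are hypothetical.

```python
from enum import Enum
from typing import Optional


class WakeMode(Enum):
    ONE_SHOT = "one-shot"
    ORDINARY = "ordinary"


def decide_wake_mode(wake_word_end_ms: int,
                     instruction_ms: Optional[int],
                     threshold_ms: int = 200) -> WakeMode:
    """S4: one-shot if an instruction word arrives within the threshold, else ordinary."""
    if instruction_ms is not None and 0 <= instruction_ms - wake_word_end_ms <= threshold_ms:
        return WakeMode.ONE_SHOT
    return WakeMode.ORDINARY


def feedback_for(mode: WakeMode, instruction: Optional[str] = None) -> str:
    """S5: execute the instruction in one-shot mode, otherwise give wake-up feedback."""
    if mode is WakeMode.ONE_SHOT:
        return "execute: " + str(instruction)
    return "wake-up feedback"
```

The 200 ms default is the example value given above; a real system would tune it for the particular voice interaction system.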
The voice wake-up method of this embodiment can be applied to smart devices with a voice interaction function, for example TVs, mobile phones, computers and smart refrigerators.
According to another embodiment provided by the invention, as shown in Fig. 2, a voice wake-up method comprises:
S1. Perform speech feature extraction on the acquired current input speech;
Specifically, a smart device with a voice interaction function monitors whether there is voice input. In this step, an existing acoustic model can be used to extract features from the current input speech. The speech features may be spectral or cepstral coefficients.
S2. Determine, according to the extracted speech features and a pre-built keyword detection model, whether a wake-up word is present in the current input speech, the keywords in the keyword detection model including at least a preset wake-up word;
In this embodiment of the invention, a keyword detection model must be built before detecting whether a wake-up word is present in the input speech. The keyword detection model is built as follows:
In general, a user who wants to use the voice interaction function can say a preset keyword, which may be a wake-up word or an instruction word. A wake-up word is a phrase used to wake the speech recognizer. Wake-up words are usually chosen from phrases rich in voiced initials, for example phrases whose Chinese characters begin with initials such as m, n, l or r, because the vocal-fold vibration of voiced initials distinguishes them better from ambient noise and gives good noise robustness. For example, the wake-up word can be set to "Hello" or "".
S3. When a wake-up word is determined to be present in the current input speech, further judge whether an instruction word is present within a preset time threshold, the keywords in the keyword detection model including at least a preset instruction word;
An instruction word is a phrase that directs the smart device to perform a corresponding operation. Instruction words characteristically reflect functions specific to the smart device: for example, "navigate to" is highly relevant to devices with a navigation function (such as cars), and "play" is usually highly relevant to devices with multimedia functions (such as TVs and mobile phones). An instruction word can directly reflect the user's intent. The speech features may be spectral or cepstral coefficients, and one frame of speech feature vectors can be extracted from the input speech signal every 10 milliseconds.
When an instruction word is detected in the current input speech, it does not necessarily mean that the user is giving a voice instruction; the input speech may merely happen to contain the instruction word without the user intending it. For example, the user says "Huludao waterway", whose pronunciation in Chinese contains a sound sequence similar to "navigate to", although the user has no real intention of navigating to a destination. Semantic parsing of the current input speech can use methods from the prior art, for example template matching or sequence labelling; the specific processing is not described in detail here.
Specifically, in this embodiment the preset time threshold can be configured for the particular voice interaction system and is not specifically limited; for example, it can generally be set to 200 ms. That is, when no instruction word is detected within 200 ms, the user has issued no instruction and has only woken the smart device; the device then enters the ordinary wake-up mode and gives the user wake-up feedback, such as the voice message "I".
S41. When an instruction word is judged to be present within the preset time threshold, determine that the current voice wake-up mode is the one-shot wake-up mode;
S51. When the current voice wake-up mode is determined to be the one-shot wake-up mode, recognize the instruction word and execute the instruction.
Specifically, this embodiment determines which wake-up mode is currently in use by judging whether the user issues an instruction word within the preset time threshold; when the current mode is determined to be the one-shot wake-up mode, the instruction word is recognized and the instruction is executed.
Specifically, the one-shot wake-up mode is also called the One-shot mode (wake-up and command in a single utterance). One-shot integrates "wake-up word + speech semantic recognition", achieving zero interval, zero delay and a seamless connection between the wake-up word and voice control. It abandons the traditional question-and-answer form, greatly reduces the steps of voice control for the user, streamlines information feedback, and makes operation simple.
A major feature of One-shot is that wake-up recognition and semantic understanding are integrated, which keeps the voice interaction unified and coherent and completes the control in a single utterance. With "wake-up word + speech semantic recognition" integrated, One-shot enables interactions such as the following:
User: Hello, Little Speed, I want to go to the airport.
Device: Starting navigation to the airport.
The invention uses different wake-up modes in combination, combining the one-shot wake-up mode with the ordinary wake-up mode so that the user's interaction with the voice system is more natural. Feedback is prompt when the ordinary wake-up mode is used, and the one-shot wake-up mode is not disturbed by the feedback sound, which avoids the problems that arise when the two are combined.
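In the one-shot case, a single utterance carries both the wake-up word and the instruction. The sketch below shows one simple way to separate the two once a transcript is available. It works on text rather than on speech features, so it is a simplification of the "wake-up word + speech semantic recognition" integration described above; the function name and the wake-up phrase in the usage note are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def split_one_shot(transcript: str,
                   wake_words: Sequence[str]) -> Optional[Tuple[str, Optional[str]]]:
    """Return (wake_word, remainder) if the transcript starts with a wake-up word.

    The remainder is None when nothing follows the wake-up word, which
    corresponds to the ordinary wake-up case rather than the one-shot case.
    """
    text = transcript.strip()
    lowered = text.lower()
    for wake in wake_words:
        if lowered.startswith(wake.lower()):
            rest = text[len(wake):].lstrip(" ,.!?")  # drop separators after the wake-up word
            return wake, (rest or None)
    return None
```

For example, `split_one_shot("Hello assistant, I want to go to the airport.", ["hello assistant"])` returns the wake-up word together with the remainder "I want to go to the airport.", which can then be passed on for semantic parsing.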
According to yet another embodiment provided by the invention, as shown in Fig. 3, a voice wake-up method comprises:
S1. Perform speech feature extraction on the acquired current input speech;
Specifically, a smart device with a voice interaction function monitors whether there is voice input. In this step, an existing acoustic model can be used to extract features from the current input speech. The speech features may be spectral or cepstral coefficients.
S2. Determine, according to the extracted speech features and a pre-built keyword detection model, whether a wake-up word is present in the current input speech, the keywords in the keyword detection model including at least a preset wake-up word;
In this embodiment of the invention, a keyword detection model must be built before detecting whether a wake-up word is present in the input speech. The keyword detection model is built as follows:
In general, a user who wants to use the voice interaction function can say a preset keyword, which may be a wake-up word or an instruction word. A wake-up word is a phrase used to wake the speech recognizer. Wake-up words are usually chosen from phrases rich in voiced initials, for example phrases whose Chinese characters begin with initials such as m, n, l or r, because the vocal-fold vibration of voiced initials distinguishes them better from ambient noise and gives good noise robustness. For example, the wake-up word can be set to "Hello" or "".
S3. When a wake-up word is determined to be present in the current input speech, further judge whether an instruction word is present within a preset time threshold, the keywords in the keyword detection model including at least a preset instruction word;
An instruction word is a phrase that directs the smart device to perform a corresponding operation. Instruction words characteristically reflect functions specific to the smart device: for example, "navigate to" is highly relevant to devices with a navigation function (such as cars), and "play" is usually highly relevant to devices with multimedia functions (such as TVs and mobile phones). An instruction word can directly reflect the user's intent. The speech features may be spectral or cepstral coefficients, and one frame of speech feature vectors can be extracted from the input speech signal every 10 milliseconds.
When an instruction word is detected in the current input speech, it does not necessarily mean that the user is giving a voice instruction; the input speech may merely happen to contain the instruction word without the user intending it. For example, the user says "Huludao waterway", whose pronunciation in Chinese contains a sound sequence similar to "navigate to", although the user has no real intention of navigating to a destination. Semantic parsing of the current input speech can use methods from the prior art, for example template matching or sequence labelling; the specific processing is not described in detail here.
Specifically, in this embodiment the preset time threshold can be configured for the particular voice interaction system and is not specifically limited; for example, it can generally be set to 200 ms. That is, when no instruction word is detected within 200 ms, the user has issued no instruction and has only woken the smart device; the device then enters the ordinary wake-up mode and gives the user wake-up feedback, such as the voice message "I".
S42. When no instruction word is judged to be present within the preset time threshold, determine that the current voice wake-up mode is the ordinary wake-up mode;
S52. When the current voice wake-up mode is determined to be the ordinary wake-up mode, give wake-up word feedback after a first preset time.
Specifically, in this embodiment the first preset time can be configured for the particular voice interaction system and is not specifically limited; for example, it can generally be set to 400 ms. That is, when the current voice wake-up mode is determined to be the ordinary wake-up mode, wake-up word feedback is given after 400 ms, for example by issuing the voice message "I".
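The delayed feedback of step S52 can be sketched as a small timer object driven by an external clock. The sketch assumes that the first preset time starts counting at the moment the ordinary wake-up mode is determined, which the description leaves open; the class name, method names and reply text are hypothetical.

```python
from typing import Optional


class OrdinaryWakeFeedback:
    """Emits the wake-up feedback once the first preset time has elapsed."""

    def __init__(self, first_preset_ms: int = 400, reply: str = "wake-up feedback"):
        self.first_preset_ms = first_preset_ms
        self.reply = reply
        self._due_ms: Optional[int] = None

    def arm(self, mode_decided_ms: int) -> None:
        """Call when the ordinary wake-up mode has been determined (S42)."""
        self._due_ms = mode_decided_ms + self.first_preset_ms

    def poll(self, now_ms: int) -> Optional[str]:
        """Return the feedback once it is due, and only once; otherwise None."""
        if self._due_ms is not None and now_ms >= self._due_ms:
            self._due_ms = None
            return self.reply
        return None
```

Driving the object with explicit timestamps instead of a real timer keeps the sketch deterministic; an actual device would call `poll` from its audio loop or use a platform timer.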
The ordinary wake-up mode takes a question-and-answer form: the user says the wake-up word, the device must feed back that it is standing by, and only then can the interaction begin. For example:
User: Hello, Little Speed! (wake-up word)
Device: How can I help you? (the device's feedback indicates that it is ready to receive information)
User: I want to go to the airport.
Device: Starting navigation to the airport.
The invention uses different wake-up modes in combination, combining the one-shot wake-up mode with the ordinary wake-up mode so that the user's interaction with the voice system is more natural. Feedback is prompt when the ordinary wake-up mode is used, and the one-shot wake-up mode is not disturbed by the feedback sound, which avoids the problems that arise when the two are combined.
Fig. 4 is a complete workflow diagram of a voice wake-up method of the present invention. As shown in Fig. 4, the complete workflow comprises:
401. Perform speech feature extraction on the acquired current input speech;
402. Determine, according to the extracted speech features and a pre-built keyword detection model, whether a wake-up word is present in the current input speech, the keywords in the keyword detection model including at least a preset wake-up word;
403. When a wake-up word is determined to be present in the current input speech, further judge whether an instruction word is present within a preset time threshold;
404. When an instruction word is judged to be present within the preset time threshold, determine that the current voice wake-up mode is the one-shot wake-up mode, recognize the instruction word and execute the instruction;
405. When no instruction word is judged to be present within the preset time threshold, determine that the current voice wake-up mode is the ordinary wake-up mode, and give wake-up word feedback after the first preset time.
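The workflow of Fig. 4 can be sketched end to end over a list of keyword detections. The sketch assumes that steps 401 and 402 have already produced time-stamped detections of the hypothetical form `(time_ms, kind, word)` with `kind` being "wake" or "instruction", that the threshold is measured from the wake-up word detection, and that the first preset time starts when the threshold expires. Like Fig. 4, it stops at the feedback step and does not model the follow-up instruction in ordinary mode.

```python
from typing import List, Tuple

Detection = Tuple[int, str, str]  # (time in ms, "wake" or "instruction", keyword)
Action = Tuple[int, str, str]     # (time in ms, action name, keyword)


def run_workflow(detections: List[Detection],
                 threshold_ms: int = 200,
                 first_preset_ms: int = 400) -> List[Action]:
    """Map time-ordered keyword detections to the actions of steps 403 to 405."""
    actions: List[Action] = []
    i = 0
    while i < len(detections):
        t, kind, word = detections[i]
        if kind != "wake":
            i += 1  # 402: anything not preceded by a wake-up word is ignored
            continue
        nxt = detections[i + 1] if i + 1 < len(detections) else None
        if nxt is not None and nxt[1] == "instruction" and nxt[0] - t <= threshold_ms:
            actions.append((nxt[0], "execute", nxt[2]))  # 404: one-shot mode
            i += 2
        else:
            # 405: ordinary mode, feedback after threshold + first preset time
            actions.append((t + threshold_ms + first_preset_ms, "wake_feedback", word))
            i += 1
    return actions
```

With the default values, a wake-up word at 1000 ms followed by "navigate to" at 1100 ms yields an execute action, while a wake-up word at 5000 ms with nothing in the next 200 ms yields wake-up feedback at 5600 ms.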
According to an embodiment provided by the present invention, as shown in Fig. 5, a voice wake-up device applies the above voice wake-up method and further comprises:
A speech feature extraction unit 100, configured to extract speech features from the acquired current input speech;
Specifically, the speech feature extraction unit 100 monitors whether there is speech input. In this step, an existing acoustic-model evaluation can be used to extract features from the current input speech. The speech features may be spectral or cepstral coefficients.
A wake-up word detection unit 200, configured to determine, based on the extracted speech features and a pre-built keyword detection model, whether a wake-up word is present in the current input speech, where the keywords in the keyword detection model include at least a preset wake-up word;
Specifically, in the embodiment of the present invention, before detecting whether a wake-up word is present in the input speech, a keyword detection model must first be built. The keyword detection model is built as follows:
In general, a user who wants to use the voice interaction function can say a preset keyword, which may be a wake-up word or an instruction word. A wake-up word is a phrase used to wake up the speech recognition device. Wake-up words are usually chosen from phrases rich in voiced initials, for example phrases containing Chinese characters that begin with initials such as m, n, l, or r. Voiced initials involve vocal cord vibration, so they are easier to distinguish from ambient noise and give better noise immunity. For example, the wake-up word can be set to "hello" or " ".
An instruction word judging unit 300, configured to further judge, when it is determined that a wake-up word is present in the current input speech, whether an instruction word is present within a preset time threshold;
Specifically, an instruction word is a phrase that instructs the smart device to perform a corresponding operation. Its characteristic is that it reflects a function specific to the smart device: for example, "navigate to" is highly relevant to devices with a navigation function (such as cars), and "play" is usually highly relevant to devices with a multimedia function (such as TVs and mobile phones). An instruction word directly reflects the user's intent. The speech features may be spectral or cepstral coefficients, and one frame of speech feature vector can be extracted from the input speech signal every 10 milliseconds.
An instruction word detected in the current input speech does not necessarily mean that the user is issuing a voice command: the current input speech may happen to contain the instruction word without the user intending it. For example, when the user says "Huludao City navigation channel", the phrase contains a pronunciation similar to "navigate to", but the user does not actually intend to navigate to some destination. Semantic parsing of the current input speech can use prior-art methods, for example a method based on template matching or a method based on sequence labeling; the specific processing is not described in detail here.
Specifically, in this embodiment the preset time threshold can be configured for different voice interaction systems and is not specifically limited. For example, it can generally be set to 200 ms: if no instruction word is detected within 200 ms, the user has not issued an instruction and has only woken the smart device, so the device enters the ordinary wake-up mode and gives the user a wake-up feedback voice message such as "I".
A voice wake-up mode judging unit 400, configured to judge the current voice wake-up mode according to whether an instruction word is present within the preset time threshold;
A feedback unit 500, configured to give corresponding feedback according to the voice wake-up mode.
Specifically, the embodiment of the present invention determines which wake-up mode is currently in effect by judging whether the user issues an instruction word within the preset time threshold, gives corresponding feedback according to the wake-up mode, and combines the different wake-up modes so that the user's interaction with the voice system is more natural.
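One way to picture how units 100 to 500 hand results to each other is the toy composition below. It is only a sketch under simplifying assumptions: the class and method names are hypothetical, the input is a list of already-decoded `(time in ms, keyword)` pairs standing in for real speech features, and each timestamp is taken to be the moment the keyword detector fired.

```python
from typing import List, Optional, Tuple

# Toy stand-in for a feature stream: (time in ms, keyword already decoded).
Token = Tuple[int, str]


class VoiceWakeDevice:
    """Hypothetical composition of units 100-500, for illustration only."""

    def __init__(self, wake_word: str, instruction_words: List[str],
                 threshold_ms: int = 200) -> None:
        self.wake_word = wake_word
        self.instruction_words = set(instruction_words)
        self.threshold_ms = threshold_ms

    def extract_features(self, tokens: List[Token]) -> List[Token]:
        # Unit 100: a real device would compute spectral or cepstral features.
        return sorted(tokens)

    def detect_wake_word(self, feats: List[Token]) -> Optional[int]:
        # Unit 200: return the time of the wake-up word, or None if absent.
        for t, word in feats:
            if word == self.wake_word:
                return t
        return None

    def find_instruction(self, feats: List[Token], wake_t: int) -> Optional[str]:
        # Unit 300: look for an instruction word within the time threshold.
        for t, word in feats:
            if word in self.instruction_words and 0 < t - wake_t <= self.threshold_ms:
                return word
        return None

    def judge_mode(self, instruction: Optional[str]) -> str:
        # Unit 400: an instruction word found in time selects the one-shot mode.
        return "one_shot" if instruction is not None else "ordinary"

    def feedback(self, mode: str, instruction: Optional[str]) -> str:
        # Unit 500: execute directly, or answer the wake-up word.
        if mode == "one_shot":
            return f"execute:{instruction}"
        return "wake_feedback"

    def process(self, tokens: List[Token]) -> Optional[str]:
        feats = self.extract_features(tokens)
        wake_t = self.detect_wake_word(feats)
        if wake_t is None:
            return None  # device stays asleep
        instruction = self.find_instruction(feats, wake_t)
        return self.feedback(self.judge_mode(instruction), instruction)
```

Keeping each unit as its own method mirrors the unit boundaries in the text, so any one stage could be swapped for a real model without touching the others.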
The voice wake-up device in the embodiment of the present invention can be a smart device with a voice interaction function, such as a TV, mobile phone, computer, or smart refrigerator.
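For the feature extraction performed by unit 100, the text mentions spectral features with one feature vector every 10 milliseconds. A rough sketch of that framing is shown below. The 25 ms frame length, the Hamming window, and the function names are assumptions chosen for the example, and the naive DFT is used only to keep the code dependency-free; a real system would use an FFT library.

```python
import cmath
import math
from typing import List


def frame_signal(samples: List[float], sample_rate: int,
                 frame_ms: int = 25, hop_ms: int = 10) -> List[List[float]]:
    """Cut the signal into overlapping frames, one frame every hop_ms."""
    frame_len = sample_rate * frame_ms // 1000
    hop = sample_rate * hop_ms // 1000
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrum(frame: List[float]) -> List[float]:
    """Magnitude spectrum of one Hamming-windowed frame (naive DFT)."""
    n = len(frame)
    if n < 2:
        raise ValueError("frame must contain at least two samples")
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):  # keep only the non-negative frequencies
        acc = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        spectrum.append(abs(acc))
    return spectrum
```

At an 8 kHz sample rate each frame holds 200 samples, so each spectral bin spans 40 Hz; cepstral coefficients would be derived from such a spectrum in a further step that is not shown here.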
According to another embodiment provided by the present invention, a voice wake-up device applies the above voice wake-up method and further comprises:
A speech feature extraction unit 100, configured to extract speech features from the acquired current input speech;
Specifically, the speech feature extraction unit 100 monitors whether there is speech input. In this step, an existing acoustic-model evaluation can be used to extract features from the current input speech. The speech features may be spectral or cepstral coefficients.
A wake-up word detection unit 200, configured to determine, based on the extracted speech features and a pre-built keyword detection model, whether a wake-up word is present in the current input speech, where the keywords in the keyword detection model include at least a preset wake-up word;
Specifically, in the embodiment of the present invention, before detecting whether a wake-up word is present in the input speech, a keyword detection model must first be built. The keyword detection model is built as follows:
In general, a user who wants to use the voice interaction function can say a preset keyword, which may be a wake-up word or an instruction word. A wake-up word is a phrase used to wake up the speech recognition device. Wake-up words are usually chosen from phrases rich in voiced initials, for example phrases containing Chinese characters that begin with initials such as m, n, l, or r. Voiced initials involve vocal cord vibration, so they are easier to distinguish from ambient noise and give better noise immunity. For example, the wake-up word can be set to "hello" or " ".
An instruction word judging unit 300, configured to further judge, when it is determined that a wake-up word is present in the current input speech, whether an instruction word is present within a preset time threshold;
Specifically, an instruction word is a phrase that instructs the smart device to perform a corresponding operation. Its characteristic is that it reflects a function specific to the smart device: for example, "navigate to" is highly relevant to devices with a navigation function (such as cars), and "play" is usually highly relevant to devices with a multimedia function (such as TVs and mobile phones). An instruction word directly reflects the user's intent. The speech features may be spectral or cepstral coefficients, and one frame of speech feature vector can be extracted from the input speech signal every 10 milliseconds.
An instruction word detected in the current input speech does not necessarily mean that the user is issuing a voice command: the current input speech may happen to contain the instruction word without the user intending it. For example, when the user says "Huludao City navigation channel", the phrase contains a pronunciation similar to "navigate to", but the user does not actually intend to navigate to some destination. Semantic parsing of the current input speech can use prior-art methods, for example a method based on template matching or a method based on sequence labeling; the specific processing is not described in detail here.
Specifically, in this embodiment the preset time threshold can be configured for different voice interaction systems and is not specifically limited. For example, it can generally be set to 200 ms: if no instruction word is detected within 200 ms, the user has not issued an instruction and has only woken the smart device, so the device enters the ordinary wake-up mode and gives the user a wake-up feedback voice message such as "I".
A voice wake-up mode judging unit 400, specifically configured to judge, when an instruction word is found within the preset time threshold, that the current voice wake-up mode is the one-utterance direct wake-up mode;
A feedback unit 500, specifically configured to recognize the instruction word and execute the instruction when the current voice wake-up mode is judged to be the one-utterance direct wake-up mode.
Specifically, the embodiment of the present invention determines which wake-up mode is currently in effect by judging whether the user issues an instruction word within the preset time threshold. When the current wake-up mode is judged to be the one-utterance direct wake-up mode, the instruction word is recognized and the instruction is executed.
Specifically, the one-utterance direct wake-up mode is also called the One-shot ("one-utterance control") mode. One-shot uses an integrated "wake-up word + speech semantic recognition" approach to achieve zero interval, zero delay, and a seamless connection between the wake-up word and voice control. It abandons the traditional question-and-answer form, greatly reduces the steps the user needs for voice control, and makes information feedback simple and operation easy.
A major feature of One-shot is that wake-up recognition is integrated with semantic understanding, which keeps the voice interaction unified and coherent so that control is completed in a single utterance. The One-shot function integrates "wake-up word + speech semantic recognition", enabling an interaction such as:
User: Hello, Little Speed, I want to go to the airport.
Device: Starting navigation to the airport for you.
The present invention uses different wake-up modes in combination, combining the one-utterance direct wake-up mode with the ordinary wake-up mode, so that the user's interaction with the voice system is more natural. Feedback is rapid when the ordinary wake-up mode is used, and the one-utterance direct wake-up mode is not disturbed by the feedback sound, which avoids the problems that arise when the two modes are combined.
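The text notes that an instruction word heard in the input is not always an intended command, and that semantic parsing can use template matching. A minimal sketch of that idea for a One-shot utterance follows. It works on already-transcribed text, and the templates, intent names, and the `parse_one_shot` function are hypothetical choices for illustration rather than anything specified in the source.

```python
import re
from typing import Optional, Tuple

# Hypothetical templates: each maps a whole-command pattern to an intent name.
TEMPLATES = [
    (re.compile(r"^(?:navigate to|i want to go to) (?P<slot>.+)$"), "navigate"),
    (re.compile(r"^play (?P<slot>.+)$"), "play"),
]


def parse_one_shot(utterance: str, wake_word: str) -> Optional[Tuple[str, str]]:
    """Return (intent, slot) if the text after the wake-up word fits a template."""
    text = utterance.lower().strip().rstrip(".!?")
    wake = wake_word.lower()
    if not text.startswith(wake):
        return None  # no wake-up word at the start: not a One-shot utterance
    rest = text[len(wake):].lstrip(" ,")
    for pattern, intent in TEMPLATES:
        match = pattern.match(rest)
        if match:
            return intent, match.group("slot")
    # The instruction word may occur by chance inside a longer phrase;
    # anchored templates reject such incidental matches.
    return None
```

Because each template is anchored at both ends, a sentence that merely contains "navigate to" somewhere in the middle is not treated as a navigation command.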
According to another embodiment provided by the present invention, a voice wake-up device applies the above voice wake-up method and further comprises:
A speech feature extraction unit 100, configured to extract speech features from the acquired current input speech;
Specifically, the speech feature extraction unit 100 monitors whether there is speech input. In this step, an existing acoustic-model evaluation can be used to extract features from the current input speech. The speech features may be spectral or cepstral coefficients.
A wake-up word detection unit 200, configured to determine, based on the extracted speech features and a pre-built keyword detection model, whether a wake-up word is present in the current input speech, where the keywords in the keyword detection model include at least a preset wake-up word;
Specifically, in the embodiment of the present invention, before detecting whether a wake-up word is present in the input speech, a keyword detection model must first be built. The keyword detection model is built as follows:
In general, a user who wants to use the voice interaction function can say a preset keyword, which may be a wake-up word or an instruction word. A wake-up word is a phrase used to wake up the speech recognition device. Wake-up words are usually chosen from phrases rich in voiced initials, for example phrases containing Chinese characters that begin with initials such as m, n, l, or r. Voiced initials involve vocal cord vibration, so they are easier to distinguish from ambient noise and give better noise immunity. For example, the wake-up word can be set to "hello" or " ".
An instruction word judging unit 300, configured to further judge, when it is determined that a wake-up word is present in the current input speech, whether an instruction word is present within a preset time threshold;
Specifically, an instruction word is a phrase that instructs the smart device to perform a corresponding operation. Its characteristic is that it reflects a function specific to the smart device: for example, "navigate to" is highly relevant to devices with a navigation function (such as cars), and "play" is usually highly relevant to devices with a multimedia function (such as TVs and mobile phones). An instruction word directly reflects the user's intent. The speech features may be spectral or cepstral coefficients, and one frame of speech feature vector can be extracted from the input speech signal every 10 milliseconds.
An instruction word detected in the current input speech does not necessarily mean that the user is issuing a voice command: the current input speech may happen to contain the instruction word without the user intending it. For example, when the user says "Huludao City navigation channel", the phrase contains a pronunciation similar to "navigate to", but the user does not actually intend to navigate to some destination. Semantic parsing of the current input speech can use prior-art methods, for example a method based on template matching or a method based on sequence labeling; the specific processing is not described in detail here.
Specifically, in this embodiment the preset time threshold can be configured for different voice interaction systems and is not specifically limited. For example, it can generally be set to 200 ms: if no instruction word is detected within 200 ms, the user has not issued an instruction and has only woken the smart device, so the device enters the ordinary wake-up mode and gives the user a wake-up feedback voice message such as "I".
A voice wake-up mode judging unit 400, specifically configured to judge, when no instruction word is found within the preset time threshold, that the current voice wake-up mode is the ordinary wake-up mode;
A feedback unit 500, specifically configured to give wake-up word feedback after a first preset time when the current voice wake-up mode is judged to be the ordinary wake-up mode.
Specifically, in this embodiment the first preset time can be configured for different voice interaction systems and is not specifically limited. For example, it can generally be set to 400 ms: when the current voice wake-up mode is judged to be the ordinary wake-up mode, the wake-up word feedback is given after 400 ms, for example by issuing the voice message "I".
The ordinary wake-up mode takes a question-and-answer form: the user utters the wake-up word, the device must feed back that it is standing by, and only then can the interaction start, for example:
User: Hello, Little Speed (wake-up word)!
Device: Is there anything I can help you with? (the device feedback indicates that it is ready to receive information)
User: I want to go to the airport.
Device: Starting navigation to the airport for you.
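The question-and-answer exchange above behaves like a two-state dialogue. The sketch below models it on transcribed text; the state names, the action strings, and the `OrdinaryWakeDialog` class are illustrative assumptions, and the 400 ms value is the example figure given for the first preset time.

```python
from enum import Enum, auto
from typing import List


class State(Enum):
    ASLEEP = auto()
    LISTENING = auto()


class OrdinaryWakeDialog:
    """Toy question-and-answer flow; names and replies are illustrative."""

    def __init__(self, wake_word: str, first_preset_ms: int = 400) -> None:
        self.wake_word = wake_word.lower()
        self.first_preset_ms = first_preset_ms
        self.state = State.ASLEEP

    def on_utterance(self, text: str) -> List[str]:
        """Return the device actions triggered by one user utterance."""
        text = text.lower().strip()
        if self.state is State.ASLEEP:
            if text == self.wake_word:
                self.state = State.LISTENING
                # Feedback is given only after the first preset time.
                return [f"wait {self.first_preset_ms} ms", "say: wake-up feedback"]
            return []  # asleep: ignore everything except the wake-up word
        # Awake: treat the utterance as the instruction, then go back to sleep.
        self.state = State.ASLEEP
        return [f"execute: {text}"]
```

An instruction spoken while the device is asleep produces no action, which is the extra round trip that the One-shot mode is meant to remove.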
The present invention uses different wake-up modes in combination, combining the one-utterance direct wake-up mode with the ordinary wake-up mode, so that the user's interaction with the voice system is more natural. Feedback is rapid when the ordinary wake-up mode is used, and the one-utterance direct wake-up mode is not disturbed by the feedback sound, which avoids the problems that arise when the two modes are combined.
Specifically, the present invention determines which wake-up mode is currently in effect by judging whether the user issues an instruction word within the preset time threshold, gives corresponding feedback according to the wake-up mode, and combines the different wake-up modes so that the user's interaction with the voice system is more natural.
The voice wake-up device in the embodiment of the present invention can be a smart device with a voice interaction function, such as a TV, mobile phone, computer, or smart refrigerator.
The device embodiments described above are merely illustrative. The units described as separate parts may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution in essence, or the part of it that contributes to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in certain parts of the embodiments.
It should be noted that the above embodiments can be freely combined as needed. The above are only preferred embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A voice wake-up method, characterized by comprising:
Extracting speech features from the acquired current input speech;
Determining, based on the extracted speech features and a pre-built keyword detection model, whether a wake-up word is present in the current input speech, wherein the keywords in the keyword detection model include at least a preset wake-up word;
When it is determined that a wake-up word is present in the current input speech, further judging whether an instruction word is present within a preset time threshold, wherein the keywords in the keyword detection model include at least a preset instruction word;
Judging the current voice wake-up mode according to whether an instruction word is present within the preset time threshold;
Giving corresponding feedback according to the voice wake-up mode.
2. The voice wake-up method according to claim 1, characterized in that the step of "judging the current voice wake-up mode according to whether an instruction word is present within the preset time threshold" comprises:
When an instruction word is found within the preset time threshold, judging that the current voice wake-up mode is the one-utterance direct wake-up mode.
3. The voice wake-up method according to claim 2, characterized in that the step of "giving corresponding feedback according to the voice wake-up mode" comprises:
When the current voice wake-up mode is judged to be the one-utterance direct wake-up mode, recognizing the instruction word and executing the instruction.
4. The voice wake-up method according to claim 1, characterized in that the step of "judging the current voice wake-up mode according to whether an instruction word is present within the preset time threshold" comprises:
When no instruction word is found within the preset time threshold, judging that the current voice wake-up mode is the ordinary wake-up mode.
5. The voice wake-up method according to claim 4, characterized in that the step of "giving corresponding feedback according to the voice wake-up mode" comprises:
When the current voice wake-up mode is judged to be the ordinary wake-up mode, giving wake-up word feedback after a first preset time.
6. A voice wake-up device, characterized in that it applies the voice wake-up method according to any one of claims 1-5 and further comprises:
A speech feature extraction unit, configured to extract speech features from the acquired current input speech;
A wake-up word detection unit, configured to determine, based on the extracted speech features and a pre-built keyword detection model, whether a wake-up word is present in the current input speech, wherein the keywords in the keyword detection model include at least a preset wake-up word;
An instruction word judging unit, configured to further judge, when it is determined that a wake-up word is present in the current input speech, whether an instruction word is present within a preset time threshold;
A voice wake-up mode judging unit, configured to judge the current voice wake-up mode according to whether an instruction word is present within the preset time threshold;
A feedback unit, configured to give corresponding feedback according to the voice wake-up mode.
7. The voice wake-up device according to claim 6, characterized in that the voice wake-up mode judging unit is specifically configured to judge, when an instruction word is found within the preset time threshold, that the current voice wake-up mode is the one-utterance direct wake-up mode.
8. The voice wake-up device according to claim 7, characterized in that the feedback unit is specifically configured to recognize the instruction word and execute the instruction when the current voice wake-up mode is judged to be the one-utterance direct wake-up mode.
9. The voice wake-up device according to claim 6, characterized in that the voice wake-up mode judging unit is specifically configured to judge, when no instruction word is found within the preset time threshold, that the current voice wake-up mode is the ordinary wake-up mode.
10. The voice wake-up device according to claim 9, characterized in that the feedback unit is specifically configured to give wake-up word feedback after a first preset time when the current voice wake-up mode is judged to be the ordinary wake-up mode.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811369606.7A CN109545207A (en) 2018-11-16 2018-11-16 A kind of voice awakening method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811369606.7A CN109545207A (en) 2018-11-16 2018-11-16 A kind of voice awakening method and device

Publications (1)

Publication Number Publication Date
CN109545207A true CN109545207A (en) 2019-03-29

Family

ID=65847787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811369606.7A Pending CN109545207A (en) 2018-11-16 2018-11-16 A kind of voice awakening method and device

Country Status (1)

Country Link
CN (1) CN109545207A (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190329