CN110534102A - Voice wake-up method, apparatus, device and medium
- Publication number: CN110534102A
- Application number: CN201910887674.0A
- Authority: CN (China)
- Prior art keywords: wake, word, voice signal, waking, voice
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
- G10L2015/225—Feedback of the input speech
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Probability & Statistics with Applications (AREA)
- Electric Clocks (AREA)
Abstract
This application discloses a voice wake-up method, apparatus, device and medium, applied in the field of artificial intelligence, to solve the problem that voice wake-up methods in the prior art support only a single wake-up word. Specifically: a voice signal to be processed is obtained; if it is determined, according to saved wake-up record information, that the terminal device has already been woken up by a first wake-up word, the terminal device is woken up by a second wake-up word when the voice signal is determined to contain the second wake-up word; if it is determined, according to the saved wake-up record information, that the terminal device has not been woken up by the first wake-up word, the terminal device is woken up by the first wake-up word when the voice signal is determined to contain the first wake-up word. In this way, once the terminal device has been woken up by the first wake-up word, it can be woken up by the second wake-up word, so a second wake-up word can be set for the terminal device according to actual language habits, which diversifies the available wake-up words.
Description
Technical field
This application relates to the field of artificial intelligence, and in particular to a voice wake-up method, apparatus, device and medium.
Background
With the continuous development of artificial intelligence technology, terminal device control systems based on voice wake-up are also developing continuously. Voice wake-up, as the entry point for controlling a terminal device, is becoming a research hotspot in the field of artificial intelligence.
At present, a user can wake up a terminal device by voice and control it to perform corresponding operations, which brings the user much convenience. However, current voice wake-up methods still have some problems; for example, only a single wake-up word is supported.
Summary of the invention
The embodiments of this application provide a voice wake-up method, apparatus, device and medium, to solve the problem that voice wake-up methods in the prior art support only a single wake-up word.
The technical solutions provided by the embodiments of this application are as follows:
In one aspect, an embodiment of this application provides a voice wake-up method, comprising:
Obtaining a voice signal to be processed;
Judging, according to saved wake-up record information, whether the terminal device has been woken up by a first wake-up word, where the wake-up record information is flag information characterizing that a wake-up operation has been performed using the first wake-up word;
If so, waking up the terminal device by a second wake-up word when it is determined that the voice signal contains the second wake-up word;
If not, waking up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
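The branching described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation of this application: the function name `decide_wake` and its boolean inputs are hypothetical, and detection of the wake-up words in the voice signal is assumed to happen elsewhere.

```python
from typing import Optional


def decide_wake(woken_by_first: bool, has_first: bool, has_second: bool) -> Optional[str]:
    """Return which wake-up word (if any) wakes the terminal device.

    woken_by_first: hypothetical flag read from the saved wake-up record.
    has_first / has_second: whether the voice signal was judged to contain
    the first / a second wake-up word (detection is out of scope here).
    """
    if woken_by_first:
        # Already woken by the first wake-up word: only a second wake-up word counts.
        return "second" if has_second else None
    # Not yet woken by the first wake-up word: only the first wake-up word counts.
    return "first" if has_first else None
```

For instance, `decide_wake(False, False, True)` returns `None`: a second wake-up word alone does not wake the device before the first wake-up word has been used.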
In a possible implementation, the second wake-up word includes at least one of a simple wake-up word of the first wake-up word and a custom wake-up word.
In a possible implementation, before it is determined that the voice signal contains the second wake-up word, the method further comprises:
Determining, according to the saved wake-up record information, that a voice signal indicating that the simple wake-up word mode is to be enabled has been received within the time interval after the terminal device was woken up by the first wake-up word.
In a possible implementation, the voice wake-up method provided by the embodiments of this application further comprises:
If it is determined, according to the saved wake-up record information, that no voice signal indicating that the simple wake-up word mode is to be enabled has been received within the time interval after the terminal device was woken up by the first wake-up word, waking up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
In a possible implementation, before it is determined that the voice signal contains the second wake-up word, the method further comprises:
Determining, according to the saved wake-up record information, that the time interval since the terminal device was woken up by the first wake-up word is within a set time range.
In a possible implementation, the voice wake-up method provided by the embodiments of this application further comprises:
If it is determined, according to the saved wake-up record information, that the time interval since the terminal device was woken up by the first wake-up word is not within the set time range, waking up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
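The set-time-range check can be illustrated with a short Python sketch. The function name, the use of seconds, and the representation of the record as an optional timestamp are all assumptions made for illustration; the application does not specify them.

```python
from typing import Optional


def within_set_time_range(first_wake_time: Optional[float], now: float,
                          window_seconds: float) -> bool:
    """Check whether `now` falls inside the set time range after the first wake-up.

    first_wake_time: hypothetical timestamp (seconds) stored in the wake-up record,
    or None if the first wake-up word has not been used yet.
    """
    if first_wake_time is None:
        return False
    elapsed = now - first_wake_time
    return 0.0 <= elapsed <= window_seconds
```

When this check fails, the method falls back to requiring the first wake-up word, as described above.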
In a possible implementation, the second wake-up word includes at least one of the first wake-up word, a simple wake-up word of the first wake-up word, and a custom wake-up word.
In a possible implementation, after the terminal device is woken up by the first wake-up word, the method further comprises:
Updating the saved wake-up record information according to the flag information of the wake-up operation currently performed.
In a possible implementation, determining that the voice signal contains the second wake-up word comprises:
Framing the voice signal to obtain at least one speech frame, and performing feature extraction on the at least one speech frame to obtain speech feature data of the at least one speech frame;
Obtaining, based on the speech feature data of the at least one speech frame and using an acoustic recognition model, the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category;
Obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category, the wake-up confidence of the voice signal with respect to each second wake-up word;
Determining that the voice signal contains the second wake-up word when the wake-up confidence of the voice signal with respect to any one second wake-up word is detected to be not less than a wake-up threshold.
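Two of the steps above, framing and the final threshold decision, can be sketched in Python. The frame length, hop size, and function names are hypothetical; feature extraction and the acoustic recognition model are deliberately left out, and the confidences are assumed to have been computed already.

```python
from typing import Dict, List, Sequence


def split_into_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Cut a signal into (possibly overlapping) frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def contains_second_wake_word(confidences: Dict[str, float], threshold: float) -> bool:
    """True if any second wake-up word's confidence is not less than the threshold."""
    return any(score >= threshold for score in confidences.values())
```

The comparison uses `>=` because the text requires the confidence to be "not less than" the wake-up threshold.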
In a possible implementation, obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category, the wake-up confidence of the voice signal with respect to each second wake-up word comprises:
For each second wake-up word, obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category and using a hidden Markov model (HMM) based on the Viterbi algorithm, the highest second wake-up word path score, and determining the highest second wake-up word path score as the wake-up confidence of the voice signal with respect to the second wake-up word.
In a possible implementation, obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category, the wake-up confidence of the voice signal with respect to each second wake-up word comprises:
For each second wake-up word, obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category and using the HMM based on the Viterbi algorithm, the highest second wake-up word path score and the highest non-wake-up word path score, and determining the difference between the highest second wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal with respect to the second wake-up word.
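A rough Python sketch of the two confidence definitions follows. It rests on assumptions that the text does not state: the wake-up word is modeled as a left-to-right chain of states with no transition penalties, scores are sums of log posteriors, and the non-wake-up word path is a single filler state. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def best_path_score(log_post: Sequence[Sequence[float]]) -> float:
    """Highest log-score of a left-to-right path through the wake-up word states.

    log_post[t][s] is the log posterior of state s at frame t. A path starts in
    state 0, ends in the last state, and at each frame stays or advances by one.
    """
    if not log_post:
        return float("-inf")
    n_states = len(log_post[0])
    neg_inf = float("-inf")
    score: List[float] = [neg_inf] * n_states
    score[0] = log_post[0][0]
    for frame in log_post[1:]:
        new_score = [neg_inf] * n_states
        for s in range(n_states):
            # Viterbi recursion: keep only the better of "stay" and "advance".
            prev = max(score[s], score[s - 1] if s > 0 else neg_inf)
            if prev != neg_inf:
                new_score[s] = prev + frame[s]
        score = new_score
    return score[-1]


def wake_confidence(log_post: Sequence[Sequence[float]],
                    filler_log_post: Sequence[float]) -> float:
    """Difference between the best wake-up word path score and the filler path score."""
    return best_path_score(log_post) - sum(filler_log_post)
```

The first implementation above corresponds to using `best_path_score` directly as the confidence; the second corresponds to `wake_confidence`, which subtracts the non-wake-up word path score so that the result is less sensitive to the overall level of the posteriors.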
In a possible implementation, determining that the voice signal contains the first wake-up word comprises:
Framing the voice signal to obtain at least one speech frame, and performing feature extraction on the at least one speech frame to obtain speech feature data of the at least one speech frame;
Obtaining, based on the speech feature data of the at least one speech frame and using an acoustic recognition model, the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category;
Obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category, the wake-up confidence of the voice signal with respect to the first wake-up word;
Determining that the voice signal contains the first wake-up word when the wake-up confidence of the voice signal with respect to the first wake-up word is detected to be not less than a wake-up threshold.
In a possible implementation, obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category, the wake-up confidence of the voice signal with respect to the first wake-up word comprises:
Obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category and using the HMM based on the Viterbi algorithm, the highest first wake-up word path score;
Determining the highest first wake-up word path score as the wake-up confidence of the voice signal with respect to the first wake-up word.
In a possible implementation, obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category, the wake-up confidence of the voice signal with respect to the first wake-up word comprises:
Obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category and using the HMM based on the Viterbi algorithm, the highest first wake-up word path score and the highest non-wake-up word path score;
Determining the difference between the highest first wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal with respect to the first wake-up word.
In another aspect, an embodiment of this application provides a voice wake-up apparatus, comprising:
A signal acquisition unit, configured to obtain a voice signal to be processed;
A first judging unit, configured to judge, according to saved wake-up record information, whether the terminal device has been woken up by a first wake-up word, where the wake-up record information is flag information characterizing that a wake-up operation has been performed using the first wake-up word;
A first wake-up unit, configured to, if the first judging unit determines that the terminal device has been woken up by the first wake-up word, wake up the terminal device by a second wake-up word when it is determined that the voice signal contains the second wake-up word;
A second wake-up unit, configured to, if the first judging unit determines that the terminal device has not been woken up by the first wake-up word, wake up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
In a possible implementation, the second wake-up word includes at least one of a simple wake-up word of the first wake-up word and a custom wake-up word.
In a possible implementation, the voice wake-up apparatus provided by the embodiments of this application further comprises:
A second judging unit, configured to determine, before the first wake-up unit determines that the voice signal contains the second wake-up word and according to the saved wake-up record information, that a voice signal indicating that the simple wake-up word mode is to be enabled has been received within the time interval after the terminal device was woken up by the first wake-up word.
In a possible implementation, the second wake-up unit is further configured to:
If the second judging unit determines, according to the saved wake-up record information, that no voice signal indicating that the simple wake-up word mode is to be enabled has been received within the time interval after the terminal device was woken up by the first wake-up word, wake up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
In a possible implementation, the voice wake-up apparatus provided by the embodiments of this application further comprises:
A third judging unit, configured to determine, before the first wake-up unit determines that the voice signal contains the second wake-up word and according to the saved wake-up record information, that the time interval since the terminal device was woken up by the first wake-up word is within a set time range.
In a possible implementation, the second wake-up unit is further configured to:
If the third judging unit determines, according to the saved wake-up record information, that the time interval since the terminal device was woken up by the first wake-up word is not within the set time range, wake up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
In a possible implementation, the second wake-up word includes at least one of the first wake-up word, a simple wake-up word of the first wake-up word, and a custom wake-up word.
In a possible implementation, the voice wake-up apparatus provided by the embodiments of this application further comprises:
An information updating unit, configured to update the saved wake-up record information, after the second wake-up unit wakes up the terminal device by the first wake-up word, according to the flag information of the wake-up operation currently performed by the second wake-up unit.
In a possible implementation, when determining that the voice signal contains the second wake-up word, the first wake-up unit is specifically configured to:
Frame the voice signal to obtain at least one speech frame, and perform feature extraction on the at least one speech frame to obtain speech feature data of the at least one speech frame;
Obtain, based on the speech feature data of the at least one speech frame and using an acoustic recognition model, the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category;
Obtain, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category, the wake-up confidence of the voice signal with respect to each second wake-up word;
Determine that the voice signal contains the second wake-up word when the wake-up confidence of the voice signal with respect to any one second wake-up word is detected to be not less than a wake-up threshold.
In a possible implementation, when obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category, the wake-up confidence of the voice signal with respect to each second wake-up word, the first wake-up unit is specifically configured to:
For each second wake-up word, obtain, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category and using the HMM based on the Viterbi algorithm, the highest second wake-up word path score, and determine the highest second wake-up word path score as the wake-up confidence of the voice signal with respect to the second wake-up word.
In a possible implementation, when obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category, the wake-up confidence of the voice signal with respect to each second wake-up word, the first wake-up unit is specifically configured to:
For each second wake-up word, obtain, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category and using the HMM based on the Viterbi algorithm, the highest second wake-up word path score and the highest non-wake-up word path score, and determine the difference between the highest second wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal with respect to the second wake-up word.
In a possible implementation, when determining that the voice signal contains the first wake-up word, the second wake-up unit is specifically configured to:
Frame the voice signal to obtain at least one speech frame, and perform feature extraction on the at least one speech frame to obtain speech feature data of the at least one speech frame;
Obtain, based on the speech feature data of the at least one speech frame and using an acoustic recognition model, the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category;
Obtain, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category, the wake-up confidence of the voice signal with respect to the first wake-up word;
Determine that the voice signal contains the first wake-up word when the wake-up confidence of the voice signal with respect to the first wake-up word is detected to be not less than a wake-up threshold.
In a possible implementation, when obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category, the wake-up confidence of the voice signal with respect to the first wake-up word, the second wake-up unit is specifically configured to:
Obtain, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category and using the HMM based on the Viterbi algorithm, the highest first wake-up word path score;
Determine the highest first wake-up word path score as the wake-up confidence of the voice signal with respect to the first wake-up word.
In a possible implementation, when obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category, the wake-up confidence of the voice signal with respect to the first wake-up word, the second wake-up unit is specifically configured to:
Obtain, based on the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category and using the HMM based on the Viterbi algorithm, the highest first wake-up word path score and the highest non-wake-up word path score;
Determine the difference between the highest first wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal with respect to the first wake-up word.
In another aspect, an embodiment of this application provides a voice wake-up device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the voice wake-up method provided by the embodiments of this application when executing the computer program.
In another aspect, an embodiment of this application further provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the voice wake-up method provided by the embodiments of this application.
The beneficial effects of the embodiments of this application are as follows:
In the embodiments of this application, after the terminal device has been woken up by the first wake-up word, it can be woken up by the second wake-up word, so a second wake-up word can be set for the terminal device according to actual language habits, which diversifies the available wake-up words. Moreover, the terminal device can be woken up by the second wake-up word only after it has been woken up by the first wake-up word, which avoids, as far as possible, an increase in the false wake-up rate caused by an overly simple second wake-up word.
Other features and advantages of this application will be set forth in the following description, and in part will become apparent from the description or be understood by practicing this application. The objectives and other advantages of this application can be realized and obtained through the structures particularly pointed out in the written description, the claims and the drawings.
Brief description of the drawings
The drawings described here are provided for a further understanding of this application and constitute a part of it. The illustrative embodiments of this application and their descriptions are used to explain this application and do not unduly limit it. In the drawings:
Fig. 1 is a flow diagram of the method for building the acoustic recognition model in an embodiment of this application;
Fig. 2 is a schematic diagram of the system framework of the voice wake-up system in an embodiment of this application;
Fig. 3 is an overview flow diagram of the voice wake-up method in an embodiment of this application;
Fig. 4 is a detailed flow diagram of the voice wake-up method when switching between the single wake-up word mode and the simple wake-up word mode in an embodiment of this application;
Fig. 5 is another detailed flow diagram of the voice wake-up method when switching between the single wake-up word mode and the simple wake-up word mode in an embodiment of this application;
Fig. 6 is a detailed flow diagram of the voice wake-up method when switching between the single wake-up word mode and the multi-wake-up word mode in an embodiment of this application;
Fig. 7 is another detailed flow diagram of the voice wake-up method when switching between the single wake-up word mode and the multi-wake-up word mode in an embodiment of this application;
Fig. 8 is a schematic diagram of the functional structure of the voice wake-up apparatus in an embodiment of this application;
Fig. 9 is a schematic diagram of the hardware structure of the voice wake-up device in an embodiment of this application.
Detailed description
To help those skilled in the art better understand this application, the technical terms used in this application are explained first.
1. Voice wake-up: a technology in which a voice signal carrying a wake-up word is used to wake a terminal device, such as a mobile phone, computer, personal digital assistant (PDA), wearable device, smart home device or in-vehicle device, from a dormant state into a working state.
2. Client: in the embodiments of this application, an application program that can be installed on a terminal device, can monitor in real time whether the terminal device receives a voice signal, and can perform a wake-up operation on the terminal device when it determines, according to the wake-up recognition result of the voice signal, that the terminal device needs to be woken up.
3. Server: in the embodiments of this application, a background device that, in response to requests initiated by the client, provides the client with services such as database services, computing services and wake-up recognition services.
4. Acoustic recognition model: a deep learning model that, according to the speech feature data of at least one speech frame of a voice signal, predicts the posterior probabilities that the at least one speech frame belongs to the wake-up word category and the non-wake-up word category.
In the embodiments of this application, to reduce the computation load on the client, the acoustic recognition model can be built on the server. Referring to Fig. 1, the process of building the acoustic recognition model is as follows:
S101: Collect a speech set to be trained on, which includes wake-up word speech signals and non-wake-up word speech signals.
In the embodiments of this application, a wake-up word speech signal is a voice signal that carries a wake-up word; non-wake-up word speech signals are voice signals that do not carry a wake-up word, environmental noise signals, and the like.
S102: For each voice signal, frame the voice signal to obtain at least one speech frame, and perform feature extraction on the at least one speech frame to obtain the speech feature data of the at least one speech frame.
S103: For each voice signal, based on the speech feature data of the at least one speech frame of the voice signal and using the acoustic recognition model to be trained, obtain the posterior probabilities that the at least one speech frame of the voice signal belongs to the wake-up word category and the non-wake-up word category.
S104: According to the posterior probabilities that the at least one speech frame of each voice signal belongs to the wake-up word category and the non-wake-up word category, and the true category of the at least one speech frame of each voice signal, train the acoustic recognition model to be trained using a loss function to obtain the model parameters, where the true categories are obtained by labeling each speech frame of the voice signals in advance.
S105: Generate the acoustic recognition model according to the model parameters.
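Step S104 does not name a specific loss function. As one plausible choice, and purely as an assumption for illustration, a frame-level cross-entropy between the predicted posteriors and the labeled categories could look like the sketch below. The function name and the label encoding (class indices into each frame's posterior vector) are hypothetical.

```python
import math
from typing import Sequence


def frame_cross_entropy(posteriors: Sequence[Sequence[float]],
                        labels: Sequence[int]) -> float:
    """Mean negative log-probability assigned to each frame's labeled category.

    posteriors[t][c]: predicted posterior of category c at frame t.
    labels[t]: index of the labeled (true) category of frame t.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for probs, label in zip(posteriors, labels):
        total -= math.log(max(probs[label], eps))
    return total / len(labels)
```

Training would then adjust the model parameters to reduce this value over the speech set; the optimizer and model architecture are outside the scope of this sketch.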
5. HMM based on the Viterbi algorithm: a machine learning model that, according to the posterior probabilities that at least one speech frame of a voice signal belongs to the wake-up word category and the non-wake-up word category, finds the highest wake-up word path score.
6. Wake-up word mode: in this application, a mode for waking up the terminal device that corresponds to a wake-up word category, including but not limited to the single wake-up word mode, the multi-wake-up word mode and the simple wake-up word mode. Specifically:
Single wake-up word mode: a mode in which the terminal device is woken up by one wake-up word. In this application, the single wake-up word mode is the mode in which the terminal device is woken up by the first wake-up word, where the first wake-up word may include, but is not limited to, a standard wake-up word, for example "Xiao A Xiao A" or "ABAB".
Multi-wake-up word mode: a mode in which the terminal device is woken up by any one of at least two wake-up words. In this application, the multi-wake-up word mode is the mode in which the terminal device is woken up by any one second wake-up word among at least two second wake-up words, where, in the multi-wake-up word mode, the second wake-up words may include, but are not limited to, the standard wake-up word, a simple wake-up word of the standard wake-up word, and a custom wake-up word, for example "Xiao A", "AB", "Xiao A Xiao A" or "ABAB".
Simple wake-up word mode: a mode in which the terminal device is woken up by at least one wake-up word. In this application, the simple wake-up word mode is the mode in which the terminal device is woken up by any one second wake-up word among at least one second wake-up word, where, in the simple wake-up word mode, the second wake-up words may include, but are not limited to, a simple wake-up word of the standard wake-up word and a custom wake-up word, for example "Xiao A" or "AB".
7, Wake-up record information: flag information characterizing that a wake-up operation has been performed with the first wake-up word. In this application, the flag information may include but is not limited to flag bit information and time information characterizing that a wake-up operation has been performed with the first wake-up word. For example, the time information may be but is not limited to: the timing value of a timer, the count value of a counter, or the moment at which the wake-up operation was performed with the first wake-up word; the flag bit information may be but is not limited to: a flag bit "1" characterizing that a wake-up operation has been performed with the first wake-up word, or a flag bit "0" characterizing that no wake-up operation has been performed with the first wake-up word.
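One possible in-memory representation of the wake-up record information is sketched below. The class name `WakeRecord`, its fields and its methods are hypothetical; the application only states that the record may hold a flag bit and time information.

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class WakeRecord:
    # Flag bit: 1 = a wake-up operation was performed with the first wake-up word, 0 = not.
    flag: int = 0
    # Moment (in seconds) at which the first wake-up word last woke the device, if any.
    woke_at: Optional[float] = None

    def mark_woken(self, now: Optional[float] = None) -> None:
        # Update the record after a wake-up operation with the first wake-up word.
        self.flag = 1
        self.woke_at = time.monotonic() if now is None else now

    def elapsed(self, now: float) -> Optional[float]:
        # Time interval since the first wake-up word woke the device.
        return None if self.woke_at is None else now - self.woke_at

record = WakeRecord()
record.mark_woken(now=100.0)
```

Storing a moment rather than a running timer is a design choice of this sketch; a timer value or a counter value, both mentioned above, would serve the same purpose.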
It should be noted that terms such as "first" and "second" in this application are used to distinguish similar objects and are not used to describe a particular order or sequence. It should be understood that such terms are interchangeable where appropriate, so that the embodiments described here can be implemented in an order other than the one illustrated or described herein.
To make the purpose, technical solutions and beneficial effects of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
Currently, the wake-up word in a voice wake-up service is usually a preset standard wake-up word, and the user can wake up the terminal device only through that standard wake-up word, so the choice of wake-up words is limited. Moreover, to reduce the false wake-up rate, the standard wake-up word is usually a reduplicated word, for example "small A small A" or "ABAB". Such wake-up words usually do not match the user's actual language habits and may degrade the user experience of the voice wake-up service.
To this end, the embodiments of the present application provide a voice wake-up system. Referring to Fig. 2, the voice wake-up system may include a terminal device 202 on which a client 201 is installed, and a server 203, where the client 201 can communicate with the server 203 through the terminal device 202 over a communication network. In practical application, the terminal device 202 can receive voice signals in real time and store the received voice signals in a buffer area. The client 201 can obtain a voice signal to be processed from the buffer area and perform wake-up recognition on it, so as to determine, according to the wake-up recognition result of the voice signal, whether to wake up the terminal device 202. Alternatively, after obtaining the voice signal to be processed from the buffer area, the client 201 can carry the voice signal in a wake-up recognition request and send the request to the server 203. The server 203 can obtain the voice signal to be processed from the wake-up recognition request, perform wake-up recognition on it, determine according to the wake-up recognition result whether the terminal device 202 needs to be woken up, and return the determination result to the client 201 in a wake-up recognition response. The client 201 can then determine whether to wake up the terminal device 202 according to the determination result carried in the wake-up recognition response.
In practical application, the client 201 or the server 203 can use the voice wake-up method provided by the embodiments of the present application when performing wake-up recognition on a voice signal. Specifically, after obtaining the voice signal to be processed, the client 201 or the server 203 can judge, according to the saved wake-up record information, whether the terminal device has been woken up by the first wake-up word. If it is determined that the terminal device has been woken up by the first wake-up word, the terminal device is woken up by the second wake-up word when the voice signal is further determined to contain the second wake-up word. If it is determined that the terminal device has not been woken up by the first wake-up word, the terminal device is woken up by the first wake-up word when the voice signal is further determined to contain the first wake-up word. In this way, after the terminal device has been woken up by the first wake-up word, it can be woken up by the second wake-up word, so that a second wake-up word can be set for the terminal device according to actual language habits, which diversifies the wake-up words. Moreover, because the terminal device can be woken up by the second wake-up word only after it has been woken up by the first wake-up word, the increase in false wake-up rate that an overly simple second wake-up word would otherwise cause is avoided as far as possible.
It is worth noting that, in the embodiments of the present application, wake-up recognition performed on the voice signal by the client 201 is referred to as offline wake-up, and wake-up recognition performed on the voice signal by the server 203 is referred to as online wake-up. The voice wake-up method provided by the embodiments of the present application is applicable to both offline wake-up and online wake-up. In addition, it should be understood that the numbers of terminal devices, communication networks and servers in Fig. 2 are only schematic, and there may be any number of terminal devices, communication networks and servers according to actual needs. In practical application, when the voice wake-up device that runs the voice wake-up method does not need to exchange data with other devices, the voice wake-up system may include only that voice wake-up device; for example, the voice wake-up system may include only the terminal device or only the server.
Having described the application scenario and design idea of the embodiments of the present application, the technical solutions provided by the embodiments of the present application are explained below.
The embodiments of the present application provide a voice wake-up method, which can be applied to a client installed on a terminal device and can also be applied to a server. Specifically, referring to Fig. 3, the flow of the voice wake-up method provided by the embodiments of the present application is as follows:
S301: Obtain a voice signal to be processed.
In practical application, the terminal device can receive voice signals in real time and store the received voice signals in a buffer area, and the client installed on the terminal device can obtain the voice signal to be processed from the buffer area. When the voice wake-up method provided by the embodiments of the present application is applied to the server, the client, after obtaining the voice signal to be processed from the buffer area, can carry the voice signal in a wake-up recognition request and send the request to the server, and the server can obtain the voice signal to be processed from the received wake-up recognition request.
S302: According to the saved wake-up record information, judge whether the terminal device has been woken up by the first wake-up word; if so, execute S303; if not, execute S304.
In a specific implementation, when executing S302, whether the terminal device has been woken up by the first wake-up word can be judged according to the flag bit information included in the wake-up record information. For example, assuming that the flag bit "1" characterizes that a wake-up operation has been performed with the first wake-up word and the flag bit "0" characterizes that no wake-up operation has been performed with the first wake-up word, it is determined that a wake-up operation has been performed with the first wake-up word when the flag bit in the saved wake-up record information is "1", and it is determined that no wake-up operation has been performed with the first wake-up word when the flag bit in the saved wake-up record information is "0".
In practical application, if it is determined according to the flag bit information included in the saved wake-up record information that the terminal device has been woken up by the first wake-up word, then in one embodiment the current wake-up word mode can be directly determined to be the simple wake-up word mode. In this case, wake-up recognition can be performed on the obtained voice signal based on at least one second wake-up word among second wake-up words such as the user-defined wake-up word and the simplified form of the standard wake-up word, that is, S303 is executed.
If it is determined according to the flag bit information included in the saved wake-up record information that the terminal device has been woken up by the first wake-up word, then in another embodiment it can further be judged, according to the time information included in the wake-up record information, whether a voice signal characterizing a request to enable the simple wake-up word mode has been received within the time interval after the terminal device was woken up by the first wake-up word. If such a voice signal has been received, the current wake-up word mode can be determined to be the simple wake-up word mode; in this case, wake-up recognition can be performed on the obtained voice signal based on at least one second wake-up word among second wake-up words such as the user-defined wake-up word and the simplified form of the standard wake-up word, that is, S303 is executed. If no such voice signal has been received, the current wake-up word mode can be determined to be the single wake-up word mode; in this case, wake-up recognition can be performed on the obtained voice signal based on a first wake-up word such as the standard wake-up word, that is, S304 is executed. Further, after S304 is executed, the saved wake-up record information can be updated according to the flag information of the wake-up operation currently performed.
If it is determined according to the flag bit information included in the saved wake-up record information that the terminal device has been woken up by the first wake-up word, then in yet another embodiment it can further be judged, according to the saved wake-up record information, whether the time interval since the terminal device was woken up by the first wake-up word is within a set time range. Specifically, this judgment can be made according to the time information included in the wake-up record information. For example, assuming that the time information is the timing value of a timer, the time interval since the terminal device was woken up by the first wake-up word is determined to be within the set time range when the timing value of the timer is not greater than a time threshold, and is determined not to be within the set time range when the timing value of the timer is greater than the time threshold.
If it is determined according to the time information included in the saved wake-up record information that the time interval since the terminal device was woken up by the first wake-up word is within the set time range, the current wake-up word mode can further be determined to be the multi wake-up word mode. In this case, wake-up recognition can be performed on the obtained voice signal based on at least two second wake-up words among second wake-up words such as the standard wake-up word, the user-defined wake-up word and the simplified form of the standard wake-up word, that is, S303 is executed. If it is determined according to the time information included in the saved wake-up record information that the time interval since the terminal device was woken up by the first wake-up word is not within the set time range, the current wake-up word mode can further be determined to be the single wake-up word mode. In this case, wake-up recognition can be performed on the obtained voice signal based on a first wake-up word such as the standard wake-up word, that is, S304 is executed. Further, after S304 is executed, the saved wake-up record information can be updated according to the flag information of the wake-up operation currently performed.
In practical application, if it is determined according to the flag bit information included in the saved wake-up record information that the terminal device has not been woken up by the first wake-up word, the current wake-up word mode can further be determined to be the single wake-up word mode. In this case, wake-up recognition can be performed on the obtained voice signal based on a first wake-up word such as the standard wake-up word, that is, S304 is executed. Further, after S304 is executed, the saved wake-up record information can be updated according to the flag information of the wake-up operation currently performed.
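The S302 decision in the time-window embodiment above can be sketched as follows. This is only an illustration: the enum `WakeMode`, the functions `select_mode` and `next_step`, and the 30-second threshold used in the usage lines are hypothetical names and values that do not come from the application.

```python
from enum import Enum

class WakeMode(Enum):
    SINGLE = "single wake-up word mode"
    MULTI = "multi wake-up word mode"

def select_mode(flag, elapsed_seconds, time_threshold):
    # flag != 1: the first wake-up word has not yet woken the device.
    if flag != 1 or elapsed_seconds is None:
        return WakeMode.SINGLE
    # Within the set time range: timer value not greater than the time threshold.
    if elapsed_seconds <= time_threshold:
        return WakeMode.MULTI
    return WakeMode.SINGLE

def next_step(mode):
    # Multi mode recognizes second wake-up words (S303); single mode uses the first (S304).
    return "S303" if mode is WakeMode.MULTI else "S304"

mode = select_mode(flag=1, elapsed_seconds=10.0, time_threshold=30.0)
step = next_step(mode)
```

The other two embodiments differ only in the condition tested after the flag bit: one switches to the simple wake-up word mode unconditionally, the other only after a voice signal enabling that mode has been received.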
S303: When it is determined that the voice signal contains the second wake-up word, wake up the terminal device by the second wake-up word.
In practical application, S303 can be executed in, but is not limited to, the following manner:
First, perform framing on the voice signal to obtain at least one speech frame, and perform feature extraction on the at least one speech frame of the voice signal to obtain the voice feature data of the at least one speech frame of the voice signal.
It is worth noting that, owing to factors such as gain, the volume of some voice signals may be at a low level. For this reason, in the embodiments of the present application, automatic gain control (Automatic Gain Control, AGC) can be applied before framing to boost the volume of the voice signal, so that a voice signal whose volume is too low reaches a level at which it can be recognized.
Further, after the volume of the voice signal is boosted, a sliding window function can be used to perform framing on the voice signal to obtain at least one speech frame of the voice signal, where adjacent speech frames may partially overlap. For example, the voice signal can be framed with a frame length of 25 ms and a frame shift of 10 ms, so that each resulting speech frame is 25 ms long and adjacent speech frames overlap by 25 ms - 10 ms = 15 ms.
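The 25 ms / 10 ms framing described above can be sketched with NumPy. The 16 kHz sampling rate, the Hamming window and the function name `frame_signal` are assumptions; the application only specifies a sliding window function, the frame length and the frame shift.

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_ms=25, shift_ms=10):
    frame_len = sample_rate * frame_ms // 1000   # samples per frame
    shift = sample_rate * shift_ms // 1000       # samples between frame starts
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    num_frames = 1 + (len(signal) - frame_len) // shift
    # Row i holds samples [i * shift, i * shift + frame_len).
    index = np.arange(frame_len)[None, :] + shift * np.arange(num_frames)[:, None]
    # Apply a Hamming window to each frame (assumed choice of window function).
    return signal[index] * np.hamming(frame_len)

sample_rate = 16000                                          # assumed sampling rate
signal = np.random.default_rng(0).normal(size=sample_rate)   # 1 s of placeholder audio
frames = frame_signal(signal, sample_rate)
overlap_ms = 25 - 10                                         # overlap of adjacent frames
```

At 16 kHz each frame holds 400 samples and consecutive frames start 160 samples apart, so adjacent frames share 240 samples, which is the 15 ms overlap computed in the text.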
Further, after the voice signal is framed and at least one speech frame of the voice signal is obtained, Mel-scale Frequency Cepstral Coefficients (MFCC) can be computed for the at least one speech frame of the voice signal to obtain the voice feature data of the at least one speech frame, that is, the voice feature vector of the at least one speech frame of the voice signal.
Then, based on the voice feature data of the at least one speech frame of the voice signal, the acoustic recognition model is used to obtain the posterior probabilities that the at least one speech frame of the voice signal corresponds to the wake-up word category and to the non-wake-up word category. The posterior probability that a speech frame corresponds to the wake-up word category includes the posterior probability that the speech frame corresponds to each phoneme of the wake-up word (here the wake-up word is a simple wake-up word, for example "small A" or "AB"). The posterior probability that a speech frame corresponds to the non-wake-up word category includes the posterior probability that the speech frame corresponds to the ambient noise phoneme and the posterior probability that the speech frame corresponds to the non-wake-up word phoneme.
For example, assuming that framing the voice signal yields 100 speech frames, that the wake-up word is "small A" and that "small A" contains 6 phonemes, a voice feature matrix can be generated from the voice feature vectors of the 100 speech frames and input into the acoustic recognition model to obtain the posterior probability matrix of the voice signal. The posterior probability matrix is a 100 × 8 matrix: one dimension indexes the 100 speech frames, and the other indexes the 8 posterior probabilities, namely those corresponding to the 6 phonemes of "small A", to the ambient noise phoneme and to the non-wake-up word phoneme.
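The shape of the posterior probability matrix in this example can be checked with a small sketch. The stand-in model below is a random linear layer followed by a softmax and is not the trained acoustic recognition model; the 13-dimensional features and all names are assumptions, and frames are laid out along the rows.

```python
import numpy as np

NUM_FRAMES, FEATURE_DIM = 100, 13    # 13-dimensional features are an assumption
NUM_CLASSES = 6 + 1 + 1              # 6 wake-word phonemes + ambient noise + non-wake-up word

def acoustic_model(features, weights, bias):
    # Stand-in for the trained model: linear layer followed by a row-wise softmax.
    logits = features @ weights + bias
    logits -= logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
feature_matrix = rng.normal(size=(NUM_FRAMES, FEATURE_DIM))   # one feature vector per frame
weights = rng.normal(size=(FEATURE_DIM, NUM_CLASSES))
bias = np.zeros(NUM_CLASSES)
posteriors = acoustic_model(feature_matrix, weights, bias)    # shape (100, 8)
```

Each row of `posteriors` is a probability distribution over the 8 phoneme classes for one speech frame.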
It is worth noting that, when the voice wake-up method provided by the embodiments of the present application is applied to the server, the server can use the pre-established acoustic recognition model to predict the posterior probabilities. When the method is applied to the client, the server can, after establishing the acoustic recognition model, configure the model in the client, so that the client can use the acoustic recognition model configured by the server to predict the posterior probabilities.
Next, based on the posterior probabilities that the at least one speech frame of the voice signal corresponds to the wake-up word category and to the non-wake-up word category, the wake-up confidence of the voice signal for each second wake-up word is obtained.
In practical application, the wake-up confidence of the voice signal for each second wake-up word can be obtained from these posterior probabilities in, but is not limited to, the following ways:
First way: for each second wake-up word, based on the posterior probabilities that the at least one speech frame of the voice signal corresponds to the wake-up word category and to the non-wake-up word category, use the Viterbi-based HMM to obtain the highest second wake-up word path score, and take that score as the wake-up confidence of the voice signal for the second wake-up word.
For example, assuming that the second wake-up words include "small A small A" and "small A", then, based on the posterior probability matrix of the voice signal, the Viterbi-based HMM is used to compute a wake-up word path score for each of the multiple wake-up word paths in which "small A small A" occurs consecutively, and the highest wake-up word path score is taken as the wake-up confidence of the voice signal for "small A small A". Likewise, based on the posterior probability matrix of the voice signal, the Viterbi-based HMM is used to compute a wake-up word path score for each of the multiple wake-up word paths in which "small A" occurs consecutively, and the highest wake-up word path score is taken as the wake-up confidence of the voice signal for "small A".
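The first way can be illustrated with a small Viterbi search over a left-to-right chain of phoneme states. The application does not define the HMM topology, the transition probabilities or the score normalization, so this sketch assumes that each state may only repeat or advance to the next one, that transitions carry no cost, that the path spans the whole signal, and that the score is the average log posterior along the best path. The function name and the toy posteriors are hypothetical.

```python
import numpy as np

def viterbi_path_score(log_post, state_columns):
    # log_post: (num_frames, num_classes) log posteriors from the acoustic model.
    # state_columns: posterior column of each HMM state, in left-to-right order.
    num_frames = log_post.shape[0]
    num_states = len(state_columns)
    if num_frames < num_states:
        return float("-inf")                  # too few frames to visit every state
    emit = log_post[:, state_columns]         # (num_frames, num_states)
    score = np.full(num_states, -np.inf)
    score[0] = emit[0, 0]                     # paths start in the first state
    for t in range(1, num_frames):
        advance = np.concatenate(([-np.inf], score[:-1]))
        # Either stay in the same state or advance from the previous one.
        score = np.maximum(score, advance) + emit[t]
    return float(score[-1] / num_frames)      # paths end in the last state

# Toy posteriors: 6 frames, 3 phonemes, the true phoneme gets probability 0.9.
frames_to_phoneme = [0, 0, 1, 1, 2, 2]
posteriors = np.full((6, 3), 0.05)
posteriors[np.arange(6), frames_to_phoneme] = 0.9
confidence = viterbi_path_score(np.log(posteriors), [0, 1, 2])
reversed_score = viterbi_path_score(np.log(posteriors), [2, 1, 0])
```

For a repeated wake-up word such as "small A small A", `state_columns` would simply list the phoneme columns of "small A" twice in a row.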
Second way: for each second wake-up word, based on the posterior probabilities that the at least one speech frame corresponds to the wake-up word category and to the non-wake-up word category, use the Viterbi-based HMM to obtain the highest second wake-up word path score and the highest non-wake-up word path score, and take the difference between the highest second wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal for the second wake-up word.
For example, assuming that the second wake-up words include "small A small A" and "small A", then, based on the posterior probability matrix of the voice signal, the Viterbi-based HMM is used to compute a wake-up word path score for each of the multiple wake-up word paths in which "small A small A" occurs consecutively, and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words occur consecutively; the difference between the highest wake-up word path score and the highest non-wake-up word path score is taken as the wake-up confidence of the voice signal for "small A small A". Likewise, based on the posterior probability matrix of the voice signal, the Viterbi-based HMM is used to compute a wake-up word path score for each of the multiple wake-up word paths in which "small A" occurs consecutively, and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words occur consecutively; the difference between the highest wake-up word path score and the highest non-wake-up word path score is taken as the wake-up confidence of the voice signal for "small A".
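The second way can be sketched under similar assumptions: a left-to-right chain for the wake-up word, and a non-wake-up word model made of the ambient noise and non-wake-up word phonemes with free, cost-free transitions between them, so that its best path simply takes the larger of the two posteriors in each frame. The topology, the per-frame normalization and all names are assumptions rather than details given in the application.

```python
import numpy as np

def best_path_score(emit):
    # Left-to-right Viterbi; emit is (num_frames, num_states) log posteriors.
    score = np.full(emit.shape[1], -np.inf)
    score[0] = emit[0, 0]
    for frame in emit[1:]:
        advance = np.concatenate(([-np.inf], score[:-1]))
        score = np.maximum(score, advance) + frame
    return score[-1]

def wake_confidence(log_post, wake_columns, filler_columns):
    wake_score = best_path_score(log_post[:, wake_columns])
    # With free transitions among filler states, the best non-wake-up word path
    # takes the per-frame maximum over the filler columns.
    filler_score = log_post[:, filler_columns].max(axis=1).sum()
    return float((wake_score - filler_score) / log_post.shape[0])

def toy_posteriors(frame_classes, num_classes=4):
    # The listed class gets probability 0.85 in each frame, every other class 0.05.
    post = np.full((len(frame_classes), num_classes), 0.05)
    post[np.arange(len(frame_classes)), frame_classes] = 0.85
    return post

# Columns 0-1: wake-word phonemes; column 2: ambient noise; column 3: non-wake-up word.
wake_like = wake_confidence(np.log(toy_posteriors([0, 0, 1, 1])), [0, 1], [2, 3])
noise_like = wake_confidence(np.log(toy_posteriors([2, 2, 2, 2])), [0, 1], [2, 3])
```

Subtracting the non-wake-up word path score makes the confidence relative: a signal in which every class scores poorly no longer looks like a wake-up word merely because its wake-up word path is the least bad one.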
Finally, when the wake-up confidence of the voice signal for any one of the second wake-up words is detected to be not less than the wake-up threshold, it is determined that the voice signal contains a second wake-up word. In this case, when the voice wake-up method provided by the embodiments of the present application is applied to the client, the client can directly wake up the terminal device by the second wake-up word. When the method is applied to the server, the server can return to the client a wake-up recognition response characterizing that the terminal device is to be woken up, and the client, on receiving this response, can wake up the terminal device by the second wake-up word.
It is worth noting that, when the wake-up confidence of the voice signal for every second wake-up word is detected to be less than the wake-up threshold, it can be determined that the voice signal does not contain a second wake-up word. In this case, when the voice wake-up method provided by the embodiments of the present application is applied to the client, the client can directly discard the voice signal and continue to obtain voice signals to be processed from the buffer area for wake-up recognition. When the method is applied to the server, the server can return to the client a wake-up recognition response characterizing that the terminal device is not to be woken up; on receiving this response, the client does not perform a wake-up operation on the terminal device and continues to obtain voice signals to be processed from the buffer area and send them to the server for wake-up recognition.
For example, assuming that the second wake-up words include "small A small A" and "small A", the terminal device can be woken up when the wake-up confidence of the voice signal for "small A small A" is detected to be not less than the wake-up threshold, and can also be woken up when the wake-up confidence of the voice signal for "small A" is detected to be not less than the wake-up threshold. When the wake-up confidence for "small A small A" and the wake-up confidence for "small A" are both less than the wake-up threshold, the terminal device is not woken up, and voice signals to be processed continue to be obtained from the buffer area.
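The threshold decision over several second wake-up words can be sketched as follows. The function name, the confidence values and the threshold of 0.6 are hypothetical and serve only to illustrate the "not less than the wake-up threshold" rule.

```python
def detect_wake_word(confidences, threshold):
    # confidences: mapping from each second wake-up word to its wake-up confidence.
    for word, confidence in confidences.items():
        if confidence >= threshold:      # "not less than" the wake-up threshold
            return word                  # the signal contains this wake-up word
    return None                          # no wake-up word: keep reading the buffer

detected = detect_wake_word({"small A small A": 0.42, "small A": 0.81}, threshold=0.6)
rejected = detect_wake_word({"small A small A": 0.10, "small A": 0.20}, threshold=0.6)
```

A returned word corresponds to waking up the terminal device, and `None` corresponds to discarding the signal and obtaining the next voice signal to be processed.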
S304: When it is determined that the voice signal contains the first wake-up word, wake up the terminal device by the first wake-up word.
Specifically, S304 can be executed in, but is not limited to, the following manner:
First, perform framing on the voice signal to obtain at least one speech frame, and perform feature extraction on the at least one speech frame to obtain the voice feature data of the at least one speech frame.
Likewise, in the embodiments of the present application, AGC can be applied before framing to boost the volume of the voice signal, so that a voice signal whose volume is too low reaches a level at which it can be recognized. Further, after the volume of the voice signal is boosted, a sliding window function can be used to frame the voice signal to obtain at least one speech frame of the voice signal, and MFCC can be used to obtain the voice feature data of the at least one speech frame, that is, the voice feature vector of the at least one speech frame of the voice signal. The specific implementation is the same as that described above and is not repeated here.
Then, based on the voice feature data of the at least one speech frame of the voice signal, the acoustic recognition model is used to obtain the posterior probabilities that the at least one speech frame of the voice signal corresponds to the wake-up word category and to the non-wake-up word category. Likewise, the posterior probability that a speech frame corresponds to the wake-up word category includes the posterior probability that the speech frame corresponds to each phoneme of the wake-up word (here the wake-up word is a simple wake-up word, for example "small A" or "AB"), and the posterior probability that a speech frame corresponds to the non-wake-up word category includes the posterior probability that the speech frame corresponds to the ambient noise phoneme and the posterior probability that the speech frame corresponds to the non-wake-up word phoneme. The specific implementation is the same as that described above and is not repeated here.
Next, based on the posterior probabilities that the at least one speech frame of the voice signal corresponds to the wake-up word category and to the non-wake-up word category, the wake-up confidence of the voice signal for the first wake-up word is obtained.
In practical application, the wake-up confidence of the voice signal for the first wake-up word can be obtained from these posterior probabilities in, but is not limited to, the following ways:
First way: based on the posterior probabilities that the at least one speech frame of the voice signal corresponds to the wake-up word category and to the non-wake-up word category, use the Viterbi-based HMM to obtain the highest first wake-up word path score, and take that score as the wake-up confidence of the voice signal for the first wake-up word.
For example, assuming that the first wake-up word includes "small A small A", then, based on the posterior probability matrix of the voice signal, the Viterbi-based HMM is used to compute a wake-up word path score for each of the multiple wake-up word paths in which "small A small A" occurs consecutively, and the highest wake-up word path score is taken as the wake-up confidence of the voice signal for "small A small A".
Second way: based on the posterior probabilities that the at least one speech frame of the voice signal corresponds to the wake-up word category and to the non-wake-up word category, use the Viterbi-based HMM to obtain the highest first wake-up word path score and the highest non-wake-up word path score, and take the difference between the two as the wake-up confidence of the voice signal for the first wake-up word.
For example, assuming that the first wake-up word includes "small A small A", then, based on the posterior probability matrix of the voice signal, the Viterbi-based HMM is used to compute a wake-up word path score for each of the multiple wake-up word paths in which "small A small A" occurs consecutively, and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words occur consecutively; the difference between the highest wake-up word path score and the highest non-wake-up word path score is taken as the wake-up confidence of the voice signal for "small A small A".
Finally, when the wake-up confidence of the voice signal for the first wake-up word is detected to be not less than the wake-up threshold, it is determined that the voice signal contains the first wake-up word. In this case, when the voice wake-up method provided by the embodiments of the present application is applied to the client, the client can directly wake up the terminal device by the first wake-up word. When the method is applied to the server, the server can return to the client a wake-up recognition response characterizing that the terminal device is to be woken up, and the client, on receiving this response, can wake up the terminal device by the first wake-up word.
It is worth noting that, when the wake-up confidence of the voice signal for the first wake-up word is detected to be less than the wake-up threshold, it can be determined that the voice signal does not contain the first wake-up word. In this case, when the voice wake-up method provided by the embodiments of the present application is applied to the client, the client can directly discard the voice signal and continue to obtain voice signals to be processed from the buffer area for wake-up recognition. When the method is applied to the server, the server can return to the client a wake-up recognition response characterizing that the terminal device is not to be woken up; on receiving this response, the client does not perform a wake-up operation on the terminal device and continues to obtain voice signals to be processed from the buffer area and send them to the server for wake-up recognition.
For example, assume that the first wake-up word is "small A small A". When the wake-up confidence of the voice signal for "small A small A" is detected to be not less than the wake-up threshold, the terminal device can be woken up. When that wake-up confidence is detected to be less than the wake-up threshold, the terminal device is not woken up, and a voice signal to be processed continues to be obtained from the buffer.
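The threshold comparison above can be sketched as follows. The enum and function names are hypothetical; the only behavior taken from the text is that a confidence "not less than" the wake-up threshold triggers a wake-up, and anything lower causes the signal to be discarded.

```python
from enum import Enum


class WakeDecision(Enum):
    WAKE = "wake the terminal device"
    DISCARD = "discard the signal and fetch the next one from the buffer"


def decide(confidence, wake_threshold):
    """Wake when the confidence is not less than the threshold."""
    if confidence >= wake_threshold:
        return WakeDecision.WAKE
    return WakeDecision.DISCARD
```

Note that equality counts as a wake-up, which matches the "not less than" wording.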
Below, taking "offline wake-up" as the concrete application scenario and "switching between the single wake-up word mode and the simple wake-up word mode" as the specific embodiment, the voice wake-up method provided by the embodiments of the present application is described in further detail, where the first wake-up word is "small A small A" and the second wake-up word is "small A". Specifically, referring to Fig. 4, the detailed flow of the voice wake-up method provided by the embodiments of the present application is as follows:
S401: The client obtains a voice signal to be processed from the buffer.
S402: The client uses AGC technology to boost the volume of the voice signal, then applies a sliding window function to split the voice signal into frames, obtaining 100 speech frames of the voice signal.
S403: The client uses MFCC to obtain the speech feature vectors of the 100 speech frames of the voice signal.
S404: Based on the speech feature vectors of the 100 speech frames of the voice signal, the client generates the speech feature matrix of the voice signal and inputs it into the acoustic recognition model to obtain the posterior probability matrix of the voice signal.
S405: According to the flag information contained in the saved wake-up record information, the client judges whether the terminal device has been woken up through "small A small A". If so, S406 is executed; if not, the current wake-up word mode is determined to be the single wake-up word mode and S410 is executed.
S406: According to the time information contained in the saved wake-up record information, the client judges whether a user-initiated voice signal indicating that the simple wake-up word mode is to be opened was received within the time interval after the terminal device was woken up through "small A small A". If so, the current wake-up word mode is determined to be the simple wake-up word mode and S407 is executed; if not, the current wake-up word mode is determined to be the single wake-up word mode and S410 is executed.
It is worth noting that, after the client determines in S406 that the current wake-up word mode is the simple wake-up word mode, it can keep performing wake-up recognition on acquired voice signals based on "small A" until it determines that a user-initiated voice signal indicating that the simple wake-up word mode is to be closed has been received, at which point it switches from the simple wake-up word mode to the single wake-up word mode.
S407: Based on the posterior probability matrix of the voice signal, the client uses an HMM based on the Viterbi algorithm to compute a wake-up word path score for each of the multiple wake-up word paths in which "small A" appears consecutively and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words appear consecutively, and determines the difference between the highest wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal for "small A".
S408: The client judges whether the wake-up confidence of the voice signal for "small A" is not less than the wake-up threshold. If so, S409 is executed; if not, S413 is executed.
S409: The client wakes up the terminal device through "small A" and returns to S401.
S410: Based on the posterior probability matrix of the voice signal, the client uses an HMM based on the Viterbi algorithm to compute a wake-up word path score for each of the multiple wake-up word paths in which "small A small A" appears consecutively and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words appear consecutively, and determines the difference between the highest wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal for "small A small A".
S411: The client judges whether the wake-up confidence of the voice signal for "small A small A" is not less than the wake-up threshold. If so, S412 is executed; if not, S413 is executed.
S412: The client wakes up the terminal device through "small A small A", updates the saved wake-up record information according to the flag information of the wake-up operation currently performed, and returns to S401.
S413: The client does not wake up the terminal device and returns to S401.
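The mode selection in S405 to S413 can be condensed into a short sketch. The `WakeRecord` fields, the function names, and the `confidence_for` callback are hypothetical stand-ins: the saved wake-up record information is reduced to two booleans, and the acoustic scoring is supplied by the caller.

```python
from dataclasses import dataclass

FULL_WORD = "small A small A"   # first wake-up word
SIMPLE_WORD = "small A"         # second (simple) wake-up word


@dataclass
class WakeRecord:
    woken_by_full_word: bool = False  # flag information checked in S405
    simple_mode_open: bool = False    # user opened simple mode after that wake (S406)


def current_wake_word(record):
    """Simple mode applies only after a wake by the full word plus an
    explicit user request; otherwise single wake-up word mode applies."""
    if record.woken_by_full_word and record.simple_mode_open:
        return SIMPLE_WORD
    return FULL_WORD


def process_signal(record, confidence_for, threshold):
    """Return the wake-up word that woke the device, or None (S413)."""
    word = current_wake_word(record)
    if confidence_for(word) < threshold:
        return None
    if word == FULL_WORD:
        record.woken_by_full_word = True  # S412: update the saved record
    return word
```

In this sketch the record is updated only when the full wake-up word triggers the wake-up, mirroring the difference between S409 and S412.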
Below, taking "online wake-up" as the concrete application scenario and "switching between the single wake-up word mode and the simple wake-up word mode" as the specific embodiment, the voice wake-up method provided by the embodiments of the present application is described in further detail, where the first wake-up word is "small A small A" and the second wake-up word is "small A". Specifically, referring to Fig. 5, the detailed flow of the voice wake-up method provided by the embodiments of the present application is as follows:
S501: The client obtains a voice signal to be processed from the buffer.
S502: The client carries the voice signal to be processed in a wake-up recognition request and sends the request to the server.
S503: On receiving the wake-up recognition request, the server obtains the voice signal to be processed from the request.
S504: The server uses AGC technology to boost the volume of the voice signal, then applies a sliding window function to split the voice signal into frames, obtaining 100 speech frames of the voice signal.
S505: The server uses MFCC to obtain the speech feature vectors of the 100 speech frames of the voice signal.
S506: Based on the speech feature vectors of the 100 speech frames of the voice signal, the server generates the speech feature matrix of the voice signal and inputs it into the acoustic recognition model to obtain the posterior probability matrix of the voice signal.
S507: According to the flag information contained in the saved wake-up record information, the server judges whether the terminal device has been woken up through "small A small A". If so, S508 is executed; if not, the current wake-up word mode is determined to be the single wake-up word mode and S513 is executed.
S508: According to the time information contained in the saved wake-up record information, the server judges whether a user-initiated voice signal indicating that the simple wake-up word mode is to be opened was received within the time interval after the terminal device was woken up through "small A small A". If so, the current wake-up word mode is determined to be the simple wake-up word mode and S509 is executed; if not, the current wake-up word mode is determined to be the single wake-up word mode and S513 is executed.
It is worth noting that, after the server determines in S508 that the current wake-up word mode is the simple wake-up word mode, it can keep performing wake-up recognition on acquired voice signals based on "small A" until it determines that a user-initiated voice signal indicating that the simple wake-up word mode is to be closed has been received, at which point it switches from the simple wake-up word mode to the single wake-up word mode.
S509: Based on the posterior probability matrix of the voice signal, the server uses an HMM based on the Viterbi algorithm to compute a wake-up word path score for each of the multiple wake-up word paths in which "small A" appears consecutively and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words appear consecutively, and determines the difference between the highest wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal for "small A".
S510: The server judges whether the wake-up confidence of the voice signal for "small A" is not less than the wake-up threshold. If so, S511 is executed; if not, S518 is executed.
S511: The server returns to the client a wake-up recognition response indicating that the terminal device is to be woken up through "small A".
S512: On receiving the wake-up recognition response indicating that the terminal device is to be woken up through "small A", the client wakes up the terminal device through "small A" and returns to S501.
S513: Based on the posterior probability matrix of the voice signal, the server uses an HMM based on the Viterbi algorithm to compute a wake-up word path score for each of the multiple wake-up word paths in which "small A small A" appears consecutively and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words appear consecutively, and determines the difference between the highest wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal for "small A small A".
S514: The server judges whether the wake-up confidence of the voice signal for "small A small A" is not less than the wake-up threshold. If so, S515 is executed; if not, S518 is executed.
S515: The server returns to the client a wake-up recognition response indicating that the terminal device is to be woken up through "small A small A".
S516: The server takes the flag information of the return operation it has just performed as the flag information of the wake-up operation currently performed by the client, and updates the saved wake-up record information according to that flag information.
S517: On receiving the wake-up recognition response indicating that the terminal device is to be woken up through "small A small A", the client wakes up the terminal device through "small A small A" and returns to S501.
S518: The server returns to the client a wake-up recognition response indicating that the terminal device is not to be woken up.
S519: On receiving the wake-up recognition response indicating that the terminal device is not to be woken up, the client does not wake up the terminal device and returns to S501.
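The division of work between client and server in S501 to S519 can be sketched as below. All class, field, and function names are hypothetical, the network transport is omitted, and the `confidence` callback stands in for steps S504 to S506 plus the Viterbi scoring.

```python
from dataclasses import dataclass
from typing import Callable, Optional

FULL_WORD = "small A small A"
SIMPLE_WORD = "small A"


@dataclass
class WakeResponse:
    wake: bool                       # True: wake the terminal device
    wake_word: Optional[str] = None  # wake-up word that triggered the wake


@dataclass
class ServerState:
    woken_by_full_word: bool = False  # saved wake-up record (flag information)
    simple_mode_open: bool = False    # set once the user opens simple mode


def server_handle(signal: bytes, state: ServerState,
                  confidence: Callable[[bytes, str], float],
                  threshold: float) -> WakeResponse:
    """Server side: pick the active wake-up word, score it, build a response."""
    simple = state.woken_by_full_word and state.simple_mode_open  # S507, S508
    word = SIMPLE_WORD if simple else FULL_WORD
    if confidence(signal, word) < threshold:                      # S510 / S514
        return WakeResponse(wake=False)                           # S518
    if word == FULL_WORD:
        state.woken_by_full_word = True                           # S516
    return WakeResponse(wake=True, wake_word=word)                # S511 / S515


def client_handle(response: WakeResponse) -> bool:
    """Client side: return True when the terminal device is woken up."""
    return response.wake                                          # S512 / S517 / S519
```

Keeping the wake-up record on the server, as in this sketch, means the client only needs to forward audio and act on the response.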
Below, taking "offline wake-up" as the concrete application scenario and "switching between the single wake-up word mode and the multi wake-up word mode" as the specific embodiment, the voice wake-up method provided by the embodiments of the present application is described in further detail, where the first wake-up word is "small A small A" and the second wake-up words are "small A small A" and "small A". Specifically, referring to Fig. 6, the detailed flow of the voice wake-up method provided by the embodiments of the present application is as follows:
S601: The client obtains a voice signal to be processed from the buffer.
S602: The client uses AGC technology to boost the volume of the voice signal, then applies a sliding window function to split the voice signal into frames, obtaining 100 speech frames of the voice signal.
S603: The client uses MFCC to obtain the speech feature vectors of the 100 speech frames of the voice signal.
S604: Based on the speech feature vectors of the 100 speech frames of the voice signal, the client generates the speech feature matrix of the voice signal and inputs it into the acoustic recognition model to obtain the posterior probability matrix of the voice signal.
S605: According to the saved wake-up record information, the client judges whether the time interval after the terminal device was woken up through "small A small A" is within the set time range. If so, the current wake-up word mode is determined to be the multi wake-up word mode and S606 is executed; if not, the current wake-up word mode is determined to be the single wake-up word mode and S612 is executed.
S606: Based on the posterior probability matrix of the voice signal, the client uses an HMM based on the Viterbi algorithm to compute a wake-up word path score for each of the multiple wake-up word paths in which "small A small A" appears consecutively and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words appear consecutively, and determines the difference between the highest wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal for "small A small A".
S607: Based on the posterior probability matrix of the voice signal, the client uses an HMM based on the Viterbi algorithm to compute a wake-up word path score for each of the multiple wake-up word paths in which "small A" appears consecutively and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words appear consecutively, and determines the difference between the highest wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal for "small A".
S608: The client judges whether the wake-up confidence of the voice signal for "small A small A" is not less than the wake-up threshold. If so, S609 is executed; if not, S610 is executed.
S609: The client wakes up the terminal device through "small A small A" and returns to S601.
S610: The client judges whether the wake-up confidence of the voice signal for "small A" is not less than the wake-up threshold. If so, S611 is executed; if not, S615 is executed.
S611: The client wakes up the terminal device through "small A" and returns to S601.
S612: Based on the posterior probability matrix of the voice signal, the client uses an HMM based on the Viterbi algorithm to compute a wake-up word path score for each of the multiple wake-up word paths in which "small A small A" appears consecutively and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words appear consecutively, and determines the difference between the highest wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal for "small A small A".
S613: The client judges whether the wake-up confidence of the voice signal for "small A small A" is not less than the wake-up threshold. If so, S614 is executed; if not, S615 is executed.
S614: The client wakes up the terminal device through "small A small A", updates the saved wake-up record information according to the flag information of the wake-up operation currently performed, and returns to S601.
S615: The client does not wake up the terminal device and returns to S601.
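The time-window test of S605 and the ordered checks of S608 and S610 can be sketched as follows. The function names, the use of seconds as the time unit, and the dictionary of per-word confidences are assumptions made for illustration; the embodiments only require that the interval since the last wake-up through "small A small A" lies within a set time range.

```python
FULL_WORD = "small A small A"
SIMPLE_WORD = "small A"


def active_wake_words(last_full_wake_time, now, window_seconds):
    """Multi wake-up word mode inside the time window, single mode otherwise."""
    if (last_full_wake_time is not None
            and 0 <= now - last_full_wake_time <= window_seconds):
        return [FULL_WORD, SIMPLE_WORD]   # multi wake-up word mode (S606, S607)
    return [FULL_WORD]                    # single wake-up word mode (S612)


def detect(confidences, words, threshold):
    """Check the active words in order; return the first that reaches the
    threshold, or None when the device must not be woken up."""
    for word in words:
        if confidences[word] >= threshold:
            return word
    return None
```

For example, with a 30-second window, a wake-up through "small A small A" at time 100 makes "small A" acceptable at time 120 but not at time 200.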
Below, taking "online wake-up" as the concrete application scenario and "switching between the single wake-up word mode and the multi wake-up word mode" as the specific embodiment, the voice wake-up method provided by the embodiments of the present application is described in further detail, where the first wake-up word is "small A small A" and the second wake-up words are "small A small A" and "small A". Specifically, referring to Fig. 7, the detailed flow of the voice wake-up method provided by the embodiments of the present application is as follows:
S701: The client obtains a voice signal to be processed from the buffer.
S702: The client carries the voice signal to be processed in a wake-up recognition request and sends the request to the server.
S703: On receiving the wake-up recognition request, the server obtains the voice signal to be processed from the request.
S704: The server uses AGC technology to boost the volume of the voice signal, then applies a sliding window function to split the voice signal into frames, obtaining 100 speech frames of the voice signal.
S705: The server uses MFCC to obtain the speech feature vectors of the 100 speech frames of the voice signal.
S706: Based on the speech feature vectors of the 100 speech frames of the voice signal, the server generates the speech feature matrix of the voice signal and inputs it into the acoustic recognition model to obtain the posterior probability matrix of the voice signal.
S707: According to the saved wake-up record information, the server judges whether the time interval after the terminal device was woken up through "small A small A" is within the set time range. If so, the current wake-up word mode is determined to be the multi wake-up word mode and S708 is executed; if not, the current wake-up word mode is determined to be the single wake-up word mode and S716 is executed.
S708: Based on the posterior probability matrix of the voice signal, the server uses an HMM based on the Viterbi algorithm to compute a wake-up word path score for each of the multiple wake-up word paths in which "small A small A" appears consecutively and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words appear consecutively, and determines the difference between the highest wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal for "small A small A".
S709: Based on the posterior probability matrix of the voice signal, the server uses an HMM based on the Viterbi algorithm to compute a wake-up word path score for each of the multiple wake-up word paths in which "small A" appears consecutively and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words appear consecutively, and determines the difference between the highest wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal for "small A".
S710: The server judges whether the wake-up confidence of the voice signal for "small A small A" is not less than the wake-up threshold. If so, S711 is executed; if not, S713 is executed.
S711: The server returns to the client a wake-up recognition response indicating that the terminal device is to be woken up through "small A small A".
S712: On receiving the wake-up recognition response indicating that the terminal device is to be woken up through "small A small A", the client wakes up the terminal device through "small A small A" and returns to S701.
S713: The server judges whether the wake-up confidence of the voice signal for "small A" is not less than the wake-up threshold. If so, S714 is executed; if not, S716 is executed.
S714: The server returns to the client a wake-up recognition response indicating that the terminal device is to be woken up through "small A".
S715: On receiving the wake-up recognition response indicating that the terminal device is to be woken up through "small A", the client wakes up the terminal device through "small A" and returns to S701.
S716: Based on the posterior probability matrix of the voice signal, the server uses an HMM based on the Viterbi algorithm to compute a wake-up word path score for each of the multiple wake-up word paths in which "small A small A" appears consecutively and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words appear consecutively, and determines the difference between the highest wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal for "small A small A".
S717: The server judges whether the wake-up confidence of the voice signal for "small A small A" is not less than the wake-up threshold. If so, S718 is executed; if not, S721 is executed.
S718: The server returns to the client a wake-up recognition response indicating that the terminal device is to be woken up through "small A small A".
S719: The server takes the flag information of the return operation it has just performed as the flag information of the wake-up operation currently performed by the client, and updates the saved wake-up record information according to that flag information.
S720: On receiving the wake-up recognition response indicating that the terminal device is to be woken up through "small A small A", the client wakes up the terminal device through "small A small A" and returns to S701.
S721: The server returns to the client a wake-up recognition response indicating that the terminal device is not to be woken up.
S722: On receiving the wake-up recognition response indicating that the terminal device is not to be woken up, the client does not wake up the terminal device and returns to S701.
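The framing step shared by the four flows above (S402, S504, S602 and S704) can be sketched with a plain sliding window. The frame length of 400 samples and hop of 160 samples (25 ms and 10 ms at an assumed 16 kHz sampling rate) are illustrative assumptions, not values stated in the embodiments; volume enhancement and windowing weights are omitted.

```python
def split_into_frames(samples, frame_length, hop_length):
    """Slide a fixed-length window over the samples and collect each frame."""
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")
    frames = []
    last_start = len(samples) - frame_length
    for start in range(0, last_start + 1, hop_length):
        frames.append(samples[start:start + frame_length])
    return frames


# 400 + 99 * 160 = 16240 samples yield exactly 100 overlapping frames.
signal = [0.0] * 16240
frames = split_into_frames(signal, frame_length=400, hop_length=160)
```

Each resulting frame would then be passed to feature extraction, such as MFCC, to build the speech feature matrix.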
Based on the above embodiments, an embodiment of the present application provides a voice wake-up apparatus. Referring to Fig. 8, the voice wake-up apparatus 800 provided by the embodiments of the present application includes at least:
A signal acquiring unit 801, configured to receive a voice signal;
A first judging unit 802, configured to judge, according to saved wake-up record information, whether the terminal device has been woken up through the first wake-up word, where the wake-up record information is flag information indicating that a wake-up operation was performed using the first wake-up word;
A first wake-up unit 803, configured to, if the first judging unit 802 determines that the terminal device has been woken up through the first wake-up word, wake up the terminal device through the second wake-up word when it is determined that the voice signal contains the second wake-up word;
A second wake-up unit 804, configured to, if the first judging unit 802 determines that the terminal device has not been woken up through the first wake-up word, wake up the terminal device through the first wake-up word when it is determined that the voice signal contains the first wake-up word.
In a possible implementation, the second wake-up word includes at least one of a simple wake-up word of the first wake-up word and a custom wake-up word.
In a possible implementation, the voice wake-up apparatus provided by the embodiments of the present application further includes:
A second judging unit 805, configured to determine, according to the saved wake-up record information and before the first wake-up unit 803 determines that the voice signal contains the second wake-up word, that the time interval after the terminal device was woken up through the first wake-up word is within the set time range.
In a possible implementation, the second wake-up unit 804 is further configured to:
If the second judging unit 805 determines, according to the saved wake-up record information, that the time interval after the terminal device was woken up through the first wake-up word is not within the set time range, wake up the terminal device through the first wake-up word when it is determined that the voice signal contains the first wake-up word.
In a possible implementation, the second wake-up word includes at least one of the first wake-up word, a simple wake-up word of the first wake-up word, and a custom wake-up word.
In a possible implementation, the voice wake-up apparatus provided by the embodiments of the present application further includes:
An information updating unit 806, configured to, after the second wake-up unit 804 wakes up the terminal device through the first wake-up word, update the saved wake-up record information according to the flag information of the wake-up operation currently performed by the second wake-up unit 804.
In a possible implementation, when determining that the voice signal contains the second wake-up word, the first wake-up unit 803 is specifically configured to:
Perform framing on the voice signal to obtain at least one speech frame, and perform feature extraction on the at least one speech frame to obtain speech feature data of the at least one speech frame;
Based on the speech feature data of the at least one speech frame, use the acoustic recognition model to obtain the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class;
Based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, obtain the wake-up confidence of the voice signal for each second wake-up word;
When detecting that the wake-up confidence of the voice signal for any second wake-up word is not less than the wake-up threshold, determine that the voice signal contains the second wake-up word.
In a possible implementation, when obtaining the wake-up confidence of the voice signal for each second wake-up word based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, the first wake-up unit 803 is specifically configured to:
For each second wake-up word, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, use an HMM based on the Viterbi algorithm to obtain the highest second wake-up word path score, and determine the highest second wake-up word path score as the wake-up confidence of the voice signal for that second wake-up word.
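Obtaining the highest wake-up word path score with a Viterbi search can be sketched as below. This is a simplified illustration under stated assumptions: the HMM is left-to-right, every frame either stays in the current state or advances to the next one, transition probabilities are treated as uniform and therefore dropped, and the path score is the sum of log posteriors along the path. The function name and the toy posterior matrix are hypothetical.

```python
import math

NEG_INF = float("-inf")


def best_path_score(log_posteriors, state_sequence):
    """Viterbi search over a left-to-right HMM.

    log_posteriors: one row per frame, one log posterior per class.
    state_sequence: class indices the wake-up word passes through, in order.
    Returns the highest path score that starts in the first state and ends
    in the last state, or -inf when there are too few frames.
    """
    if not state_sequence:
        raise ValueError("state_sequence must not be empty")
    n_states = len(state_sequence)
    if len(log_posteriors) < n_states:
        return NEG_INF
    prev = [NEG_INF] * n_states
    prev[0] = log_posteriors[0][state_sequence[0]]
    for frame in log_posteriors[1:]:
        cur = [NEG_INF] * n_states
        for s in range(n_states):
            stay = prev[s]
            advance = prev[s - 1] if s > 0 else NEG_INF
            best = max(stay, advance)
            if best > NEG_INF:
                cur[s] = best + frame[state_sequence[s]]
        prev = cur
    return prev[-1]


# Toy posterior matrix: 3 frames, 2 classes, converted to log domain.
posteriors = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
log_post = [[math.log(p) for p in row] for row in posteriors]
score = best_path_score(log_post, state_sequence=[0, 1])
```

In the toy example the best path stays in the first state for two frames and then advances, giving a score of log(0.9 × 0.6 × 0.8).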
In a possible implementation, when obtaining the wake-up confidence of the voice signal for each second wake-up word based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, the first wake-up unit 803 is specifically configured to:
For each second wake-up word, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, use an HMM based on the Viterbi algorithm to obtain the highest second wake-up word path score and the highest non-wake-up word path score, and determine the difference between the highest second wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal for that second wake-up word.
In a possible implementation, when determining that the voice signal contains the first wake-up word, the second wake-up unit 804 is specifically configured to:
Perform framing on the voice signal to obtain at least one speech frame, and perform feature extraction on the at least one speech frame to obtain speech feature data of the at least one speech frame;
Based on the speech feature data of the at least one speech frame, use the acoustic recognition model to obtain the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class;
Based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, obtain the wake-up confidence of the voice signal for the first wake-up word;
When detecting that the wake-up confidence of the voice signal for the first wake-up word is not less than the wake-up threshold, determine that the voice signal contains the first wake-up word.
In a possible implementation, when obtaining the wake-up confidence of the voice signal for the first wake-up word based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, the second wake-up unit 804 is specifically configured to:
Based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, use an HMM based on the Viterbi algorithm to obtain the highest first wake-up word path score;
Determine the highest first wake-up word path score as the wake-up confidence of the voice signal for the first wake-up word.
In a possible implementation, when obtaining the wake-up confidence of the voice signal for the first wake-up word based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, the second wake-up unit 804 is specifically configured to:
Based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, use an HMM based on the Viterbi algorithm to obtain the highest first wake-up word path score and the highest non-wake-up word path score;
Determine the difference between the highest first wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal for the first wake-up word.
It should be noted that when the voice wake-up method provided by the embodiments of the present application is applied to a server, the voice wake-up apparatus 800 provided by the embodiments of the present application may be provided in the server; when the method is applied to a terminal device, the voice wake-up apparatus 800 may be provided in the terminal device. In addition, the principle by which the voice wake-up apparatus 800 solves the technical problem is similar to that of the voice wake-up method provided by the embodiments of the present application. The implementation of the voice wake-up apparatus 800 may therefore refer to the implementation of the voice wake-up method, and repeated details are not described again.
Having described the voice wake-up system, method and apparatus provided by the embodiments of the present application, the voice wake-up device provided by the embodiments of the present application is briefly introduced next.
The voice wake-up device provided by the embodiments of the present application may be a terminal device or a server. Referring to Fig. 9, the voice wake-up device 900 provided by the embodiments of the present application includes at least: a processor 901, a memory 902, and a computer program stored on the memory 902 and executable on the processor 901. The processor 901 implements the voice wake-up method provided by the embodiments of the present application when executing the computer program.
It should be noted that the voice wake-up device 900 shown in Fig. 9 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
The voice wake-up device 900 provided by the embodiments of the present application may further include a bus 903 connecting different components (including the processor 901 and the memory 902). The bus 903 represents one or more of several types of bus structures, including a memory bus, a peripheral bus, a local bus, and the like.
The memory 902 may include readable media in the form of volatile memory, such as a random access memory (RAM) 921 and/or a cache memory 922, and may further include a read-only memory (ROM) 923.
The memory 902 may also include a program tool 925 having a set of (at least one) program modules 924. The program modules 924 include but are not limited to: an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.
The voice wake-up device 900 may also communicate with one or more external devices 904 (e.g., a keyboard or a remote control), with one or more devices that enable a user to interact with the voice wake-up device 900 (e.g., a mobile phone or a computer), and/or with any device (e.g., a router or a modem) that enables the voice wake-up device 900 to communicate with one or more other voice wake-up devices 900. Such communication may take place through an input/output (I/O) interface 905. The voice wake-up device 900 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 906. As shown in Fig. 9, the network adapter 906 communicates with the other modules of the voice wake-up device 900 through the bus 903. It should be understood that, although not shown in Fig. 9, other hardware and/or software modules may be used in conjunction with the voice wake-up device 900, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, redundant arrays of independent disks (RAID) subsystems, tape drives, and data backup storage subsystems.
In addition, the embodiments of the present application further provide a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the voice wake-up method provided by the embodiments of the present application. Specifically, the executable program may be built into or installed in the voice wake-up device 900, so that the voice wake-up device 900 can implement the voice wake-up method provided by the embodiments of the present application by executing the built-in or installed executable program.
In addition, the voice wake-up method provided by the embodiments of the present application may also be implemented as a program product that includes program code. When the program product runs on the voice wake-up device 900, the program code causes the voice wake-up device 900 to execute the voice wake-up method provided by the embodiments of the present application.
The program product provided by the embodiments of the present application may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The program product provided by the embodiments of the present application may also take the form of a CD-ROM that includes program code and can run on a computing device. However, the program product provided by the embodiments of the present application is not limited thereto. In the embodiments of the present application, the readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus, or device.
It should be noted that although several units or sub-units of the apparatus are mentioned in the detailed description above, this division is only exemplary and not mandatory. In fact, according to the embodiments of the present application, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided so as to be embodied by multiple units.
In addition, although the operations of the method of the present application are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the illustrated operations must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be split into multiple steps.
Although preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present application.
Obviously, those skilled in the art can make various modifications and variations to the embodiments of the present application without departing from the spirit and scope of the embodiments of the present application. Thus, if these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include them.
Claims (22)
1. A voice wake-up method, characterized by comprising:
Obtaining a voice signal to be processed;
Judging, according to saved wake-up record information, whether a terminal device has been woken up by a first wake-up word, wherein the wake-up record information is flag information indicating that a wake-up operation was performed using the first wake-up word;
If so, waking up the terminal device by a second wake-up word when it is determined that the voice signal contains the second wake-up word;
If not, waking up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
2. The voice wake-up method according to claim 1, characterized in that the second wake-up word includes at least one of a simple wake-up word of the first wake-up word and a customized wake-up word.
3. The voice wake-up method according to claim 1, characterized in that, before determining that the voice signal contains the second wake-up word, the method further comprises:
Determining, according to the saved wake-up record information, that a voice signal indicating that a simple wake-up word mode is to be enabled was received within a time interval after the terminal device was woken up by the first wake-up word.
4. The voice wake-up method according to claim 3, characterized by further comprising:
If it is determined, according to the saved wake-up record information, that no voice signal indicating that the simple wake-up word mode is to be enabled was received within the time interval after the terminal device was woken up by the first wake-up word, waking up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
5. The voice wake-up method according to claim 1, characterized in that, before determining that the voice signal contains the second wake-up word, the method further comprises:
Determining, according to the saved wake-up record information, that the time interval since the terminal device was woken up by the first wake-up word is within a set time range.
6. The voice wake-up method according to claim 5, characterized by further comprising:
If it is determined, according to the saved wake-up record information, that the time interval since the terminal device was woken up by the first wake-up word is not within the set time range, waking up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
7. The voice wake-up method according to claim 5, characterized in that the second wake-up word includes at least one of the first wake-up word, a simple wake-up word of the first wake-up word, and a customized wake-up word.
8. The voice wake-up method according to claim 1, 4 or 6, characterized in that, after waking up the terminal device by the first wake-up word, the method further comprises:
Updating the saved wake-up record information according to the flag information of the currently performed wake-up operation.
9. The voice wake-up method according to any one of claims 1-7, characterized in that determining that the voice signal contains the second wake-up word comprises:
Framing the voice signal to obtain at least one speech frame, and performing feature extraction on the at least one speech frame to obtain voice feature data of the at least one speech frame;
Obtaining, based on the voice feature data of the at least one speech frame and using an acoustic recognition model, posterior probabilities of the at least one speech frame for a wake-up word class and a non-wake-up word class;
Obtaining, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, a wake-up confidence of the voice signal with respect to each second wake-up word;
Determining that the voice signal contains the second wake-up word when it is detected that the wake-up confidence of the voice signal with respect to any second wake-up word is not less than a wake-up threshold.
10. The voice wake-up method according to claim 9, characterized in that obtaining, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, the wake-up confidence of the voice signal with respect to each second wake-up word comprises:
For each second wake-up word, obtaining the highest second wake-up word path score based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, using a hidden Markov model (HMM) based on the Viterbi algorithm, and determining the highest second wake-up word path score as the wake-up confidence of the voice signal with respect to that second wake-up word.
11. The voice wake-up method according to claim 9, characterized in that obtaining, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, the wake-up confidence of the voice signal with respect to each second wake-up word comprises:
For each second wake-up word, obtaining the highest second wake-up word path score and the highest non-wake-up word path score based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, using a hidden Markov model (HMM) based on the Viterbi algorithm, and determining the difference between the highest second wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal with respect to that second wake-up word.
12. The voice wake-up method according to any one of claims 1-7, characterized in that determining that the voice signal contains the first wake-up word comprises:
Framing the voice signal to obtain at least one speech frame, and performing feature extraction on the at least one speech frame to obtain voice feature data of the at least one speech frame;
Obtaining, based on the voice feature data of the at least one speech frame and using an acoustic recognition model, posterior probabilities of the at least one speech frame for a wake-up word class and a non-wake-up word class;
Obtaining, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, a wake-up confidence of the voice signal with respect to the first wake-up word;
Determining that the voice signal contains the first wake-up word when it is detected that the wake-up confidence of the voice signal with respect to the first wake-up word is not less than a wake-up threshold.
13. The voice wake-up method according to claim 12, characterized in that obtaining, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, the wake-up confidence of the voice signal with respect to the first wake-up word comprises:
Obtaining the highest first wake-up word path score based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, using a hidden Markov model (HMM) based on the Viterbi algorithm;
Determining the highest first wake-up word path score as the wake-up confidence of the voice signal with respect to the first wake-up word.
14. The voice wake-up method according to claim 12, characterized in that obtaining, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, the wake-up confidence of the voice signal with respect to the first wake-up word comprises:
Obtaining the highest first wake-up word path score and the highest non-wake-up word path score based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, using a hidden Markov model (HMM) based on the Viterbi algorithm;
Determining the difference between the highest first wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal with respect to the first wake-up word.
15. A voice wake-up apparatus, characterized by comprising:
A signal acquiring unit, configured to obtain a voice signal to be processed;
A first judging unit, configured to judge, according to saved wake-up record information, whether a terminal device has been woken up by a first wake-up word, wherein the wake-up record information is flag information indicating that a wake-up operation was performed using the first wake-up word;
A first wake-up unit, configured to, if the first judging unit determines that the terminal device has been woken up by the first wake-up word, wake up the terminal device by a second wake-up word when it is determined that the voice signal contains the second wake-up word;
A second wake-up unit, configured to, if the first judging unit determines that the terminal device has not been woken up by the first wake-up word, wake up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
16. The voice wake-up apparatus according to claim 15, characterized by further comprising:
A second judging unit, configured to, before the first wake-up unit determines that the voice signal contains the second wake-up word, determine, according to the saved wake-up record information, that a voice signal indicating that a simple wake-up word mode is to be enabled was received within a time interval after the terminal device was woken up by the first wake-up word.
17. The voice wake-up apparatus according to claim 16, characterized in that the second wake-up unit is further configured to:
If the second judging unit determines, according to the saved wake-up record information, that no voice signal indicating that the simple wake-up word mode is to be enabled was received within the time interval after the terminal device was woken up by the first wake-up word, wake up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
18. The voice wake-up apparatus according to claim 15, characterized by further comprising:
A third judging unit, configured to, before the first wake-up unit determines that the voice signal contains the second wake-up word, determine, according to the saved wake-up record information, that the time interval since the terminal device was woken up by the first wake-up word is within a set time range.
19. The voice wake-up apparatus according to claim 18, characterized in that the second wake-up unit is further configured to:
If the third judging unit determines, according to the saved wake-up record information, that the time interval since the terminal device was woken up by the first wake-up word is not within the set time range, wake up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
20. The voice wake-up apparatus according to claim 15, 17 or 19, characterized by further comprising:
An information updating unit, configured to, after the second wake-up unit wakes up the terminal device by the first wake-up word, update the saved wake-up record information according to the flag information of the wake-up operation currently performed by the second wake-up unit.
21. A voice wake-up device, characterized by comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the voice wake-up method according to any one of claims 1-14 when executing the computer program.
22. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the voice wake-up method according to any one of claims 1-14.
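The decision flow recited in claims 1, 5, 6, 8 and 9 can be sketched in Python as follows. This is a minimal sketch rather than the claimed implementation: the `WakeRecord` class, the `decide_wake` function, the 0.5 threshold, the 30-second window and the example wake-up words are all hypothetical, and the wake-up confidences are assumed to have been computed already for each wake-up word.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class WakeRecord:
    """Saved wake-up record: time of the last wake-up by the first wake-up
    word, or None if the device has not been woken up by it."""
    last_first_wake_time: Optional[float] = None


def decide_wake(
    record: WakeRecord,
    confidences: Dict[str, float],
    now: float,
    first_word: str,
    second_words: List[str],
    threshold: float = 0.5,        # hypothetical wake-up threshold
    window_seconds: float = 30.0,  # hypothetical set time range
) -> Optional[str]:
    """Return the wake-up word that wakes the device, or None."""
    missing = float("-inf")
    recently_woken = (
        record.last_first_wake_time is not None
        and now - record.last_first_wake_time <= window_seconds
    )
    if recently_woken:
        # Already woken by the first wake-up word within the set time
        # range: any second wake-up word at or above the threshold wakes
        # the device. The record is left unchanged here (an assumption).
        hits = [w for w in second_words
                if confidences.get(w, missing) >= threshold]
        return max(hits, key=lambda w: confidences[w]) if hits else None
    # Otherwise only the first wake-up word can wake the device.
    if confidences.get(first_word, missing) >= threshold:
        record.last_first_wake_time = now  # update the saved record
        return first_word
    return None
```

For example, with the hypothetical first wake-up word "hello assistant" and second wake-up words "hello assistant" and "assistant", the short form "assistant" is ignored until "hello assistant" has woken the device, is accepted for the next 30 seconds, and is ignored again once that window has passed.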
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910887674.0A CN110534102B (en) | 2019-09-19 | 2019-09-19 | Voice wake-up method, device, equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910887674.0A CN110534102B (en) | 2019-09-19 | 2019-09-19 | Voice wake-up method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110534102A true CN110534102A (en) | 2019-12-03 |
CN110534102B CN110534102B (en) | 2020-10-30 |
Family
ID=68669399
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910887674.0A Active CN110534102B (en) | 2019-09-19 | 2019-09-19 | Voice wake-up method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110534102B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103971678A (en) * | 2013-01-29 | 2014-08-06 | 腾讯科技(深圳)有限公司 | Method and device for detecting keywords |
CN104536978A (en) * | 2014-12-05 | 2015-04-22 | 奇瑞汽车股份有限公司 | Voice data identifying method and device |
US20170025124A1 (en) * | 2014-10-09 | 2017-01-26 | Google Inc. | Device Leadership Negotiation Among Voice Interface Devices |
CN106448663A (en) * | 2016-10-17 | 2017-02-22 | 海信集团有限公司 | Voice wakeup method and voice interaction device |
CN106898352A (en) * | 2017-02-27 | 2017-06-27 | 联想(北京)有限公司 | Sound control method and electronic equipment |
CN107221326A (en) * | 2017-05-16 | 2017-09-29 | 百度在线网络技术(北京)有限公司 | Voice awakening method, device and computer equipment based on artificial intelligence |
CN107919124A (en) * | 2017-12-22 | 2018-04-17 | 北京小米移动软件有限公司 | Equipment awakening method and device |
CN108122563A (en) * | 2017-12-19 | 2018-06-05 | 北京声智科技有限公司 | Improve voice wake-up rate and the method for correcting DOA |
CN109036412A (en) * | 2018-09-17 | 2018-12-18 | 苏州奇梦者网络科技有限公司 | voice awakening method and system |
CN109065044A (en) * | 2018-08-30 | 2018-12-21 | 出门问问信息科技有限公司 | Wake up word recognition method, device, electronic equipment and computer readable storage medium |
US20190051307A1 (en) * | 2017-08-14 | 2019-02-14 | Lenovo (Singapore) Pte. Ltd. | Digital assistant activation based on wake word association |
CN109358751A (en) * | 2018-10-23 | 2019-02-19 | 北京猎户星空科技有限公司 | A kind of wake-up control method of robot, device and equipment |
WO2019079974A1 (en) * | 2017-10-24 | 2019-05-02 | Beijing Didi Infinity Technology And Development Co., Ltd. | System and method for uninterrupted application awakening and speech recognition |
CN109817200A (en) * | 2019-01-30 | 2019-05-28 | 北京声智科技有限公司 | The optimization device and method that voice wakes up |
Non-Patent Citations (3)
Title |
---|
AH XING: "Compact wake-up word speech recognition on embedded platforms", 《APPLIED MECHANICS AND MATERIALS, 2014 - TRANS TECH PUBL》 * |
VETON KËPUSKA: "Improving Wake-Up-Word and General Speech Recognition Systems", 《2017 IEEE 15TH INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING》 * |
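Zheng Zhihui (郑志辉): "Research and development of an air-conditioner controller with voice-based human-machine dialogue" (基于语音实现人机对话的空调控制器研究开发), 《Proceedings of the 2018 China Household Electrical Appliances Technology Conference》 * |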
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113096651A (en) * | 2020-01-07 | 2021-07-09 | 北京地平线机器人技术研发有限公司 | Voice signal processing method and device, readable storage medium and electronic equipment |
CN111091839A (en) * | 2020-03-20 | 2020-05-01 | 深圳市友杰智新科技有限公司 | Voice awakening method and device, storage medium and intelligent device |
EP3923275A1 (en) * | 2020-06-12 | 2021-12-15 | Beijing Xiaomi Pinecone Electronics Co., Ltd. | Device wakeup method and apparatus, electronic device, and storage medium |
US11665644B2 (en) | 2020-06-12 | 2023-05-30 | Beijing Xiaomi Pinecone Electronics Co., Ltd. | Device wakeup method and apparatus, electronic device, and storage medium |
CN111768783A (en) * | 2020-06-30 | 2020-10-13 | 北京百度网讯科技有限公司 | Voice interaction control method, device, electronic equipment, storage medium and system |
CN111768783B (en) * | 2020-06-30 | 2024-04-02 | 北京百度网讯科技有限公司 | Voice interaction control method, device, electronic equipment, storage medium and system |
CN111883121A (en) * | 2020-07-20 | 2020-11-03 | 北京声智科技有限公司 | Awakening method and device and electronic equipment |
CN112037786A (en) * | 2020-08-31 | 2020-12-04 | 百度在线网络技术(北京)有限公司 | Voice interaction method, device, equipment and storage medium |
CN115605949A (en) * | 2020-10-30 | 2023-01-13 | 谷歌有限责任公司(Us) | Simultaneous acoustic event detection across multiple assistant devices |
Also Published As
Publication number | Publication date |
---|---|
CN110534102B (en) | 2020-10-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110534102A (en) | A kind of voice awakening method, device, equipment and medium | |
CN106611597B (en) | Voice awakening method and device based on artificial intelligence | |
CN110890093B (en) | Intelligent equipment awakening method and device based on artificial intelligence | |
CN108320733B (en) | Voice data processing method and device, storage medium and electronic equipment | |
CN102270450B (en) | System and method of multi model adaptation and voice recognition | |
CN110517664B (en) | Multi-party identification method, device, equipment and readable storage medium | |
JP2020531898A (en) | Voice emotion detection methods, devices, computer equipment, and storage media | |
US7386443B1 (en) | System and method for mobile automatic speech recognition | |
KR20190042918A (en) | Electronic device and operating method thereof | |
CN111667818B (en) | Method and device for training wake-up model | |
CN106297777A (en) | Method and device for awakening voice service | |
US20150348542A1 (en) | Speech recognition method and system based on user personalized information | |
CN108182937A (en) | Keyword recognition method, device, equipment and storage medium | |
CN110751260B (en) | Electronic device, task processing method and neural network training method | |
CN103456305A (en) | Terminal and speech processing method based on multiple sound collecting units | |
CN103903627A (en) | Voice-data transmission method and device | |
CN105122354A (en) | Speech model retrieval in distributed speech recognition systems | |
CN112397083A (en) | Voice processing method and related device | |
KR20200074690A (en) | Electonic device and Method for controlling the electronic device thereof | |
KR20190066401A (en) | Electronic device for network setup of external device and operating method thereof | |
US20210210109A1 (en) | Adaptive decoder for highly compressed grapheme model | |
CN108899028A (en) | Voice awakening method, searching method, device and terminal | |
CN109240641A (en) | Audio method of adjustment, device, electronic equipment and storage medium | |
CN107808662B (en) | Method and device for updating grammar rule base for speech recognition | |
CN111386566A (en) | Device control method, cloud device, intelligent device, computer medium and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||