CN110534102A - Voice wake-up method, apparatus, device, and medium - Google Patents

Voice wake-up method, apparatus, device, and medium

Info

Publication number
CN110534102A
Authority
CN
China
Prior art keywords
wake
word
voice signal
waking
voice
Legal status
Granted
Application number
CN201910887674.0A
Other languages
Chinese (zh)
Other versions
CN110534102B (en)
Inventor
靳源
冯大航
陈孝良
常乐
Current Assignee
Beijing Sound Intelligence Technology Co Ltd
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing Sound Intelligence Technology Co Ltd
Application filed by Beijing Sound Intelligence Technology Co Ltd
Priority to CN201910887674.0A
Publication of CN110534102A
Application granted
Publication of CN110534102B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Electric Clocks (AREA)

Abstract

This application discloses a voice wake-up method, apparatus, device, and medium, applied in the field of artificial intelligence, to solve the problem that voice wake-up methods in the prior art support only a single wake word. Specifically: a voice signal to be processed is obtained; if the saved wake-up record information indicates that the terminal device has already been woken by the first wake word, the terminal device is woken by the second wake word when the voice signal is determined to contain the second wake word; if the saved wake-up record information indicates that the terminal device has not been woken by the first wake word, the terminal device is woken by the first wake word when the voice signal is determined to contain the first wake word. In this way, once the terminal device has been woken by the first wake word, it can subsequently be woken by the second wake word, so a second wake word can be set for the terminal device according to actual language habits, thereby diversifying the wake words.

Description

Voice wake-up method, apparatus, device, and medium
Technical field
This application relates to the field of artificial intelligence, and in particular to a voice wake-up method, apparatus, device, and medium.
Background
With the continuous development of artificial intelligence technology, terminal device control systems based on voice wake-up are also developing steadily. Voice wake-up, as the entry point for controlling a terminal device, is becoming a research hotspot in the field of artificial intelligence.
At present, a user can wake a terminal device by voice and have it perform corresponding operations, which brings the user considerable convenience. However, current voice wake-up methods still have some problems; for example, the wake word is rather limited.
Summary of the invention
The embodiments of this application provide a voice wake-up method, apparatus, device, and medium, to solve the problem that voice wake-up methods in the prior art support only a single wake word.
The technical solutions provided by the embodiments of this application are as follows:
In one aspect, an embodiment of this application provides a voice wake-up method, comprising:
obtaining a voice signal to be processed;
determining, according to saved wake-up record information, whether the terminal device has already been woken by a first wake word, wherein the wake-up record information is flag information indicating that a wake-up operation was performed using the first wake word;
if so, waking the terminal device by a second wake word when it is determined that the voice signal contains the second wake word;
if not, waking the terminal device by the first wake word when it is determined that the voice signal contains the first wake word.
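The branching above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `WakeRecord` and `should_wake` are hypothetical, and the set `detected_words` is assumed to come from a separate wake-word detector.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class WakeRecord:
    """Hypothetical saved wake-up record: True once the first wake word has woken the device."""
    woken_by_first: bool = False


def should_wake(detected_words: Set[str], record: WakeRecord,
                first_word: str, second_words: Set[str]) -> bool:
    """Return True if the terminal device should be woken for this voice signal."""
    if record.woken_by_first:
        # The device was already woken by the first wake word,
        # so any second wake word is accepted.
        return bool(detected_words & second_words)
    # Otherwise only the first wake word can wake the device.
    if first_word in detected_words:
        record.woken_by_first = True  # update the saved record
        return True
    return False
```

With `first_word="ABAB"` and `second_words={"AB", "ABAB"}`, a signal containing only "AB" is rejected until one signal containing "ABAB" has been accepted.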
In a possible implementation, the second wake word includes at least one of a simple wake word of the first wake word and a custom wake word.
In a possible implementation, before determining that the voice signal contains the second wake word, the method further comprises:
determining, according to the saved wake-up record information, that a voice signal indicating that the simple wake-word mode is to be enabled was received within the time interval after the terminal device was woken by the first wake word.
In a possible implementation, the voice wake-up method provided by the embodiments of this application further comprises:
if it is determined, according to the saved wake-up record information, that no voice signal indicating that the simple wake-word mode is to be enabled was received within the time interval after the terminal device was woken by the first wake word, waking the terminal device by the first wake word when it is determined that the voice signal contains the first wake word.
In a possible implementation, before determining that the voice signal contains the second wake word, the method further comprises:
determining, according to the saved wake-up record information, that the time interval since the terminal device was woken by the first wake word is within a set time range.
In a possible implementation, the voice wake-up method provided by the embodiments of this application further comprises:
if it is determined, according to the saved wake-up record information, that the time interval since the terminal device was woken by the first wake word is not within the set time range, waking the terminal device by the first wake word when it is determined that the voice signal contains the first wake word.
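The time-range check can be illustrated with a short sketch. It assumes the wake-up record stores the moment of the last wake-up by the first wake word as a timestamp in seconds; the function name, the parameter names, and the inclusive upper bound are assumptions made for illustration.

```python
from typing import Optional


def second_wake_word_enabled(last_first_wake_s: Optional[float],
                             now_s: float, window_s: float) -> bool:
    """True if the device was woken by the first wake word within the last window_s seconds."""
    if last_first_wake_s is None:
        # The first wake word has never been used, so only it can wake the device.
        return False
    elapsed = now_s - last_first_wake_s
    return 0.0 <= elapsed <= window_s
```

When the function returns False, the caller falls back to requiring the first wake word.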
In a possible implementation, the second wake word includes at least one of the first wake word, a simple wake word of the first wake word, and a custom wake word.
In a possible implementation, after the terminal device is woken by the first wake word, the method further comprises:
updating the saved wake-up record information according to the flag information of the wake-up operation currently performed.
In a possible implementation, determining that the voice signal contains the second wake word comprises:
framing the voice signal to obtain at least one speech frame, and performing feature extraction on the at least one speech frame to obtain voice feature data of the at least one speech frame;
obtaining, based on the voice feature data of the at least one speech frame and using an acoustic recognition model, the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class;
obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class, the wake-up confidence of the voice signal for each second wake word;
when the wake-up confidence of the voice signal for any second wake word is detected to be not less than a wake-up threshold, determining that the voice signal contains the second wake word.
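The first and last steps of this procedure, framing and the threshold test, can be sketched as below. Feature extraction and the acoustic model are omitted, and the frame length, hop length, and function names are illustrative assumptions rather than values given in the source.

```python
from typing import Dict, List, Sequence


def split_into_frames(samples: Sequence[float], frame_len: int,
                      hop_len: int) -> List[Sequence[float]]:
    """Split a signal into overlapping fixed-length frames; an incomplete tail is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop_len)]


def contains_second_wake_word(confidences: Dict[str, float], threshold: float) -> bool:
    """True if the confidence for any second wake word is not less than the threshold."""
    return any(c >= threshold for c in confidences.values())
```

Here `confidences` maps each second wake word to its wake-up confidence, so a single word reaching the threshold is enough.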
In a kind of possible embodiment, is corresponded to based at least one speech frame and wake up word class and non-wake-up part of speech Other posterior probability obtains wake-up confidence level of the voice signal corresponding to each the second wake-up word, comprising:
For each the second wake-up word, is corresponded to based at least one speech frame and wake up word class and non-wake-up word class Posterior probability counted using the Hidden Markov Model (Hidden Markov Model, HMM) based on viterbi algorithm It is worth highest second and wakes up word path score, and numerical value highest second is waken up into word path score, is determined as voice signal pair The wake-up confidence level that word should be waken up in second.
In a kind of possible embodiment, is corresponded to based at least one speech frame and wake up word class and non-wake-up part of speech Other posterior probability obtains wake-up confidence level of the voice signal corresponding to each the second wake-up word, comprising:
For each the second wake-up word, is corresponded to based at least one speech frame and wake up word class and non-wake-up word class Posterior probability obtain numerical value highest second and wake up word path score and numerical value most using the HMM based on viterbi algorithm High non-wake-up word path score, and numerical value highest second is waken up into word path score and the highest non-wake-up word path of numerical value The difference of score is determined as the wake-up confidence level that voice signal wakes up word corresponding to second.
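A small sketch of the second variant (wake-word path score minus non-wake-word path score) follows. It rests on several assumptions not stated in the source: the wake word is modeled as a left-to-right chain of states, transition probabilities are treated as uniform and ignored, scores are sums of log posteriors, and the non-wake-word path is a single filler state, so its best path is simply the filler state in every frame.

```python
import math
from typing import List, Sequence


def best_left_to_right_score(log_post: List[List[float]], states: Sequence[int]) -> float:
    """Viterbi: best log-score of a path that visits `states` in order, each for >= 1 frame."""
    neg_inf = float("-inf")
    best = [neg_inf] * len(states)
    best[0] = log_post[0][states[0]]  # a path must start in the first state
    for frame in log_post[1:]:
        prev = best[:]
        for j, s in enumerate(states):
            stay = prev[j]
            advance = prev[j - 1] if j > 0 else neg_inf
            best[j] = max(stay, advance) + frame[s]
    return best[-1]  # a path must end in the last state


def wake_confidence(posteriors: List[List[float]], wake_states: Sequence[int],
                    filler_state: int) -> float:
    """Best wake-word path score minus the filler (non-wake-word) path score."""
    log_post = [[math.log(max(p, 1e-12)) for p in frame] for frame in posteriors]
    wake = best_left_to_right_score(log_post, wake_states)
    filler = sum(frame[filler_state] for frame in log_post)
    return wake - filler
```

The first variant in the text corresponds to returning `wake` alone. Taking the difference makes the confidence less sensitive to utterance length, since both paths accumulate over the same frames.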
In a possible implementation, determining that the voice signal contains the first wake word comprises:
framing the voice signal to obtain at least one speech frame, and performing feature extraction on the at least one speech frame to obtain voice feature data of the at least one speech frame;
obtaining, based on the voice feature data of the at least one speech frame and using an acoustic recognition model, the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class;
obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class, the wake-up confidence of the voice signal for the first wake word;
when the wake-up confidence of the voice signal for the first wake word is detected to be not less than the wake-up threshold, determining that the voice signal contains the first wake word.
In a possible implementation, obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class, the wake-up confidence of the voice signal for the first wake word comprises:
obtaining the highest first-wake-word path score from the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class, using the Viterbi-based HMM;
taking the highest first-wake-word path score as the wake-up confidence of the voice signal for the first wake word.
In a possible implementation, obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class, the wake-up confidence of the voice signal for the first wake word comprises:
obtaining the highest first-wake-word path score and the highest non-wake-word path score from the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class, using the Viterbi-based HMM;
taking the difference between the highest first-wake-word path score and the highest non-wake-word path score as the wake-up confidence of the voice signal for the first wake word.
In another aspect, an embodiment of this application provides a voice wake-up apparatus, comprising:
a signal acquisition unit, configured to obtain a voice signal to be processed;
a first judging unit, configured to determine, according to saved wake-up record information, whether the terminal device has already been woken by a first wake word, wherein the wake-up record information is flag information indicating that a wake-up operation was performed using the first wake word;
a first wake-up unit, configured to, if the first judging unit determines that the terminal device has already been woken by the first wake word, wake the terminal device by a second wake word when it is determined that the voice signal contains the second wake word;
a second wake-up unit, configured to, if the first judging unit determines that the terminal device has not been woken by the first wake word, wake the terminal device by the first wake word when it is determined that the voice signal contains the first wake word.
In a possible implementation, the second wake word includes at least one of a simple wake word of the first wake word and a custom wake word.
In a possible implementation, the voice wake-up apparatus provided by the embodiments of this application further comprises:
a second judging unit, configured to determine, before the first wake-up unit determines that the voice signal contains the second wake word and according to the saved wake-up record information, that a voice signal indicating that the simple wake-word mode is to be enabled was received within the time interval after the terminal device was woken by the first wake word.
In a possible implementation, the second wake-up unit is further configured to:
if the second judging unit determines, according to the saved wake-up record information, that no voice signal indicating that the simple wake-word mode is to be enabled was received within the time interval after the terminal device was woken by the first wake word, wake the terminal device by the first wake word when it is determined that the voice signal contains the first wake word.
In a possible implementation, the voice wake-up apparatus provided by the embodiments of this application further comprises:
a third judging unit, configured to determine, before the first wake-up unit determines that the voice signal contains the second wake word and according to the saved wake-up record information, that the time interval since the terminal device was woken by the first wake word is within a set time range.
In a possible implementation, the second wake-up unit is further configured to:
if the third judging unit determines, according to the saved wake-up record information, that the time interval since the terminal device was woken by the first wake word is not within the set time range, wake the terminal device by the first wake word when it is determined that the voice signal contains the first wake word.
In a possible implementation, the second wake word includes at least one of the first wake word, a simple wake word of the first wake word, and a custom wake word.
In a possible implementation, the voice wake-up apparatus provided by the embodiments of this application further comprises:
an information updating unit, configured to update the saved wake-up record information, after the second wake-up unit wakes the terminal device by the first wake word, according to the flag information of the wake-up operation currently performed by the second wake-up unit.
In a possible implementation, when determining that the voice signal contains the second wake word, the first wake-up unit is specifically configured to:
frame the voice signal to obtain at least one speech frame, and perform feature extraction on the at least one speech frame to obtain voice feature data of the at least one speech frame;
obtain, based on the voice feature data of the at least one speech frame and using an acoustic recognition model, the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class;
obtain, based on the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class, the wake-up confidence of the voice signal for each second wake word;
when the wake-up confidence of the voice signal for any second wake word is detected to be not less than the wake-up threshold, determine that the voice signal contains the second wake word.
In a possible implementation, when obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class, the wake-up confidence of the voice signal for each second wake word, the first wake-up unit is specifically configured to:
for each second wake word, obtain the highest second-wake-word path score from the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class, using the Viterbi-based HMM, and take that highest second-wake-word path score as the wake-up confidence of the voice signal for the second wake word.
In a possible implementation, when obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class, the wake-up confidence of the voice signal for each second wake word, the first wake-up unit is specifically configured to:
for each second wake word, obtain the highest second-wake-word path score and the highest non-wake-word path score from the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class, using the Viterbi-based HMM, and take the difference between the highest second-wake-word path score and the highest non-wake-word path score as the wake-up confidence of the voice signal for the second wake word.
In a possible implementation, when determining that the voice signal contains the first wake word, the second wake-up unit is specifically configured to:
frame the voice signal to obtain at least one speech frame, and perform feature extraction on the at least one speech frame to obtain voice feature data of the at least one speech frame;
obtain, based on the voice feature data of the at least one speech frame and using an acoustic recognition model, the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class;
obtain, based on the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class, the wake-up confidence of the voice signal for the first wake word;
when the wake-up confidence of the voice signal for the first wake word is detected to be not less than the wake-up threshold, determine that the voice signal contains the first wake word.
In a possible implementation, when obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class, the wake-up confidence of the voice signal for the first wake word, the second wake-up unit is specifically configured to:
obtain the highest first-wake-word path score from the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class, using the Viterbi-based HMM;
take the highest first-wake-word path score as the wake-up confidence of the voice signal for the first wake word.
In a possible implementation, when obtaining, based on the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class, the wake-up confidence of the voice signal for the first wake word, the second wake-up unit is specifically configured to:
obtain the highest first-wake-word path score and the highest non-wake-word path score from the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class, using the Viterbi-based HMM;
take the difference between the highest first-wake-word path score and the highest non-wake-word path score as the wake-up confidence of the voice signal for the first wake word.
In another aspect, an embodiment of this application provides a voice wake-up device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the voice wake-up method provided by the embodiments of this application.
In another aspect, an embodiment of this application further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the voice wake-up method provided by the embodiments of this application.
The beneficial effects of the embodiments of this application are as follows:
In the embodiments of this application, after the terminal device has been woken by the first wake word, it can be woken by the second wake word, so a second wake word can be set for the terminal device according to actual language habits, thereby diversifying the wake words. Moreover, the second wake word can wake the terminal device only after the device has been woken by the first wake word, which avoids, as far as possible, the rise in false wake-up rate that an overly simple second wake word would otherwise cause.
Other features and advantages of this application will be set forth in the following description and will in part become apparent from the description or be understood by implementing this application. The objectives and other advantages of this application can be realized and obtained through the structures particularly pointed out in the written description, the claims, and the accompanying drawings.
Brief description of the drawings
The drawings described here are provided for a further understanding of this application and form part of it. The illustrative embodiments of this application and their descriptions are used to explain this application and do not unduly limit it. In the drawings:
Fig. 1 is a flow diagram of the method for building the acoustic recognition model in an embodiment of this application;
Fig. 2 is a schematic diagram of the system framework of the voice wake-up system in an embodiment of this application;
Fig. 3 is a schematic overview of the flow of the voice wake-up method in an embodiment of this application;
Fig. 4 is a schematic diagram of one specific flow of the voice wake-up method when switching between the single wake-word mode and the simple wake-word mode in an embodiment of this application;
Fig. 5 is a schematic diagram of another specific flow of the voice wake-up method when switching between the single wake-word mode and the simple wake-word mode in an embodiment of this application;
Fig. 6 is a schematic diagram of one specific flow of the voice wake-up method when switching between the single wake-word mode and the multi wake-word mode in an embodiment of this application;
Fig. 7 is a schematic diagram of another specific flow of the voice wake-up method when switching between the single wake-word mode and the multi wake-word mode in an embodiment of this application;
Fig. 8 is a schematic diagram of the functional structure of the voice wake-up apparatus in an embodiment of this application;
Fig. 9 is a schematic diagram of the hardware structure of the voice wake-up device in an embodiment of this application.
Detailed description
To help those skilled in the art better understand this application, the technical terms used in this application are explained first.
1. Voice wake-up: a technology that uses a voice signal carrying a wake word to wake a terminal device, such as a mobile phone, computer, personal digital assistant (PDA), wearable device, smart home device, or in-vehicle device, from a dormant state into a working state.
2. Client: in the embodiments of this application, an application that can be installed on a terminal device, monitors in real time whether the terminal device has received a voice signal, and performs a wake-up operation on the terminal device when the wake-up recognition result of the voice signal indicates that the terminal device needs to be woken.
3. Server: in the embodiments of this application, a background device that, in response to requests initiated by the client, provides the client with services such as database services, computing services, and wake-up recognition services.
4. Acoustic recognition model: a deep learning model that, from the voice feature data of at least one speech frame of a voice signal, predicts the posterior probabilities that the at least one speech frame belongs to the wake-word class and the non-wake-word class.
In the embodiments of this application, to reduce the computation load on the client, the acoustic recognition model can be built on the server. Referring to Fig. 1, the procedure for building the acoustic recognition model is as follows:
S101: Collect a set of voice signals to be used for training, which includes wake-word voice signals and non-wake-word voice signals.
In the embodiments of this application, a wake-word voice signal is a voice signal that carries a wake word, and a non-wake-word voice signal is a voice signal that does not carry a wake word, an ambient noise signal, or the like.
S102: For each voice signal, frame the voice signal to obtain at least one speech frame, and perform feature extraction on the at least one speech frame to obtain voice feature data of the at least one speech frame.
S103: For each voice signal, based on the voice feature data of its at least one speech frame and using the acoustic recognition model to be trained, obtain the posterior probabilities that the at least one speech frame of the voice signal belongs to the wake-word class and the non-wake-word class.
S104: Train the acoustic recognition model using a loss function, according to the posterior probabilities that the at least one speech frame of each voice signal belongs to the wake-word class and the non-wake-word class and the true class of the at least one speech frame of each voice signal, to obtain the model parameters, where the true class is obtained by labeling each speech frame of the voice signal in advance.
S105: Generate the acoustic recognition model from the model parameters.
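Steps S103 to S105 can be illustrated with a deliberately small stand-in. The source describes a deep learning model and does not name the loss function; the sketch below swaps in a single-layer logistic model trained by gradient descent on cross-entropy, with made-up one-dimensional frame features, purely to show the loop of predicting posteriors, comparing them with labeled classes, and keeping the learned parameters.

```python
import math
from typing import List, Sequence, Tuple


def wake_posterior(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Posterior probability that a frame with features x belongs to the wake-word class."""
    logit = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-logit))


def train_frame_classifier(features: List[Sequence[float]], labels: List[int],
                           lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the cross-entropy loss."""
    dim = len(features[0])
    n = len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = wake_posterior(x, w, b) - y  # d(loss)/d(logit) for cross-entropy
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b  # the learned model parameters
```

For example, training on features `[[0.0], [0.2], [0.8], [1.0]]` with labels `[0, 0, 1, 1]` yields parameters for which frames near 1.0 receive a wake-word posterior above 0.5 and frames near 0.0 receive one below 0.5.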
5. Viterbi-based HMM: a machine learning model that finds the highest wake-word path score from the posterior probabilities that at least one speech frame of a voice signal belongs to the wake-word class and the non-wake-word class.
6. Wake-word mode: in this application, a mode for waking the terminal device that corresponds to a class of wake words, including but not limited to the single wake-word mode, the multi wake-word mode, and the simple wake-word mode, where:
Single wake-word mode: a mode in which the terminal device is woken by one wake word. In this application, the single wake-word mode is the mode in which the terminal device is woken by the first wake word, where the first wake word may include, but is not limited to, a standard wake word, for example "small A small A" or "ABAB".
Multi wake-word mode: a mode in which the terminal device is woken by any one of at least two wake words. In this application, the multi wake-word mode is the mode in which the terminal device is woken by any one of at least two second wake words, where, in the multi wake-word mode, the second wake words may include, but are not limited to, the standard wake word, a simple wake word of the standard wake word, and a custom wake word, for example "small A", "AB", "small A small A", or "ABAB".
Simple wake-word mode: a mode in which the terminal device is woken by at least one wake word. In this application, the simple wake-word mode is the mode in which the terminal device is woken by any one of at least one second wake word, where, in the simple wake-word mode, the second wake words may include, but are not limited to, a simple wake word of the standard wake word and a custom wake word, for example "small A" or "AB".
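The three modes differ only in which wake words are accepted, which the following sketch makes explicit. The enum and function names are hypothetical, and the example words are the ones used in the definitions above.

```python
from enum import Enum
from typing import Iterable, Set


class WakeMode(Enum):
    SINGLE = "single"   # only the standard (first) wake word
    MULTI = "multi"     # standard, simple, and custom wake words
    SIMPLE = "simple"   # simple and custom wake words only


def active_wake_words(mode: WakeMode, standard: str,
                      simple: Iterable[str], custom: Iterable[str]) -> Set[str]:
    """Return the set of wake words that can wake the device in the given mode."""
    if mode is WakeMode.SINGLE:
        return {standard}
    if mode is WakeMode.SIMPLE:
        return set(simple) | set(custom)
    return {standard} | set(simple) | set(custom)
```

Keeping the mode as an explicit value makes the switch between modes a single assignment, with the accepted word set derived from it.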
7, record information is waken up, wakes up the flag information that word executes wake operation using first to characterize.In the application, mark Will information can include but is not limited to: characterization has utilized the zone bit information and temporal information of the first wake-up word execution wake operation Deng.Such as: temporal information can be but be not limited to: the timing time of timer, the count value of counter are called out using first At the time of word of waking up executes wake operation;Zone bit information can be but be not limited to: characterization is called out using the first wake-up word execution The flag bit " 1 " of operation, the characterization of waking up do not execute flag bit " 0 " of wake operation etc. using the first wake-up word.
It should be noted that terms such as "first" and "second" in this application are used to distinguish similar objects and are not used to describe a particular order or sequence. It should be understood that such terms are interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than the one illustrated or described herein.
To make the purpose, technical solutions, and beneficial effects of this application clearer, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings in the embodiments of this application. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.
At present, a voice call service usually has a preset standard wake-up word, and the user can wake up the terminal device only through that standard wake-up word, so the choice of wake-up word is limited. Moreover, to reduce the false wake-up rate, the standard wake-up word is usually a reduplicated word, for example: small A small A, ABAB, and so on. Such wake-up words usually do not match the user's actual language habits, which may affect the user experience of the voice call service.
For this reason, the embodiments of this application provide a voice wake-up system. Referring to Fig. 2, the voice wake-up system may include a terminal device 202 on which a client 201 is installed, and a server 203, where the client 201 can communicate with the server 203 through the terminal device 202 over a communication network. In practical applications, the terminal device 202 can receive voice signals in real time and store the received voice signals in a buffer area; the client 201 can obtain a voice signal to be processed from the buffer area and perform wake-up recognition on it, so as to determine, according to the wake-up recognition result of the voice signal, whether to wake up the terminal device 202. Alternatively, after obtaining the voice signal to be processed from the buffer area, the client 201 can carry the voice signal in a wake-up recognition request and send it to the server 203. The server 203 can obtain the voice signal to be processed from the wake-up recognition request, perform wake-up recognition on it, determine according to the wake-up recognition result whether the terminal device 202 needs to be woken up, and return the determination result to the client 201 in a wake-up recognition response. The client 201 can then determine, according to the determination result carried in the wake-up recognition response, whether to wake up the terminal device 202.
In practical applications, the client 201 or the server 203 can use the voice wake-up method provided by the embodiments of this application when performing wake-up recognition on a voice signal. Specifically, after obtaining the voice signal to be processed, the client 201 or the server 203 can judge, according to the saved wake-up record information, whether the terminal device has already been woken up by the first wake-up word. If it is determined that the terminal device has been woken up by the first wake-up word, then, when it is further determined that the voice signal contains a second wake-up word, the terminal device is woken up by the second wake-up word. If it is determined that the terminal device has not been woken up by the first wake-up word, then, when it is further determined that the voice signal contains the first wake-up word, the terminal device is woken up by the first wake-up word. In this way, once the terminal device has been woken up by the first wake-up word, it can be woken up by a second wake-up word, so that second wake-up words can be set for the terminal device according to actual language habits and the wake-up words become more diverse. Moreover, because the terminal device can be woken up by a second wake-up word only after it has been woken up by the first wake-up word, the increase in the false wake-up rate that an overly simple second wake-up word would otherwise cause is avoided as far as possible.
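The branching described in the preceding paragraph can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function name `decide_wake`, its parameters, and the representation of the recognition result as a set of detected words are assumptions made for this example.

```python
from typing import Iterable, Optional, Set


def decide_wake(woken_by_first: bool,
                detected_words: Set[str],
                first_word: str,
                second_words: Iterable[str]) -> Optional[str]:
    """Return the wake-up word that wakes the device, or None if it is not woken.

    All names are hypothetical; detected_words stands in for the result of
    wake-up recognition on the voice signal.
    """
    if woken_by_first:
        # Already woken once by the first wake-up word:
        # any second wake-up word found in the signal is accepted.
        for word in second_words:
            if word in detected_words:
                return word
        return None
    # Not yet woken by the first wake-up word: only that word is accepted.
    return first_word if first_word in detected_words else None
```

With this gating, a short word such as "small A" cannot wake the device on its own until the longer first wake-up word has been used, which is the stated reason the false wake-up rate does not rise.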
It is worth noting that, in the embodiments of this application, the manner in which the client 201 performs wake-up recognition on the voice signal is called offline wake-up, and the manner in which the server 203 performs wake-up recognition on the voice signal is called online wake-up. The voice wake-up method provided by the embodiments of this application applies to both offline wake-up and online wake-up. In addition, it should be understood that the numbers of terminal devices, communication networks, and servers in Fig. 2 are merely illustrative; there may be any number of terminal devices, communication networks, and servers according to actual needs. In practical applications, when the voice wake-up device that runs the voice wake-up method does not need to exchange data with other devices, the voice wake-up system may include only that voice wake-up device; for example, the voice wake-up system may include only a terminal device or only a server.
Having described the application scenario and design idea of the embodiments of this application, the technical solutions provided by the embodiments of this application are explained below.
The embodiments of this application provide a voice wake-up method, which can be applied to a client installed on a terminal device and can also be applied to a server. Specifically, referring to Fig. 3, the flow of the voice wake-up method provided by the embodiments of this application is as follows:
S301: Obtain a voice signal to be processed.
In practical applications, the terminal device can receive voice signals in real time and store the received voice signals in a buffer area, and the client installed on the terminal device can obtain the voice signal to be processed from the buffer area. When the voice wake-up method provided by the embodiments of this application is applied to a server, the client, after obtaining the voice signal to be processed from the buffer area, can carry the voice signal in a wake-up recognition request and send it to the server, and the server can obtain the voice signal to be processed from the received wake-up recognition request.
S302: According to the saved wake-up record information, judge whether the terminal device has already been woken up by the first wake-up word; if so, execute S303; if not, execute S304.
In a specific implementation, when executing S302, whether the terminal device has already been woken up by the first wake-up word can be judged according to the flag bit information included in the wake-up record information. For example, assuming that the flag bit "1" characterizes that a wake-up operation has been performed using the first wake-up word and the flag bit "0" characterizes that no wake-up operation has been performed using the first wake-up word, then, when the flag bit included in the saved wake-up record information is "1", it is determined that a wake-up operation has been performed using the first wake-up word, and when the flag bit included in the saved wake-up record information is "0", it is determined that no wake-up operation has been performed using the first wake-up word.
In practical applications, if it is determined, according to the flag bit information included in the saved wake-up record information, that the terminal device has already been woken up by the first wake-up word, then, in one embodiment, the current wake-up word mode can be directly determined to be the simple wake-up word mode. In this case, wake-up recognition can be performed on the obtained voice signal based on at least one second wake-up word among second wake-up words such as a custom wake-up word and a simple wake-up word of the standard wake-up word, that is, S303 is executed.
If it is determined, according to the flag bit information included in the saved wake-up record information, that the terminal device has already been woken up by the first wake-up word, then, in another embodiment, it can further be judged, according to the time information included in the wake-up record information, whether a voice signal characterizing that the simple wake-up word mode is to be turned on has been received within the time interval after the terminal device was woken up by the first wake-up word. If such a voice signal has been received, the current wake-up word mode can be determined to be the simple wake-up word mode; in this case, wake-up recognition can be performed on the obtained voice signal based on at least one second wake-up word among second wake-up words such as a custom wake-up word and a simple wake-up word of the standard wake-up word, that is, S303 is executed. If no such voice signal has been received, the current wake-up word mode can be determined to be the single wake-up word mode; in this case, wake-up recognition can be performed on the obtained voice signal based on a first wake-up word such as the standard wake-up word, that is, S304 is executed. Further, after S304 is executed, the saved wake-up record information can be updated according to the flag information of the wake-up operation just performed.
If it is determined, according to the flag bit information included in the saved wake-up record information, that the terminal device has already been woken up by the first wake-up word, then, in yet another embodiment, it can further be judged, according to the saved wake-up record information, whether the time interval since the terminal device was woken up by the first wake-up word is within a set time range. Specifically, this can be judged according to the time information included in the wake-up record information. For example, assuming that the time information is the elapsed time of a timer, then, when the elapsed time of the timer is not greater than a time threshold, it is determined that the time interval since the terminal device was woken up by the first wake-up word is within the set time range, and when the elapsed time of the timer is greater than the time threshold, it is determined that this time interval is not within the set time range.
If it is determined, according to the time information included in the saved wake-up record information, that the time interval since the terminal device was woken up by the first wake-up word is within the set time range, the current wake-up word mode can further be determined to be the multi-wake-up word mode. In this case, wake-up recognition can be performed on the obtained voice signal based on at least two second wake-up words among second wake-up words such as the standard wake-up word, a custom wake-up word, and a simple wake-up word of the standard wake-up word, that is, S303 is executed. If it is determined, according to the time information included in the saved wake-up record information, that this time interval is not within the set time range, the current wake-up word mode can further be determined to be the single wake-up word mode. In this case, wake-up recognition can be performed on the obtained voice signal based on a first wake-up word such as the standard wake-up word, that is, S304 is executed. Further, after S304 is executed, the saved wake-up record information can be updated according to the flag information of the wake-up operation just performed.
In practical applications, if it is determined, according to the flag bit information included in the saved wake-up record information, that the terminal device has not been woken up by the first wake-up word, the current wake-up word mode can further be determined to be the single wake-up word mode. In this case, wake-up recognition can be performed on the obtained voice signal based on a first wake-up word such as the standard wake-up word, that is, S304 is executed. Further, after S304 is executed, the saved wake-up record information can be updated according to the flag information of the wake-up operation just performed.
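As a hedged illustration of the embodiment that uses the set time range, the mode selection in S302 can be sketched as below. The names `WakeMode` and `select_mode` and the parameter `window_seconds` are assumptions made for this example; the application does not prescribe a particular time threshold.

```python
from enum import Enum
from typing import Optional


class WakeMode(Enum):
    SINGLE = "single"   # only the first wake-up word is accepted (S304)
    MULTI = "multi"     # any of at least two second wake-up words is accepted (S303)


def select_mode(flag_bit: int,
                elapsed_seconds: Optional[float],
                window_seconds: float) -> WakeMode:
    """Choose the current wake-up word mode from the wake-up record information."""
    if flag_bit != 1 or elapsed_seconds is None:
        # Never woken by the first wake-up word: single wake-up word mode.
        return WakeMode.SINGLE
    if elapsed_seconds <= window_seconds:
        # Elapsed time not greater than the threshold: within the set time range.
        return WakeMode.MULTI
    return WakeMode.SINGLE
```

Note that an elapsed time exactly equal to the threshold still counts as within the set time range, matching the "not greater than" wording above.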
S303: When it is determined that the voice signal contains a second wake-up word, wake up the terminal device by the second wake-up word.
In practical applications, S303 can be executed in, but is not limited to, the following manner:
First, framing is performed on the voice signal to obtain at least one speech frame, and feature extraction is performed on the at least one speech frame of the voice signal to obtain the voice feature data of the at least one speech frame of the voice signal.
It is worth noting that, because of factors such as gain, the volume of some voice signals may be at a low level. For this reason, in the embodiments of this application, before framing is performed on the voice signal, automatic gain control (AGC) can also be used to boost the volume of the voice signal, so that a voice signal whose volume is too low reaches a level at which it can be recognized.
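The application does not specify how the gain is computed. As a deliberately simplified stand-in for AGC, the sketch below applies a single fixed gain to a whole utterance so that its peak reaches a target level; a real AGC adapts the gain over time. The function name `boost_volume` and the parameters `target_peak` and `max_gain` are assumptions made for this example.

```python
from typing import List, Sequence


def boost_volume(samples: Sequence[float],
                 target_peak: float = 0.5,
                 max_gain: float = 10.0) -> List[float]:
    """Scale a quiet signal so that its peak reaches target_peak (boost only).

    Simplified stand-in for AGC: one gain for the whole signal, capped at
    max_gain so that near-silent input is not amplified without bound.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0 or peak >= target_peak:
        # Silent or already loud enough: leave the signal unchanged.
        return list(samples)
    gain = min(target_peak / peak, max_gain)
    return [s * gain for s in samples]
```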
Further, after the volume of the voice signal has been boosted, a sliding window function can be used to perform framing on the voice signal to obtain at least one speech frame of the voice signal, where adjacent speech frames may partially overlap. For example, framing can be performed on the voice signal with a frame length of 25 ms and a frame shift of 10 ms, so that each resulting speech frame is 25 ms long and every two adjacent speech frames overlap by 25 ms - 10 ms = 15 ms.
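The framing arithmetic above can be checked with a short sketch. The function name `frame_signal` and the 16 kHz sample rate used in the usage note are assumptions made for this example, and no window weighting (such as a Hamming window) is applied here.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float],
                 sample_rate: int,
                 frame_ms: int = 25,
                 shift_ms: int = 10) -> List[List[float]]:
    """Split a signal into overlapping frames of frame_ms, advancing by shift_ms."""
    frame_len = sample_rate * frame_ms // 1000   # samples per frame
    shift = sample_rate * shift_ms // 1000       # samples between frame starts
    frames = []
    start = 0
    # Trailing samples that do not fill a whole frame are dropped.
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += shift
    return frames
```

At an assumed sample rate of 16 kHz, a frame holds 400 samples and the shift is 160 samples, so adjacent frames share 240 samples, which is the 15 ms overlap stated above.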
Further, after framing has been performed on the voice signal and at least one speech frame of the voice signal has been obtained, Mel-scale Frequency Cepstral Coefficients (MFCC) can be used to obtain the voice feature data of the at least one speech frame of the voice signal, that is, the speech feature vector of each of the at least one speech frame of the voice signal.
Then, based on the voice feature data of the at least one speech frame of the voice signal, an acoustic recognition model is used to obtain the posterior probabilities that the at least one speech frame of the voice signal corresponds to the wake-up word class and to the non-wake-up word class. The posterior probability that a speech frame corresponds to the wake-up word class includes the posterior probability that the speech frame corresponds to each phoneme contained in the wake-up word (here the wake-up word is a simple wake-up word, for example: small A, AB, and so on). The posterior probability that a speech frame corresponds to the non-wake-up word class includes the posterior probability that the speech frame corresponds to a background-noise phoneme and the posterior probability that the speech frame corresponds to a non-wake-up word phoneme.
For example, assume that framing the voice signal yields 100 speech frames, that the wake-up word is small A, and that small A contains 6 phonemes. A voice feature matrix can then be generated from the speech feature vectors of the 100 speech frames and input into the acoustic recognition model to obtain the posterior probability matrix of the voice signal. The posterior probability matrix is a 100 × 8 matrix: the dimension of size 100 indexes the speech frames, and the dimension of size 8 holds, for each speech frame, the posterior probabilities corresponding to the 6 phonemes contained in small A, the posterior probability corresponding to the background-noise phoneme, and the posterior probability corresponding to the non-wake-up word phoneme.
It is worth noting that, when the voice wake-up method provided by the embodiments of this application is applied to a server, the server can use a pre-established acoustic recognition model to predict the posterior probabilities. When the voice wake-up method is applied to a client, the server can, after the acoustic recognition model has been established, deploy the model to the client, so that the client can use the acoustic recognition model deployed by the server to predict the posterior probabilities.
Next, based on the posterior probabilities that the at least one speech frame of the voice signal corresponds to the wake-up word class and to the non-wake-up word class, the wake-up confidence of the voice signal with respect to each second wake-up word is obtained.
In practical applications, the wake-up confidence of the voice signal with respect to each second wake-up word can be obtained from the posterior probabilities that the at least one speech frame of the voice signal corresponds to the wake-up word class and to the non-wake-up word class in, but not limited to, the following ways:
First way: for each second wake-up word, based on the posterior probabilities that the at least one speech frame of the voice signal corresponds to the wake-up word class and to the non-wake-up word class, a hidden Markov model (HMM) with the Viterbi algorithm is used to obtain the highest-scoring second wake-up word path score, and this highest second wake-up word path score is taken as the wake-up confidence of the voice signal with respect to that second wake-up word.
For example, assume that the second wake-up words include small A small A and small A. Based on the posterior probability matrix of the voice signal, the HMM with the Viterbi algorithm is used to compute a wake-up word path score for each of the multiple wake-up word paths in which small A small A occurs continuously, and the highest wake-up word path score is taken as the wake-up confidence of the voice signal with respect to small A small A. Likewise, based on the posterior probability matrix of the voice signal, the HMM with the Viterbi algorithm is used to compute a wake-up word path score for each of the multiple wake-up word paths in which small A occurs continuously, and the highest wake-up word path score is taken as the wake-up confidence of the voice signal with respect to small A.
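The application does not give the HMM topology or the exact form of the path score. The sketch below therefore makes its assumptions explicit: a strictly left-to-right HMM with one state per phoneme, uniform transitions (each state may repeat or advance to the next), a path that spans the whole segment, and a score equal to the sum of log posteriors along the path. The function name `best_keyword_path_score` and the parameter `phoneme_cols` are hypothetical.

```python
import math
from typing import List, Sequence


def best_keyword_path_score(posteriors: Sequence[Sequence[float]],
                            phoneme_cols: Sequence[int]) -> float:
    """Viterbi search for the best left-to-right path through the wake-up word phonemes.

    posteriors[t][c] is the posterior of class c at frame t (frames as rows);
    phoneme_cols lists, in order, the class index of each phoneme of the word.
    Returns the highest sum of log posteriors, or -inf if no full path exists.
    """
    neg_inf = float("-inf")
    floor = 1e-10                       # avoids log(0) for zero posteriors
    n_states = len(phoneme_cols)
    if not posteriors or n_states == 0:
        return neg_inf
    # best[s]: best score of a path that ends in state s at the current frame.
    best: List[float] = [neg_inf] * n_states
    best[0] = math.log(max(posteriors[0][phoneme_cols[0]], floor))
    for frame in posteriors[1:]:
        new = [neg_inf] * n_states
        for s in range(n_states):
            stay = best[s]
            advance = best[s - 1] if s > 0 else neg_inf
            prev = max(stay, advance)
            if prev > neg_inf:
                new[s] = prev + math.log(max(frame[phoneme_cols[s]], floor))
        best = new
    return best[-1]                     # the path must end in the last phoneme
```

Working in the log domain keeps the products of many small posteriors from underflowing over a segment of around 100 frames.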
Second way: for each second wake-up word, based on the posterior probabilities that the at least one speech frame corresponds to the wake-up word class and to the non-wake-up word class, the HMM with the Viterbi algorithm is used to obtain the highest second wake-up word path score and the highest non-wake-up word path score, and the difference between the highest second wake-up word path score and the highest non-wake-up word path score is taken as the wake-up confidence of the voice signal with respect to that second wake-up word.
For example, assume that the second wake-up words include small A small A and small A. Based on the posterior probability matrix of the voice signal, the HMM with the Viterbi algorithm is used to compute a wake-up word path score for each of the multiple wake-up word paths in which small A small A occurs continuously and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words occur continuously, and the difference between the highest wake-up word path score and the highest non-wake-up word path score is taken as the wake-up confidence of the voice signal with respect to small A small A. Likewise, based on the posterior probability matrix of the voice signal, the HMM with the Viterbi algorithm is used to compute a wake-up word path score for each of the multiple wake-up word paths in which small A occurs continuously and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words occur continuously, and the difference between the highest wake-up word path score and the highest non-wake-up word path score is taken as the wake-up confidence of the voice signal with respect to small A.
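For the second way, the sketch below shows only the difference step. It assumes log-domain path scores and models the non-wake-up word path as a free loop over the background-noise and non-wake-up word classes, so its best score is simply the per-frame maximum summed over frames. That modelling choice, and the names `best_filler_path_score`, `wake_confidence`, and `filler_cols`, are assumptions made for this example and are not taken from the application.

```python
import math
from typing import Sequence


def best_filler_path_score(posteriors: Sequence[Sequence[float]],
                           filler_cols: Sequence[int]) -> float:
    """Best log-score of a path that stays in the non-wake-up classes.

    With a free loop over the filler classes (assumed here), the best path
    picks the most probable filler class independently at each frame.
    """
    floor = 1e-10  # avoids log(0)
    return sum(math.log(max(max(frame[c] for c in filler_cols), floor))
               for frame in posteriors)


def wake_confidence(keyword_path_score: float, filler_path_score: float) -> float:
    """Wake-up confidence as the wake-up word path score minus the non-wake-up path score."""
    return keyword_path_score - filler_path_score
```

Taking the difference makes the confidence relative: a segment scores high only when the wake-up word path explains the frames better than the noise and non-wake-up classes do.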
Finally, when it is detected that the wake-up confidence of the voice signal with respect to any one second wake-up word is not less than the wake-up threshold, it is determined that the voice signal contains a second wake-up word. In this case, when the voice wake-up method provided by the embodiments of this application is applied to a client, the client can directly wake up the terminal device by the second wake-up word; when the voice wake-up method is applied to a server, the server can return to the client a wake-up recognition response characterizing that the terminal device is to be woken up, and the client, on receiving this wake-up recognition response, can wake up the terminal device by the second wake-up word.
It is worth noting that, when it is detected that the wake-up confidences of the voice signal with respect to all second wake-up words are less than the wake-up threshold, it can be determined that the voice signal does not contain a second wake-up word. In this case, when the voice wake-up method provided by the embodiments of this application is applied to a client, the client can directly discard the voice signal and continue to obtain a voice signal to be processed from the buffer area for wake-up recognition; when the voice wake-up method is applied to a server, the server can return to the client a wake-up recognition response characterizing that the terminal device is not to be woken up, and the client, on receiving this wake-up recognition response, does not perform a wake-up operation on the terminal device and continues to obtain a voice signal to be processed from the buffer area and send it to the server for wake-up recognition.
For example, assume that the second wake-up words include small A small A and small A. When it is detected that the wake-up confidence of the voice signal with respect to small A small A is not less than the wake-up threshold, the terminal device can be woken up; when it is detected that the wake-up confidence of the voice signal with respect to small A is not less than the wake-up threshold, the terminal device can also be woken up; and when it is detected that the wake-up confidence with respect to small A small A and the wake-up confidence with respect to small A are both less than the wake-up threshold, the terminal device is not woken up, and a voice signal to be processed continues to be obtained from the buffer area.
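The threshold comparison in this example can be sketched as follows. The function name `pick_wake_word` and the numeric confidences in the usage note are hypothetical; returning the highest-scoring word when several pass the threshold is a choice made for this sketch, since the application only requires that any one of them passes.

```python
from typing import Dict, Optional


def pick_wake_word(confidences: Dict[str, float], threshold: float) -> Optional[str]:
    """Return a wake-up word whose confidence is not less than threshold, else None.

    If several words pass, the one with the highest confidence is returned
    (an assumption of this sketch).
    """
    best_word: Optional[str] = None
    best_conf = float("-inf")
    for word, conf in confidences.items():
        if conf >= threshold and conf > best_conf:
            best_word, best_conf = word, conf
    return best_word
```

For instance, with hypothetical confidences of 0.4 for small A small A and 0.7 for small A and a threshold of 0.6, the device is woken by small A; with a threshold of 0.8 the function returns `None`, the signal is discarded, and the next voice signal is fetched from the buffer area.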
S304: When it is determined that the voice signal contains the first wake-up word, wake up the terminal device by the first wake-up word.
Specifically, S304 can be executed in, but is not limited to, the following manner:
First, framing is performed on the voice signal to obtain at least one speech frame, and feature extraction is performed on the at least one speech frame to obtain the voice feature data of the at least one speech frame.
Likewise, in the embodiments of this application, before framing is performed on the voice signal, AGC can also be used to boost the volume of the voice signal, so that a voice signal whose volume is too low reaches a level at which it can be recognized. Further, after the volume of the voice signal has been boosted, a sliding window function can be used to perform framing on the voice signal to obtain at least one speech frame of the voice signal, and MFCC can be used to obtain the voice feature data of the at least one speech frame of the voice signal, that is, the speech feature vector of each of the at least one speech frame of the voice signal. The specific implementation is the same as that described above and is not repeated here.
Then, based on the voice feature data of the at least one speech frame of the voice signal, the acoustic recognition model is used to obtain the posterior probabilities that the at least one speech frame of the voice signal corresponds to the wake-up word class and to the non-wake-up word class. Likewise, the posterior probability that a speech frame corresponds to the wake-up word class includes the posterior probability that the speech frame corresponds to each phoneme contained in the wake-up word (here the wake-up word is a simple wake-up word, for example: small A, AB, and so on), and the posterior probability that a speech frame corresponds to the non-wake-up word class includes the posterior probability that the speech frame corresponds to a background-noise phoneme and the posterior probability that the speech frame corresponds to a non-wake-up word phoneme. The specific implementation is the same as that described above and is not repeated here.
Next, based on the posterior probabilities that the at least one speech frame of the voice signal corresponds to the wake-up word class and to the non-wake-up word class, the wake-up confidence of the voice signal with respect to the first wake-up word is obtained.
In practical applications, the wake-up confidence of the voice signal with respect to the first wake-up word can be obtained from the posterior probabilities that the at least one speech frame of the voice signal corresponds to the wake-up word class and to the non-wake-up word class in, but not limited to, the following ways:
First way: based on the posterior probabilities that the at least one speech frame of the voice signal corresponds to the wake-up word class and to the non-wake-up word class, the HMM with the Viterbi algorithm is used to obtain the highest first wake-up word path score, and this highest first wake-up word path score is taken as the wake-up confidence of the voice signal with respect to the first wake-up word.
For example, assume that the first wake-up word includes small A small A. Based on the posterior probability matrix of the voice signal, the HMM with the Viterbi algorithm is used to compute a wake-up word path score for each of the multiple wake-up word paths in which small A small A occurs continuously, and the highest wake-up word path score is taken as the wake-up confidence of the voice signal with respect to small A small A.
Second way: based on the posterior probabilities that the at least one speech frame of the voice signal corresponds to the wake-up word class and to the non-wake-up word class, the HMM with the Viterbi algorithm is used to obtain the highest first wake-up word path score and the highest non-wake-up word path score, and the difference between the highest first wake-up word path score and the highest non-wake-up word path score is taken as the wake-up confidence of the voice signal with respect to the first wake-up word.
For example, assume that the first wake-up word includes small A small A. Based on the posterior probability matrix of the voice signal, the HMM with the Viterbi algorithm is used to compute a wake-up word path score for each of the multiple wake-up word paths in which small A small A occurs continuously and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words occur continuously, and the difference between the highest wake-up word path score and the highest non-wake-up word path score is taken as the wake-up confidence of the voice signal with respect to small A small A.
Finally, when it is detected that the wake-up confidence of the voice signal with respect to the first wake-up word is not less than the wake-up threshold, it is determined that the voice signal contains the first wake-up word. In this case, when the voice wake-up method provided by the embodiments of this application is applied to a client, the client can directly wake up the terminal device by the first wake-up word; when the voice wake-up method is applied to a server, the server can return to the client a wake-up recognition response characterizing that the terminal device is to be woken up, and the client, on receiving this wake-up recognition response, can wake up the terminal device by the first wake-up word.
It is worth noting that, when it is detected that the wake-up confidence of the voice signal with respect to the first wake-up word is less than the wake-up threshold, it can be determined that the voice signal does not contain the first wake-up word. In this case, when the voice wake-up method provided by the embodiments of this application is applied to a client, the client can directly discard the voice signal and continue to obtain a voice signal to be processed from the buffer area for wake-up recognition; when the voice wake-up method is applied to a server, the server can return to the client a wake-up recognition response characterizing that the terminal device is not to be woken up, and the client, on receiving this wake-up recognition response, does not perform a wake-up operation on the terminal device and continues to obtain a voice signal to be processed from the buffer area and send it to the server for wake-up recognition.
For example, assume that the first wake-up word includes small A small A. When it is detected that the wake-up confidence of the voice signal with respect to small A small A is not less than the wake-up threshold, the terminal device can be woken up; when it is detected that the wake-up confidence of the voice signal with respect to small A small A is less than the wake-up threshold, the terminal device is not woken up, and a voice signal to be processed continues to be obtained from the buffer area.
The voice wake-up method provided by the embodiments of this application is described in further detail below, taking "offline wake-up" as the specific application scenario and "switching between the single wake-up word mode and the simple wake-up word mode" as the specific embodiment, where the first wake-up word includes small A small A and the second wake-up word includes small A. Specifically, referring to Fig. 4, the detailed flow of the voice wake-up method provided by the embodiments of this application is as follows:
S401: client obtains voice signal to be processed from buffer zone.
S402: client is using AGC technology, after the volume for enhancing the voice signal, using mobile window function, to the voice Signal carries out sub-frame processing, obtains 100 speech frames of the voice signal.
S403: client uses MFCC, obtains the speech feature vector of 100 speech frames of the voice signal.
S404: the speech feature vector of 100 speech frame of the client based on the voice signal generates the voice signal Phonetic feature matrix, and by the phonetic feature Input matrix acoustics identification model of the voice signal, after obtaining the voice signal Test probability matrix.
S405: client judges whether to have passed through small A small according to the wake-up record information zone bit information that includes of preservation A wakes up terminal device, if so, executing S406;If not, it is determined that current awake word mode is single wake-up word mode, and is executed S410。
S406: the temporal information that client includes according to the wake-up record information of preservation, judgement are waken up eventually by the small A of small A Client-initiated characterization whether is received in time interval after end equipment opens the simple voice signal for waking up word mode, if It is, it is determined that current awake word mode is simple wake-up word mode, and executes S407;If not, it is determined that current awake word mode Word mode is waken up to be single, and executes S410.
It is tellable to be, it, can be with one after client determines current awake word mode for simple wake-up word mode in S406 It is directly based on small A, wake-up identification is carried out to the voice signal of acquisition, until determining that receiving Client-initiated characterization closing simply calls out Until the voice signal for word mode of waking up, and from the simple word pattern switching that wakes up to single wake-up word mode.
S407: posterior probability matrix of the client based on the voice signal, using the HMM based on viterbi algorithm, for Continuously there is a plurality of wake-up word path of small A, calculate separately and wake up word path score, and is directed to and the more of non-wake-up word continuously occur The non-wake-up word path of item, calculates separately non-wake-up word path score, and by the highest wake-up word path score of numerical value and numerical value The highest non-difference for waking up word path score, is determined as the wake-up confidence level that the voice signal corresponds to small A.
S408: client judges whether the voice signal is not less than threshold wake-up value corresponding to the wake-up confidence level of small A, if It is then to execute S409;If it is not, then executing S413.
S409: client wakes up terminal device by small A, returns to S401.
S410: posterior probability matrix of the client based on the voice signal, using the HMM based on viterbi algorithm, for Continuously there is a plurality of wake-up word path of the small A of small A, calculates separately and wake up word path score, and be directed to and non-wake-up word continuously occur A plurality of non-wake-up word path, calculate separately non-wake-up word path score, and by the highest wake-up word path score of numerical value with The highest non-difference for waking up word path score of numerical value, is determined as the wake-up confidence level that the voice signal corresponds to the small A of small A.
S411: client judges whether the voice signal is not less than threshold wake-up value corresponding to the wake-up confidence level of the small A of small A, If so, executing S412;If it is not, then executing S413.
S412: client wakes up terminal device by the small A of small A, and according to the current flag information for executing wake operation, more The wake-up record information newly saved, returns to S401.
S413: client does not wake up terminal device, returns to S401.
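The mode decision in S405 and S406 and the branches that follow it can be sketched as below. This is only an illustrative sketch: the `WakeRecord` fields, the function names, and the reduction of the time check in S406 to a boolean flag are assumptions, and the confidences are taken as already computed.

```python
from dataclasses import dataclass
from typing import Dict, Optional

FIRST_WAKE_WORD = "small A small A"
SECOND_WAKE_WORD = "small A"


@dataclass
class WakeRecord:
    """Hypothetical shape of the saved wake-up record information."""
    woken_by_first_word: bool = False   # flag information checked in S405
    simple_mode_on: bool = False        # user asked to turn simple mode on (S406)


def current_wake_word(record: WakeRecord) -> str:
    """S405-S406: pick the wake-up word that the current mode listens for."""
    if record.woken_by_first_word and record.simple_mode_on:
        return SECOND_WAKE_WORD   # simple wake-up word mode
    return FIRST_WAKE_WORD        # single wake-up word mode


def handle_signal(record: WakeRecord, confidences: Dict[str, float],
                  threshold: float) -> Optional[str]:
    """S407-S413: return the word that woke the device, or None."""
    word = current_wake_word(record)
    if confidences.get(word, float("-inf")) < threshold:
        return None                          # S413: do not wake
    if word == FIRST_WAKE_WORD:
        record.woken_by_first_word = True    # S412: update the saved record
    return word
```

In this sketch a signal containing only "small A" is ignored until the device has first been woken by "small A small A" and the simple mode has been turned on, which mirrors the ordering of S405 and S406.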
The voice wake-up method provided by the embodiments of the present application is described in further detail below, taking "online wake-up" as the specific application scenario and "switching between the single wake-up word mode and the simple wake-up word mode" as the specific embodiment, where the first wake-up word includes "small A small A" and the second wake-up word includes "small A". Specifically, referring to Fig. 5, the detailed flow of the voice wake-up method provided by the embodiments of the present application is as follows:
S501: The client obtains a voice signal to be processed from the buffer area.
S502: The client carries the voice signal to be processed in a wake-up recognition request and sends the request to the server.
S503: On receiving the wake-up recognition request, the server obtains the voice signal to be processed from the request.
S504: The server uses AGC to boost the volume of the voice signal and then applies a sliding window function to divide the voice signal into frames, obtaining 100 speech frames of the voice signal.
S505: The server uses MFCC to obtain the speech feature vectors of the 100 speech frames of the voice signal.
S506: Based on the speech feature vectors of the 100 speech frames of the voice signal, the server generates the speech feature matrix of the voice signal and inputs this matrix into the acoustic recognition model to obtain the posterior probability matrix of the voice signal.
S507: According to the flag information included in the saved wake-up record information, the server judges whether the terminal device has already been woken up by "small A small A". If so, S508 is executed; if not, the current wake-up word mode is determined to be the single wake-up word mode, and S513 is executed.
S508: According to the time information included in the saved wake-up record information, the server judges whether, within the time interval after the terminal device was woken up by "small A small A", a user-initiated voice signal indicating that the simple wake-up word mode is to be turned on has been received. If so, the current wake-up word mode is determined to be the simple wake-up word mode, and S509 is executed; if not, the current wake-up word mode is determined to be the single wake-up word mode, and S513 is executed.
It is worth noting that in S508, after the server determines that the current wake-up word mode is the simple wake-up word mode, it can keep performing wake-up recognition on the acquired voice signals based on "small A" until it determines that a user-initiated voice signal indicating that the simple wake-up word mode is to be turned off has been received, at which point it switches from the simple wake-up word mode to the single wake-up word mode.
S509: Based on the posterior probability matrix of the voice signal, the server uses an HMM based on the Viterbi algorithm to calculate a wake-up word path score for each of the multiple wake-up word paths in which "small A" appears consecutively, and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words appear consecutively; the difference between the highest wake-up word path score and the highest non-wake-up word path score is determined as the wake-up confidence of the voice signal for "small A".
S510: The server judges whether the wake-up confidence of the voice signal for "small A" is not less than the wake-up threshold. If so, S511 is executed; if not, S518 is executed.
S511: The server returns to the client a wake-up recognition response indicating that the terminal device is to be woken up by "small A".
S512: On receiving the wake-up recognition response indicating that the terminal device is to be woken up by "small A", the client wakes up the terminal device by "small A" and returns to S501.
S513: Based on the posterior probability matrix of the voice signal, the server uses an HMM based on the Viterbi algorithm to calculate a wake-up word path score for each of the multiple wake-up word paths in which "small A small A" appears consecutively, and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words appear consecutively; the difference between the highest wake-up word path score and the highest non-wake-up word path score is determined as the wake-up confidence of the voice signal for "small A small A".
S514: The server judges whether the wake-up confidence of the voice signal for "small A small A" is not less than the wake-up threshold. If so, S515 is executed; if not, S518 is executed.
S515: The server returns to the client a wake-up recognition response indicating that the terminal device is to be woken up by "small A small A".
S516: The server takes the flag information of the return operation it has just performed as the flag information of the wake-up operation currently performed by the client, and updates the saved wake-up record information according to that flag information.
S517: On receiving the wake-up recognition response indicating that the terminal device is to be woken up by "small A small A", the client wakes up the terminal device by "small A small A" and returns to S501.
S518: The server returns to the client a wake-up recognition response indicating that the terminal device is not to be woken up.
S519: On receiving the wake-up recognition response indicating that the terminal device is not to be woken up, the client does not wake up the terminal device and returns to S501.
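The request and response exchange in the online flow above can be sketched as follows. The dictionary keys `voice_signal` and `wake_word`, the function names, and the `score` callable (which stands in for the AGC, framing, MFCC, acoustic model, and Viterbi steps) are all assumptions made for illustration; the source does not specify a message format.

```python
from typing import Callable, Dict, Optional


def server_recognize(request: Dict[str, object], wake_word: str,
                     score: Callable[[object, str], float],
                     threshold: float) -> Dict[str, Optional[str]]:
    """Server side: extract the signal, score it, and build the response."""
    signal = request["voice_signal"]        # S503: read the signal from the request
    confidence = score(signal, wake_word)   # recognition steps collapsed into one call
    if confidence >= threshold:
        return {"wake_word": wake_word}     # response: wake by this word
    return {"wake_word": None}              # response: do not wake


def client_should_wake(response: Dict[str, Optional[str]]) -> bool:
    """Client side: wake the terminal device only if the response names a word."""
    return response["wake_word"] is not None
```

Keeping the scoring behind a callable lets the same server function serve both the single and the simple wake-up word modes, since only the `wake_word` argument changes between them.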
The voice wake-up method provided by the embodiments of the present application is described in further detail below, taking "offline wake-up" as the specific application scenario and "switching between the single wake-up word mode and the multiple wake-up word mode" as the specific embodiment, where the first wake-up word includes "small A small A" and the second wake-up word includes "small A small A" and "small A". Specifically, referring to Fig. 6, the detailed flow of the voice wake-up method provided by the embodiments of the present application is as follows:
S601: The client obtains a voice signal to be processed from the buffer area.
S602: The client uses AGC to boost the volume of the voice signal and then applies a sliding window function to divide the voice signal into frames, obtaining 100 speech frames of the voice signal.
S603: The client uses MFCC to obtain the speech feature vectors of the 100 speech frames of the voice signal.
S604: Based on the speech feature vectors of the 100 speech frames of the voice signal, the client generates the speech feature matrix of the voice signal and inputs this matrix into the acoustic recognition model to obtain the posterior probability matrix of the voice signal.
S605: According to the saved wake-up record information, the client judges whether the time interval since the terminal device was woken up by "small A small A" is within the set time range. If so, the current wake-up word mode is determined to be the multiple wake-up word mode, and S606 is executed; if not, the current wake-up word mode is determined to be the single wake-up word mode, and S612 is executed.
S606: Based on the posterior probability matrix of the voice signal, the client uses an HMM based on the Viterbi algorithm to calculate a wake-up word path score for each of the multiple wake-up word paths in which "small A small A" appears consecutively, and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words appear consecutively; the difference between the highest wake-up word path score and the highest non-wake-up word path score is determined as the wake-up confidence of the voice signal for "small A small A".
S607: Based on the posterior probability matrix of the voice signal, the client uses an HMM based on the Viterbi algorithm to calculate a wake-up word path score for each of the multiple wake-up word paths in which "small A" appears consecutively, and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words appear consecutively; the difference between the highest wake-up word path score and the highest non-wake-up word path score is determined as the wake-up confidence of the voice signal for "small A".
S608: The client judges whether the wake-up confidence of the voice signal for "small A small A" is not less than the wake-up threshold. If so, S609 is executed; if not, S610 is executed.
S609: The client wakes up the terminal device by "small A small A" and returns to S601.
S610: The client judges whether the wake-up confidence of the voice signal for "small A" is not less than the wake-up threshold. If so, S611 is executed; if not, S615 is executed.
S611: The client wakes up the terminal device by "small A" and returns to S601.
S612: Based on the posterior probability matrix of the voice signal, the client uses an HMM based on the Viterbi algorithm to calculate a wake-up word path score for each of the multiple wake-up word paths in which "small A small A" appears consecutively, and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words appear consecutively; the difference between the highest wake-up word path score and the highest non-wake-up word path score is determined as the wake-up confidence of the voice signal for "small A small A".
S613: The client judges whether the wake-up confidence of the voice signal for "small A small A" is not less than the wake-up threshold. If so, S614 is executed; if not, S615 is executed.
S614: The client wakes up the terminal device by "small A small A", updates the saved wake-up record information according to the flag information of the wake-up operation currently performed, and returns to S601.
S615: The client does not wake up the terminal device and returns to S601.
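The time-window check in S605 and the ordered threshold checks in S608 to S615 can be sketched as follows. This is a sketch under assumptions: the saved wake-up record is reduced to a single timestamp, the window boundary is treated as inclusive, and all names and values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

FIRST_WAKE_WORD = "small A small A"
SIMPLE_WAKE_WORD = "small A"


def candidate_words(last_first_wake: Optional[float], now: float,
                    window_seconds: float) -> List[str]:
    """S605: multiple wake-up word mode inside the window, single mode otherwise."""
    in_window = (last_first_wake is not None
                 and 0 <= now - last_first_wake <= window_seconds)
    if in_window:
        return [FIRST_WAKE_WORD, SIMPLE_WAKE_WORD]
    return [FIRST_WAKE_WORD]


def detect(confidences: Dict[str, float], threshold: float,
           last_first_wake: Optional[float], now: float,
           window_seconds: float) -> Tuple[Optional[str], Optional[float]]:
    """Return (word that woke the device or None, possibly updated timestamp)."""
    words = candidate_words(last_first_wake, now, window_seconds)
    single_mode = len(words) == 1
    for word in words:  # the first wake-up word is checked before the simple one
        if confidences.get(word, float("-inf")) >= threshold:
            if single_mode:
                last_first_wake = now  # S614: update the saved record
            return word, last_first_wake
    return None, last_first_wake       # S615: do not wake
```

As in the flow above, the record is refreshed only on a wake-up in the single wake-up word mode, so the window is measured from that wake-up and is not extended by later wake-ups inside it.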
The voice wake-up method provided by the embodiments of the present application is described in further detail below, taking "online wake-up" as the specific application scenario and "switching between the single wake-up word mode and the multiple wake-up word mode" as the specific embodiment, where the first wake-up word includes "small A small A" and the second wake-up word includes "small A small A" and "small A". Specifically, referring to Fig. 7, the detailed flow of the voice wake-up method provided by the embodiments of the present application is as follows:
S701: The client obtains a voice signal to be processed from the buffer area.
S702: The client carries the voice signal to be processed in a wake-up recognition request and sends the request to the server.
S703: On receiving the wake-up recognition request, the server obtains the voice signal to be processed from the request.
S704: The server uses AGC to boost the volume of the voice signal and then applies a sliding window function to divide the voice signal into frames, obtaining 100 speech frames of the voice signal.
S705: The server uses MFCC to obtain the speech feature vectors of the 100 speech frames of the voice signal.
S706: Based on the speech feature vectors of the 100 speech frames of the voice signal, the server generates the speech feature matrix of the voice signal and inputs this matrix into the acoustic recognition model to obtain the posterior probability matrix of the voice signal.
S707: According to the saved wake-up record information, the server judges whether the time interval since the terminal device was woken up by "small A small A" is within the set time range. If so, the current wake-up word mode is determined to be the multiple wake-up word mode, and S708 is executed; if not, the current wake-up word mode is determined to be the single wake-up word mode, and S716 is executed.
S708: Based on the posterior probability matrix of the voice signal, the server uses an HMM based on the Viterbi algorithm to calculate a wake-up word path score for each of the multiple wake-up word paths in which "small A small A" appears consecutively, and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words appear consecutively; the difference between the highest wake-up word path score and the highest non-wake-up word path score is determined as the wake-up confidence of the voice signal for "small A small A".
S709: Based on the posterior probability matrix of the voice signal, the server uses an HMM based on the Viterbi algorithm to calculate a wake-up word path score for each of the multiple wake-up word paths in which "small A" appears consecutively, and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words appear consecutively; the difference between the highest wake-up word path score and the highest non-wake-up word path score is determined as the wake-up confidence of the voice signal for "small A".
S710: The server judges whether the wake-up confidence of the voice signal for "small A small A" is not less than the wake-up threshold. If so, S711 is executed; if not, S713 is executed.
S711: The server returns to the client a wake-up recognition response indicating that the terminal device is to be woken up by "small A small A".
S712: On receiving the wake-up recognition response indicating that the terminal device is to be woken up by "small A small A", the client wakes up the terminal device by "small A small A" and returns to S701.
S713: The server judges whether the wake-up confidence of the voice signal for "small A" is not less than the wake-up threshold. If so, S714 is executed; if not, S716 is executed.
S714: The server returns to the client a wake-up recognition response indicating that the terminal device is to be woken up by "small A".
S715: On receiving the wake-up recognition response indicating that the terminal device is to be woken up by "small A", the client wakes up the terminal device by "small A" and returns to S701.
S716: Based on the posterior probability matrix of the voice signal, the server uses an HMM based on the Viterbi algorithm to calculate a wake-up word path score for each of the multiple wake-up word paths in which "small A small A" appears consecutively, and a non-wake-up word path score for each of the multiple non-wake-up word paths in which non-wake-up words appear consecutively; the difference between the highest wake-up word path score and the highest non-wake-up word path score is determined as the wake-up confidence of the voice signal for "small A small A".
S717: The server judges whether the wake-up confidence of the voice signal for "small A small A" is not less than the wake-up threshold. If so, S718 is executed; if not, S721 is executed.
S718: The server returns to the client a wake-up recognition response indicating that the terminal device is to be woken up by "small A small A".
S719: The server takes the flag information of the return operation it has just performed as the flag information of the wake-up operation currently performed by the client, and updates the saved wake-up record information according to that flag information.
S720: On receiving the wake-up recognition response indicating that the terminal device is to be woken up by "small A small A", the client wakes up the terminal device by "small A small A" and returns to S701.
S721: The server returns to the client a wake-up recognition response indicating that the terminal device is not to be woken up.
S722: On receiving the wake-up recognition response indicating that the terminal device is not to be woken up, the client does not wake up the terminal device and returns to S701.
Based on the above embodiments, the embodiments of the present application provide a voice wake-up apparatus. Referring to Fig. 8, the voice wake-up apparatus 800 provided by the embodiments of the present application includes at least:
A signal acquiring unit 801, configured to receive a voice signal;
A first judging unit 802, configured to judge, according to the saved wake-up record information, whether the terminal device has been woken up by the first wake-up word, where the wake-up record information is flag information indicating that a wake-up operation was performed using the first wake-up word;
A first wake-up unit 803, configured to, if the first judging unit 802 determines that the terminal device has been woken up by the first wake-up word, wake up the terminal device by the second wake-up word when it is determined that the voice signal contains the second wake-up word;
A second wake-up unit 804, configured to, if the first judging unit 802 determines that the terminal device has not been woken up by the first wake-up word, wake up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
In a possible implementation, the second wake-up word includes at least one of a simple wake-up word of the first wake-up word and a custom wake-up word.
In a possible implementation, the voice wake-up apparatus provided by the embodiments of the present application further includes:
A second judging unit 805, configured to, before the first wake-up unit 803 determines that the voice signal contains the second wake-up word, determine, according to the saved wake-up record information, that the time interval since the terminal device was woken up by the first wake-up word is within the set time range.
In a possible implementation, the second wake-up unit 804 is further configured to:
If the second judging unit 805 determines, according to the saved wake-up record information, that the time interval since the terminal device was woken up by the first wake-up word is not within the set time range, wake up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
In a possible implementation, the second wake-up word includes at least one of the first wake-up word, a simple wake-up word of the first wake-up word, and a custom wake-up word.
In a possible implementation, the voice wake-up apparatus provided by the embodiments of the present application further includes:
An information updating unit 806, configured to, after the second wake-up unit 804 wakes up the terminal device by the first wake-up word, update the saved wake-up record information according to the flag information of the wake-up operation currently performed by the second wake-up unit 804.
In a possible implementation, when determining that the voice signal contains the second wake-up word, the first wake-up unit 803 is specifically configured to:
Divide the voice signal into frames to obtain at least one speech frame, and perform feature extraction on the at least one speech frame to obtain the speech feature data of the at least one speech frame;
Based on the speech feature data of the at least one speech frame, use the acoustic recognition model to obtain the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class;
Based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, obtain the wake-up confidence of the voice signal for each second wake-up word;
When it is detected that the wake-up confidence of the voice signal for any one second wake-up word is not less than the wake-up threshold, determine that the voice signal contains the second wake-up word.
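The framing step that opens the list above can be sketched as a sliding window over the samples. The function name, the frame length and hop parameters, and the choice to drop a trailing partial frame are assumptions made for this example; the source does not state these details.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[float], frame_len: int,
                      hop: int) -> List[List[float]]:
    """Slide a window of frame_len samples forward by hop samples at a time."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    # Keep only full frames; a trailing partial frame is dropped (an assumption).
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop
    return frames
```

Choosing a hop smaller than the frame length makes consecutive frames overlap, which is the usual reason for describing the step as a moving window.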
In a possible implementation, when obtaining the wake-up confidence of the voice signal for each second wake-up word based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, the first wake-up unit 803 is specifically configured to:
For each second wake-up word, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, use an HMM based on the Viterbi algorithm to obtain the highest second wake-up word path score, and determine the highest second wake-up word path score as the wake-up confidence of the voice signal for that second wake-up word.
In a possible implementation, when obtaining the wake-up confidence of the voice signal for each second wake-up word based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, the first wake-up unit 803 is specifically configured to:
For each second wake-up word, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, use an HMM based on the Viterbi algorithm to obtain the highest second wake-up word path score and the highest non-wake-up word path score, and determine the difference between the highest second wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal for that second wake-up word.
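The difference-based confidence described above can be sketched with a small Viterbi search. This is a simplified illustration and not the HMM of the embodiments: each path is modelled as a left-to-right chain of class indices, transition probabilities are treated as uniform and omitted, scores are sums of log posteriors, and all names are hypothetical. The input is assumed to contain at least one frame.

```python
import math
from typing import Sequence

NEG_INF = float("-inf")


def best_path_score(log_post: Sequence[Sequence[float]],
                    states: Sequence[int]) -> float:
    """Viterbi score of the best alignment of frames to a left-to-right chain.

    Each frame either stays in the current state or advances to the next one;
    the path must start in the first state and end in the last state.
    """
    score = [NEG_INF] * len(states)
    score[0] = log_post[0][states[0]]
    for frame in log_post[1:]:
        updated = [NEG_INF] * len(states)
        for i, state in enumerate(states):
            best_prev = score[i] if i == 0 else max(score[i], score[i - 1])
            if best_prev > NEG_INF:
                updated[i] = best_prev + frame[state]
        score = updated
    return score[-1]


def wake_confidence(posteriors: Sequence[Sequence[float]],
                    wake_paths: Sequence[Sequence[int]],
                    non_wake_paths: Sequence[Sequence[int]]) -> float:
    """Highest wake-up word path score minus highest non-wake-up word path score."""
    # Floor the probabilities so that log(0) never occurs.
    log_post = [[math.log(max(p, 1e-12)) for p in frame] for frame in posteriors]
    best_wake = max(best_path_score(log_post, path) for path in wake_paths)
    best_non_wake = max(best_path_score(log_post, path) for path in non_wake_paths)
    return best_wake - best_non_wake
```

For example, with class 0 as the non-wake-up class and classes 1 and 2 as the two parts of a wake-up word, `wake_confidence(posteriors, [[1, 2]], [[0]])` is positive when the frames favour classes 1 and then 2, and negative when they favour class 0 throughout. The result would then be compared against the wake-up threshold.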
In a possible implementation, when determining that the voice signal contains the first wake-up word, the second wake-up unit 804 is specifically configured to:
Divide the voice signal into frames to obtain at least one speech frame, and perform feature extraction on the at least one speech frame to obtain the speech feature data of the at least one speech frame;
Based on the speech feature data of the at least one speech frame, use the acoustic recognition model to obtain the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class;
Based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, obtain the wake-up confidence of the voice signal for the first wake-up word;
When it is detected that the wake-up confidence of the voice signal for the first wake-up word is not less than the wake-up threshold, determine that the voice signal contains the first wake-up word.
In a possible implementation, when obtaining the wake-up confidence of the voice signal for the first wake-up word based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, the second wake-up unit 804 is specifically configured to:
Based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, use an HMM based on the Viterbi algorithm to obtain the highest first wake-up word path score;
Determine the highest first wake-up word path score as the wake-up confidence of the voice signal for the first wake-up word.
In a possible implementation, when obtaining the wake-up confidence of the voice signal for the first wake-up word based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, the second wake-up unit 804 is specifically configured to:
Based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, use an HMM based on the Viterbi algorithm to obtain the highest first wake-up word path score and the highest non-wake-up word path score;
Determine the difference between the highest first wake-up word path score and the highest non-wake-up word path score as the wake-up confidence of the voice signal for the first wake-up word.
It should be noted that when the voice wake-up method provided by the embodiments of the present application is applied to a server, the voice wake-up apparatus 800 provided by the embodiments of the present application may be arranged in the server; when the voice wake-up method provided by the embodiments of the present application is applied to a terminal device, the voice wake-up apparatus 800 provided by the embodiments of the present application may be arranged in the terminal device.
In addition, the principle by which the voice wake-up apparatus 800 provided by the embodiments of the present application solves the technical problem is similar to that of the voice wake-up method provided by the embodiments of the present application. Therefore, for the implementation of the voice wake-up apparatus 800, reference may be made to the implementation of the voice wake-up method, and repeated details are not described again.
Having described the voice wake-up system, method, and apparatus provided by the embodiments of the present application, the voice wake-up device provided by the embodiments of the present application is briefly introduced next.
The voice wake-up device provided by the embodiments of the present application may be a terminal device or a server. Referring to Fig. 9, the voice wake-up device 900 provided by the embodiments of the present application includes at least a processor 901, a memory 902, and a computer program that is stored in the memory 902 and can run on the processor 901; the processor 901 implements the voice wake-up method provided by the embodiments of the present application when executing the computer program.
It should be noted that the voice wake-up device 900 shown in Fig. 9 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
The voice wake-up device 900 provided by the embodiments of the present application may further include a bus 903 connecting different components (including the processor 901 and the memory 902). The bus 903 represents one or more of several types of bus structures, including a memory bus, a peripheral bus, a local bus, and the like.
The memory 902 may include readable media in the form of volatile memory, such as a random access memory (Random Access Memory, RAM) 921 and/or a cache memory 922, and may further include a read-only memory (Read Only Memory, ROM) 923.
The memory 902 may also include a program tool 925 having a set of (at least one) program modules 924. The program modules 924 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The voice wake-up device 900 may also communicate with one or more external devices 904 (such as a keyboard or a remote control), with one or more devices that enable a user to interact with the voice wake-up device 900 (such as a mobile phone or a computer), and/or with any device that enables the voice wake-up device 900 to communicate with one or more other voice wake-up devices 900 (such as a router or a modem). Such communication may take place through an input/output (Input/Output, I/O) interface 905. The voice wake-up device 900 may also communicate with one or more networks (for example a local area network (Local Area Network, LAN), a wide area network (Wide Area Network, WAN), and/or a public network such as the Internet) through a network adapter 906. As shown in Fig. 9, the network adapter 906 communicates with the other modules of the voice wake-up device 900 through the bus 903. It should be understood that, although not shown in Fig. 9, other hardware and/or software modules may be used in conjunction with the voice wake-up device 900, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, disk array (Redundant Arrays of Independent Disks, RAID) subsystems, tape drives, and data backup storage subsystems.
In addition, an embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the voice wake-up method provided by the embodiments of the present application. Specifically, the executable program may be built into or installed in the voice wake-up device 900, so that the voice wake-up device 900 can implement the voice wake-up method provided by the embodiments of the present application by executing the built-in or installed executable program.
In addition, the voice wake-up method provided by the embodiments of the present application may also be implemented as a program product that includes program code; when the program product runs on the voice wake-up device 900, the program code causes the voice wake-up device 900 to execute the voice wake-up method provided by the embodiments of the present application.
The program product provided by the embodiments of the present application may use any combination of one or more readable media, where a readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The program product provided by the embodiments of the present application may also take the form of a CD-ROM that includes program code and may run on a computing device. However, the program product provided by the embodiments of the present application is not limited to this; in the embodiments of the present application, a readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus, or device.
It should be noted that although several units or sub-units of the apparatus are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to the embodiments of the present application, the features and functions of two or more of the units described above may be embodied in a single unit. Conversely, the features and functions of a single unit described above may be further divided so as to be embodied by multiple units.
In addition, although the operations of the method of the present application are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the operations shown must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be broken down into multiple steps.
Although preferred embodiments of the present application have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the present application.
Obviously, those skilled in the art can make various modifications and variations to the embodiments of the present application without departing from their spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to cover them.

Claims (22)

1. A voice wake-up method, characterized by comprising:
acquiring a voice signal to be processed;
determining, according to saved wake-up record information, whether a terminal device has been woken up by a first wake-up word, wherein the wake-up record information is flag information indicating that a wake-up operation was performed using the first wake-up word;
if so, waking up the terminal device by a second wake-up word when it is determined that the voice signal contains the second wake-up word;
if not, waking up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
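The branching in claim 1 can be illustrated with a minimal Python sketch. It is not taken from the application: the `WakeRecord` class, the `select_wake_word` function, the `contains` detector callable, and the example wake-up words are all hypothetical names introduced here, and a plain substring check stands in for real acoustic detection.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class WakeRecord:
    """Hypothetical stand-in for the saved wake-up record information (a flag)."""
    woken_by_first_word: bool = False


def select_wake_word(
    record: WakeRecord,
    contains: Callable[[str], bool],
    first_word: str,
    second_words: Iterable[str],
) -> Optional[str]:
    """Return the wake-up word that should wake the terminal device, or None.

    `contains(word)` is an assumed detector that reports whether the current
    voice signal contains `word`.
    """
    if record.woken_by_first_word:
        # The device was already woken by the first wake-up word:
        # any second wake-up word found in the signal is accepted.
        for word in second_words:
            if contains(word):
                return word
        return None
    # The device has not yet been woken by the first wake-up word:
    # only the first wake-up word is accepted.
    return first_word if contains(first_word) else None
```

In this sketch a shortened word such as "assistant" is ignored until the record flag has been set by a wake-up with the full first wake-up word, after which it is accepted.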
2. The voice wake-up method according to claim 1, characterized in that the second wake-up word comprises at least one of a simple wake-up word of the first wake-up word and a custom wake-up word.
3. The voice wake-up method according to claim 1, characterized in that, before determining that the voice signal contains the second wake-up word, the method further comprises:
determining, according to the saved wake-up record information, that a voice signal indicating that a simple wake-up word mode is to be enabled has been received within a time interval after the terminal device was woken up by the first wake-up word.
4. The voice wake-up method according to claim 3, characterized by further comprising:
if it is determined, according to the saved wake-up record information, that no voice signal indicating that the simple wake-up word mode is to be enabled has been received within the time interval after the terminal device was woken up by the first wake-up word, waking up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
5. The voice wake-up method according to claim 1, characterized in that, before determining that the voice signal contains the second wake-up word, the method further comprises:
determining, according to the saved wake-up record information, that the time interval since the terminal device was woken up by the first wake-up word is within a set time range.
6. The voice wake-up method according to claim 5, characterized by further comprising:
if it is determined, according to the saved wake-up record information, that the time interval since the terminal device was woken up by the first wake-up word is not within the set time range, waking up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
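The time-range condition of claims 5 and 6 can be sketched as follows. This is an illustration under assumptions, not the application's implementation: the record is assumed to store a timestamp of the last wake-up by the first wake-up word, and the names `WakeRecord`, `last_first_word_wake`, `second_word_allowed`, and `window_s` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WakeRecord:
    """Hypothetical wake-up record: when the first wake-up word last woke the device."""
    last_first_word_wake: Optional[float] = None  # seconds on a monotonic clock


def second_word_allowed(record: WakeRecord, now: float, window_s: float) -> bool:
    """Return True if `now` lies within `window_s` seconds of the last first-word wake-up."""
    if record.last_first_word_wake is None:
        # The device has never been woken by the first wake-up word.
        return False
    elapsed = now - record.last_first_word_wake
    return 0.0 <= elapsed <= window_s
```

When the function returns False, the flow falls back to requiring the first wake-up word, as in claim 6. Setting `last_first_word_wake` to the current time after each wake-up by the first wake-up word would restart the window.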
7. The voice wake-up method according to claim 5, characterized in that the second wake-up word comprises at least one of the first wake-up word, a simple wake-up word of the first wake-up word, and a custom wake-up word.
8. The voice wake-up method according to claim 1, 4 or 6, characterized in that, after waking up the terminal device by the first wake-up word, the method further comprises:
updating the saved wake-up record information according to flag information of the currently performed wake-up operation.
9. The voice wake-up method according to any one of claims 1-7, characterized in that determining that the voice signal contains the second wake-up word comprises:
performing framing on the voice signal to obtain at least one speech frame, and performing feature extraction on the at least one speech frame to obtain speech feature data of the at least one speech frame;
obtaining, based on the speech feature data of the at least one speech frame and using an acoustic recognition model, posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class;
obtaining, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, a wake-up confidence of the voice signal for each second wake-up word;
determining that the voice signal contains a second wake-up word when the wake-up confidence of the voice signal for any one second wake-up word is detected to be not less than a wake-up threshold.
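The first and last steps of claim 9, framing and the threshold comparison, can be sketched in plain Python. Feature extraction and the acoustic recognition model are outside the scope of this sketch. The function names `split_frames` and `detected_words`, the frame length and hop parameters, and the choice to drop a trailing partial frame are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def split_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def detected_words(confidences: Dict[str, float], threshold: float) -> List[str]:
    """Return each second wake-up word whose confidence is not less than the threshold."""
    return [word for word, conf in confidences.items() if conf >= threshold]
```

A commonly used configuration for 16 kHz speech is a 25 ms frame with a 10 ms hop, that is, `frame_len=400` and `hop=160`; the application does not state specific values here. The signal is treated as containing a second wake-up word when `detected_words` returns a non-empty list.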
10. The voice wake-up method according to claim 9, characterized in that obtaining, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, the wake-up confidence of the voice signal for each second wake-up word comprises:
for each second wake-up word, obtaining, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class and using a hidden Markov model (HMM) based on the Viterbi algorithm, the highest-valued second wake-up word path score, and determining the highest-valued second wake-up word path score as the wake-up confidence of the voice signal for that second wake-up word.
11. The voice wake-up method according to claim 9, characterized in that obtaining, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, the wake-up confidence of the voice signal for each second wake-up word comprises:
for each second wake-up word, obtaining, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class and using a hidden Markov model (HMM) based on the Viterbi algorithm, the highest-valued second wake-up word path score and the highest-valued non-wake-up word path score, and determining the difference between the highest-valued second wake-up word path score and the highest-valued non-wake-up word path score as the wake-up confidence of the voice signal for that second wake-up word.
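The two confidence definitions in claims 10 and 11 can be illustrated with a small Viterbi sketch. Several simplifications are assumed and are not stated in the application: the wake-up word is modeled as a left-to-right chain of states, transitions are limited to "stay" or "advance by one" and carry no cost, scores are sums of log posteriors, and the non-wake-up word class is a single filler state, so its best path is simply the sum of its per-frame log posteriors. All function names are hypothetical.

```python
import math
from typing import Sequence

NEG_INF = -math.inf


def best_keyword_path_score(log_post: Sequence[Sequence[float]]) -> float:
    """Viterbi score of the best left-to-right path through all keyword states.

    log_post[t][s] is the log posterior of keyword state s at frame t.
    """
    if not log_post:
        return NEG_INF
    n_states = len(log_post[0])
    # The path must start in the first keyword state.
    score = [NEG_INF] * n_states
    score[0] = log_post[0][0]
    for frame in log_post[1:]:
        prev = score
        score = [NEG_INF] * n_states
        for s in range(n_states):
            stay = prev[s]
            advance = prev[s - 1] if s > 0 else NEG_INF
            best = max(stay, advance)
            if best != NEG_INF:
                score[s] = best + frame[s]
    # The path must end in the last keyword state.
    return score[-1]


def filler_path_score(log_filler: Sequence[float]) -> float:
    """Score of the single-state non-wake-up word path (sum of log posteriors)."""
    return sum(log_filler)


def wake_confidence(
    log_post: Sequence[Sequence[float]],
    log_filler: Sequence[float],
    use_difference: bool,
) -> float:
    """Claim 10 style (path score) or claim 11 style (score difference) confidence."""
    keyword = best_keyword_path_score(log_post)
    if use_difference:
        return keyword - filler_path_score(log_filler)
    return keyword
```

Using the difference normalizes the wake-up word score against how well the same frames are explained as non-wake-up speech, which makes a fixed threshold less sensitive to utterance length in this sketch.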
12. The voice wake-up method according to any one of claims 1-7, characterized in that determining that the voice signal contains the first wake-up word comprises:
performing framing on the voice signal to obtain at least one speech frame, and performing feature extraction on the at least one speech frame to obtain speech feature data of the at least one speech frame;
obtaining, based on the speech feature data of the at least one speech frame and using an acoustic recognition model, posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class;
obtaining, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, a wake-up confidence of the voice signal for the first wake-up word;
determining that the voice signal contains the first wake-up word when the wake-up confidence of the voice signal for the first wake-up word is detected to be not less than a wake-up threshold.
13. The voice wake-up method according to claim 12, characterized in that obtaining, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, the wake-up confidence of the voice signal for the first wake-up word comprises:
obtaining, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class and using a hidden Markov model (HMM) based on the Viterbi algorithm, the highest-valued first wake-up word path score;
determining the highest-valued first wake-up word path score as the wake-up confidence of the voice signal for the first wake-up word.
14. The voice wake-up method according to claim 12, characterized in that obtaining, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class, the wake-up confidence of the voice signal for the first wake-up word comprises:
obtaining, based on the posterior probabilities of the at least one speech frame for the wake-up word class and the non-wake-up word class and using a hidden Markov model (HMM) based on the Viterbi algorithm, the highest-valued first wake-up word path score and the highest-valued non-wake-up word path score;
determining the difference between the highest-valued first wake-up word path score and the highest-valued non-wake-up word path score as the wake-up confidence of the voice signal for the first wake-up word.
15. A voice wake-up apparatus, characterized by comprising:
a signal acquisition unit, configured to acquire a voice signal to be processed;
a first judging unit, configured to determine, according to saved wake-up record information, whether a terminal device has been woken up by a first wake-up word, wherein the wake-up record information is flag information indicating that a wake-up operation was performed using the first wake-up word;
a first wake-up unit, configured to, if the first judging unit determines that the terminal device has been woken up by the first wake-up word, wake up the terminal device by a second wake-up word when it is determined that the voice signal contains the second wake-up word;
a second wake-up unit, configured to, if the first judging unit determines that the terminal device has not been woken up by the first wake-up word, wake up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
16. The voice wake-up apparatus according to claim 15, characterized by further comprising:
a second judging unit, configured to, before the first wake-up unit determines that the voice signal contains the second wake-up word, determine, according to the saved wake-up record information, that a voice signal indicating that a simple wake-up word mode is to be enabled has been received within a time interval after the terminal device was woken up by the first wake-up word.
17. The voice wake-up apparatus according to claim 16, characterized in that the second wake-up unit is further configured to:
if the second judging unit determines, according to the saved wake-up record information, that no voice signal indicating that the simple wake-up word mode is to be enabled has been received within the time interval after the terminal device was woken up by the first wake-up word, wake up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
18. The voice wake-up apparatus according to claim 15, characterized by further comprising:
a third judging unit, configured to, before the first wake-up unit determines that the voice signal contains the second wake-up word, determine, according to the saved wake-up record information, that the time interval since the terminal device was woken up by the first wake-up word is within a set time range.
19. The voice wake-up apparatus according to claim 18, characterized in that the second wake-up unit is further configured to:
if the third judging unit determines, according to the saved wake-up record information, that the time interval since the terminal device was woken up by the first wake-up word is not within the set time range, wake up the terminal device by the first wake-up word when it is determined that the voice signal contains the first wake-up word.
20. The voice wake-up apparatus according to claim 15, 17 or 19, characterized by further comprising:
an information updating unit, configured to, after the second wake-up unit wakes up the terminal device by the first wake-up word, update the saved wake-up record information according to flag information of the wake-up operation currently performed by the second wake-up unit.
21. A voice wake-up device, characterized by comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the voice wake-up method according to any one of claims 1-14.
22. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the voice wake-up method according to any one of claims 1-14.
CN201910887674.0A 2019-09-19 2019-09-19 Voice wake-up method, device, equipment and medium Active CN110534102B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910887674.0A CN110534102B (en) 2019-09-19 2019-09-19 Voice wake-up method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910887674.0A CN110534102B (en) 2019-09-19 2019-09-19 Voice wake-up method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN110534102A true CN110534102A (en) 2019-12-03
CN110534102B CN110534102B (en) 2020-10-30

Family

ID=68669399

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910887674.0A Active CN110534102B (en) 2019-09-19 2019-09-19 Voice wake-up method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN110534102B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091839A (en) * 2020-03-20 2020-05-01 深圳市友杰智新科技有限公司 Voice awakening method and device, storage medium and intelligent device
CN111768783A (en) * 2020-06-30 2020-10-13 北京百度网讯科技有限公司 Voice interaction control method, device, electronic equipment, storage medium and system
CN111883121A (en) * 2020-07-20 2020-11-03 北京声智科技有限公司 Awakening method and device and electronic equipment
CN112037786A (en) * 2020-08-31 2020-12-04 百度在线网络技术(北京)有限公司 Voice interaction method, device, equipment and storage medium
CN113096651A (en) * 2020-01-07 2021-07-09 北京地平线机器人技术研发有限公司 Voice signal processing method and device, readable storage medium and electronic equipment
EP3923275A1 (en) * 2020-06-12 2021-12-15 Beijing Xiaomi Pinecone Electronics Co., Ltd. Device wakeup method and apparatus, electronic device, and storage medium
CN115605949A (en) * 2020-10-30 2023-01-13 谷歌有限责任公司(Us) Simultaneous acoustic event detection across multiple assistant devices

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971678A (en) * 2013-01-29 2014-08-06 腾讯科技(深圳)有限公司 Method and device for detecting keywords
CN104536978A (en) * 2014-12-05 2015-04-22 奇瑞汽车股份有限公司 Voice data identifying method and device
US20170025124A1 (en) * 2014-10-09 2017-01-26 Google Inc. Device Leadership Negotiation Among Voice Interface Devices
CN106448663A (en) * 2016-10-17 2017-02-22 海信集团有限公司 Voice wakeup method and voice interaction device
CN106898352A (en) * 2017-02-27 2017-06-27 联想(北京)有限公司 Sound control method and electronic equipment
CN107221326A (en) * 2017-05-16 2017-09-29 百度在线网络技术(北京)有限公司 Voice awakening method, device and computer equipment based on artificial intelligence
CN107919124A (en) * 2017-12-22 2018-04-17 北京小米移动软件有限公司 Equipment awakening method and device
CN108122563A (en) * 2017-12-19 2018-06-05 北京声智科技有限公司 Improve voice wake-up rate and the method for correcting DOA
CN109036412A (en) * 2018-09-17 2018-12-18 苏州奇梦者网络科技有限公司 voice awakening method and system
CN109065044A (en) * 2018-08-30 2018-12-21 出门问问信息科技有限公司 Wake up word recognition method, device, electronic equipment and computer readable storage medium
US20190051307A1 (en) * 2017-08-14 2019-02-14 Lenovo (Singapore) Pte. Ltd. Digital assistant activation based on wake word association
CN109358751A (en) * 2018-10-23 2019-02-19 北京猎户星空科技有限公司 A kind of wake-up control method of robot, device and equipment
WO2019079974A1 (en) * 2017-10-24 2019-05-02 Beijing Didi Infinity Technology And Development Co., Ltd. System and method for uninterrupted application awakening and speech recognition
CN109817200A (en) * 2019-01-30 2019-05-28 北京声智科技有限公司 The optimization device and method that voice wakes up

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971678A (en) * 2013-01-29 2014-08-06 腾讯科技(深圳)有限公司 Method and device for detecting keywords
US20170025124A1 (en) * 2014-10-09 2017-01-26 Google Inc. Device Leadership Negotiation Among Voice Interface Devices
CN104536978A (en) * 2014-12-05 2015-04-22 奇瑞汽车股份有限公司 Voice data identifying method and device
CN106448663A (en) * 2016-10-17 2017-02-22 海信集团有限公司 Voice wakeup method and voice interaction device
CN106898352A (en) * 2017-02-27 2017-06-27 联想(北京)有限公司 Sound control method and electronic equipment
CN107221326A (en) * 2017-05-16 2017-09-29 百度在线网络技术(北京)有限公司 Voice awakening method, device and computer equipment based on artificial intelligence
US20190051307A1 (en) * 2017-08-14 2019-02-14 Lenovo (Singapore) Pte. Ltd. Digital assistant activation based on wake word association
WO2019079974A1 (en) * 2017-10-24 2019-05-02 Beijing Didi Infinity Technology And Development Co., Ltd. System and method for uninterrupted application awakening and speech recognition
CN108122563A (en) * 2017-12-19 2018-06-05 北京声智科技有限公司 Improve voice wake-up rate and the method for correcting DOA
CN107919124A (en) * 2017-12-22 2018-04-17 北京小米移动软件有限公司 Equipment awakening method and device
CN109065044A (en) * 2018-08-30 2018-12-21 出门问问信息科技有限公司 Wake up word recognition method, device, electronic equipment and computer readable storage medium
CN109036412A (en) * 2018-09-17 2018-12-18 苏州奇梦者网络科技有限公司 voice awakening method and system
CN109358751A (en) * 2018-10-23 2019-02-19 北京猎户星空科技有限公司 A kind of wake-up control method of robot, device and equipment
CN109817200A (en) * 2019-01-30 2019-05-28 北京声智科技有限公司 The optimization device and method that voice wakes up

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
AH XING: "Compact wake-up word speech recognition on embedded platforms", 《APPLIED MECHANICS AND MATERIALS, 2014 - TRANS TECH PUBL》 *
VETON KËPUSKA: "Improving Wake-Up-Word and General Speech Recognition Systems", 《2017 IEEE 15TH INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING》 *
郑志辉: "基于语音实现人机对话的空调控制器研究开发", 《2018年中国家用电器技术大会论文集》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113096651A (en) * 2020-01-07 2021-07-09 北京地平线机器人技术研发有限公司 Voice signal processing method and device, readable storage medium and electronic equipment
CN111091839A (en) * 2020-03-20 2020-05-01 深圳市友杰智新科技有限公司 Voice awakening method and device, storage medium and intelligent device
EP3923275A1 (en) * 2020-06-12 2021-12-15 Beijing Xiaomi Pinecone Electronics Co., Ltd. Device wakeup method and apparatus, electronic device, and storage medium
US11665644B2 (en) 2020-06-12 2023-05-30 Beijing Xiaomi Pinecone Electronics Co., Ltd. Device wakeup method and apparatus, electronic device, and storage medium
CN111768783A (en) * 2020-06-30 2020-10-13 北京百度网讯科技有限公司 Voice interaction control method, device, electronic equipment, storage medium and system
CN111768783B (en) * 2020-06-30 2024-04-02 北京百度网讯科技有限公司 Voice interaction control method, device, electronic equipment, storage medium and system
CN111883121A (en) * 2020-07-20 2020-11-03 北京声智科技有限公司 Awakening method and device and electronic equipment
CN112037786A (en) * 2020-08-31 2020-12-04 百度在线网络技术(北京)有限公司 Voice interaction method, device, equipment and storage medium
CN115605949A (en) * 2020-10-30 2023-01-13 谷歌有限责任公司(Us) Simultaneous acoustic event detection across multiple assistant devices

Also Published As

Publication number Publication date
CN110534102B (en) 2020-10-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant