CN109273007A - Voice awakening method and device - Google Patents
- Publication number: CN109273007A
- Application number: CN201811184504.8A
- Authority
- CN
- China
- Prior art keywords
- word
- wake
- acoustic feature
- network
- waking
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- User Interface Of Digital Computer (AREA)
- Telephonic Communication Services (AREA)
Abstract
An embodiment of the present invention provides a voice wake-up method and device, belonging to the technical field of speech recognition. The method comprises: obtaining the acoustic features of a wake word from voice data; and inputting the acoustic features into a wake decision network, which outputs a wake decision result indicating whether the wake-up succeeded. The wake decision network is trained on sample acoustic features and is used to perform a confidence decision on the wake word. Because the wake decision network can make the wake decision for any user-defined wake word, without relying on a fixed preset threshold, the wake-up success rate can be improved and the wake-up process applies to a wider range of scenarios.
Description
Technical field
Embodiments of the present invention relate to the technical field of speech recognition, and in particular to a voice wake-up method and device.
Background
With the development of smart homes, voice wake-up functions are increasingly common. Voice wake-up works mainly by interpreting a user's voice data in order to wake an intelligent terminal. Currently, when voice wake-up is implemented, the acoustic likelihood scores of the wake-word path and of the filler path are computed separately during wake-word recognition; if the acoustic likelihood ratio exceeds a fixed preset threshold, the wake-word recognition result is confirmed as credible and the intelligent terminal is woken. Because the preset threshold is fixed, if the wake phrase changes, the threshold may no longer suit the decision process for the current wake phrase, which reduces the wake-up success rate.
Summary of the invention
To solve the above problems, embodiments of the present invention provide a voice wake-up method and device that overcome, or at least partially solve, those problems.
According to a first aspect of the embodiments of the present invention, a voice wake-up method is provided, comprising:
obtaining the acoustic features of a wake word from voice data; and
inputting the acoustic features into a wake decision network and outputting a wake decision result, wherein the wake decision result indicates whether the wake-up succeeded, and the wake decision network is trained on sample acoustic features and is used to perform a confidence decision on the wake word.
In the method provided by this embodiment, the acoustic features of the wake word are obtained from voice data and input into the wake decision network, which outputs a wake decision result indicating whether the wake-up succeeded. Because the wake decision network can make the wake decision for any user-defined wake word, without relying on a fixed preset threshold, the wake-up success rate can be improved and the wake-up process applies to a wider range of scenarios.
According to a second aspect of the embodiments of the present invention, a voice wake-up device is provided, comprising:
an obtaining module, configured to obtain the acoustic features of a wake word from voice data; and
an output module, configured to input the acoustic features into a wake decision network and output a wake decision result, wherein the wake decision result indicates whether the wake-up succeeded, and the wake decision network is trained on sample acoustic features and is used to perform a confidence decision on the wake word.
According to a third aspect of the embodiments of the present invention, an electronic device is provided, comprising:
at least one processor; and
at least one memory communicatively connected to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor, by calling the program instructions, is able to perform the voice wake-up method provided by any possible implementation of the first aspect.
According to a fourth aspect of the present invention, a non-transient computer-readable storage medium is provided. The non-transient computer-readable storage medium stores computer instructions that cause a computer to perform the voice wake-up method provided by any possible implementation of the first aspect.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the embodiments of the present invention.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a voice wake-up method provided in an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a wake-word recognition network provided in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a wake decision network provided in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a voice wake-up device provided in an embodiment of the present invention;
Fig. 5 is a block diagram of an electronic device provided in an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on these embodiments, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
With the development of smart homes, voice wake-up functions are increasingly common. Voice wake-up works mainly by interpreting a user's voice data in order to wake an intelligent terminal. Currently, when voice wake-up is implemented, the acoustic likelihood scores of the wake word and of the non-wake word are computed separately during wake-word recognition; the ratio between the two gives the acoustic likelihood ratio of the wake word. If this ratio exceeds a fixed preset threshold, the wake-word recognition result is confirmed as credible and the intelligent terminal is woken. Because the preset threshold is fixed, if the wake phrase changes, the threshold may no longer suit the decision process for the current wake phrase, which reduces the wake-up success rate.
For example, if the current preset wake phrase is "ding-dong ding-dong", comparing the acoustic likelihood ratio of the wake word "ding-dong ding-dong" against the preset threshold can accurately determine whether the wake phrase spoken by the user should wake the intelligent terminal. But if the user defines a new wake phrase, such as "hello, Xiaofei", the fixed preset threshold may not suit the user-defined wake phrase, so the wake-up success rate may drop.
In view of the above, embodiments of the present invention provide a voice wake-up method. Note that the method can be applied to intelligent terminals with a wake-up function, such as smart speakers, wearable devices, or smart appliances. Referring to Fig. 1, the method includes but is not limited to the following steps.
101. Obtain the acoustic features of the wake word from voice data.
Before step 101, the voice data can be input into a wake-word recognition network to recognize the wake word. Specifically, the wake-word recognition network can be a Keyword-and-Filler network based on a hidden Markov model; as shown in Fig. 2, the network contains both keyword paths and filler paths. The filler path represents the non-wake-word path: vocabulary other than the wake word is covered by the filler path. While the wake word is being recognized, its acoustic features can be extracted at the same time.
102. Input the acoustic features into a wake decision network and output a wake decision result, wherein the wake decision result indicates whether the wake-up succeeded, and the wake decision network is trained on sample acoustic features and is used to perform a confidence decision on the wake word.
Before step 102, the wake decision network can be obtained by training. Specifically, an initial decision network can be trained on positive-example sample acoustic features and negative-example sample acoustic features to obtain the wake decision network. The positive-example sample acoustic features are acoustic features of wake words that should succeed in waking; the negative-example sample acoustic features are acoustic features of non-wake words that should fail to wake. Based on the above, as an optional embodiment, the wake decision network can be an encoder-decoder model. Correspondingly, the initial decision network may include an encoder side and a decoder side, and the wake word may include multiple sub-words. The encoder side can use a sequence-modeling network such as a recurrent neural network or a long short-term memory network: the acoustic feature sequence of each sub-word is input into the modeling network to obtain a characterization vector for each sub-word; the characterization vectors are then fed into an attention mechanism to obtain the influence weight of each sub-word's characterization vector on the wake word, from which a characterization vector for the whole word is obtained; the decision is then made from this whole-word vector.
The structure of the wake decision network is shown in Fig. 3. On the encoder side, x1...xT are the acoustic feature vectors of the sub-words; h1...hT are the characterization vectors obtained by passing those feature vectors through the sequence-modeling network; a(t,1)...a(t,T) are the weights of the characterization vectors obtained through the attention mechanism; and ct is the characterization vector of the whole word. On the decoder side, y(t-1) denotes the previous decoding result and yt the final decoding result, i.e., the indication of whether the wake-up succeeded; s(t-1) denotes the intermediate state of the previous decoding step, and st the intermediate state of the current decoding step. Note that because the wake decision network uses an encoder-decoder model, it can accommodate wake words of different lengths.
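The attention pooling that produces the whole-word characterization vector ct from the sub-word vectors h1...hT can be sketched numerically as follows. This is a minimal sketch under simplifying assumptions: in the real network the encoder and the attention scoring are learned, whereas here the characterization vectors and the query vector (standing in for the decoder state s(t-1)) are made-up fixed numbers, and the attention score is a plain dot product.

```python
# Hypothetical sketch of attention pooling over sub-word characterization
# vectors: score each h_i against a query, normalize with softmax to get
# the weights a(t,1)..a(t,T), then form the weighted sum c_t.
import math

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(h, query):
    # h: list of T characterization vectors; query: stand-in for s(t-1).
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in h]
    weights = softmax(scores)        # a(t,1) .. a(t,T)
    dim = len(h[0])
    c = [sum(w * vec[d] for w, vec in zip(weights, h)) for d in range(dim)]
    return weights, c                # c is the whole-word vector c_t

h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # T = 3 sub-words (fabricated)
weights, c = attention_pool(h, query=[1.0, 0.0])
assert abs(sum(weights) - 1.0) < 1e-9      # attention weights sum to 1
```

In the full model, c_t would be consumed by the decoder to emit yt, the wake/no-wake decision.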
In the method provided by this embodiment, the acoustic features of the wake word are obtained from voice data and input into the wake decision network, which outputs a wake decision result indicating whether the wake-up succeeded. Because the wake decision network can make the wake decision for any user-defined wake word, without relying on a fixed preset threshold, the wake-up success rate can be improved and the wake-up process applies to a wider range of scenarios.
Based on the above, as an optional embodiment, the acoustic features include at least one of the following five kinds of information: the wake-word scores of the sub-words in the wake word; the non-wake-word scores of the sub-words; the frame counts corresponding to the sub-words; the score distributions of the acoustic features corresponding to the sub-words; and the insertion (embedding) features of the sub-words.
The five kinds of information may be obtained before the acoustic features are input into the wake decision network to output the wake decision result, or they may be obtained at the same time the wake word is recognized; the present invention does not specifically limit this. For ease of understanding, the five kinds of information are illustrated below, taking the sub-word to be a phoneme.
(1) The wake-word scores of the sub-words in the wake word.
Specifically, the acoustic score of each sub-word can be obtained from recognition decoding, and the wake-word scores of the sub-words are concatenated into a sequence of one-dimensional scalars, i.e., the wake-word score sequence of the sub-words. For example, suppose the wake word is "hello, Xiaofei", which contains 4 speech units; if each speech unit is represented by 2 phonemes, the whole wake word has 8 phonemes, and a sub-word acoustic score sequence of length 8 — the wake-word scores of the sub-words — can be obtained.
(2) The non-wake-word scores of the sub-words in the wake word.
Similarly, the filler acoustic score of each sub-word can be obtained from recognition decoding, and the non-wake-word scores of the sub-words are concatenated into a sequence of one-dimensional scalars, i.e., the non-wake-word score sequence of the sub-words. For "hello, Xiaofei", a non-wake-word acoustic score sequence of length 8 — the non-wake-word scores of the sub-words — can likewise be obtained.
(3) The frame counts corresponding to the sub-words in the wake word.
Because each sub-word occupies a certain duration, and duration can be expressed in frames, the frame count corresponding to each sub-word in the wake word can be determined. For "hello, Xiaofei", again with each speech unit comprising two sub-words, there are 8 sub-words in total; after the number of frames occupied by each sub-word is determined, a frame-count sequence of length 8 is obtained.
(4) The score distributions of the acoustic features corresponding to the sub-words in the wake word.
As described above, the acoustic scores of the sub-words can be obtained before the acoustic features are input into the wake decision network to output the wake decision result. Based on the above, as an optional embodiment, the way the score distributions are obtained is not specifically limited by this embodiment; it includes but is not limited to: determining the probability that the acoustic features corresponding to any sub-word belong to each example phoneme, and taking those probability values as the score distribution of the acoustic features corresponding to that sub-word.
Here, the example phonemes are all phonemes that can currently be enumerated. Taking Chinese as an example, the example phonemes can be the initials and finals, for a total of 83 example phonemes. For any sub-word in the wake word, the acoustic features corresponding to that sub-word may belong to one of the 83 example phonemes, so the likelihood that the sub-word's acoustic features belong to each example phoneme can be determined and expressed as a probability value. For any sub-word, "belong to" means that the sub-word's acoustic features match the example phoneme.
(5) The insertion (embedding) features of the sub-words in the wake word.
Because a sub-word is a phoneme, the wake word consists of multiple sub-words, each sub-word is determinate, and the example phonemes are enumerable, the example phoneme among the 83 that each sub-word in the wake word matches can be determined. Based on the above, the insertion feature indicates this matching relationship. Specifically, the insertion feature can be a one-hot vector: the matching relationship between each sub-word in the wake word and all the example phonemes is expressed by a one-hot vector. For example, if the first sub-word in the wake word matches the second example phoneme, the one-hot vector corresponding to this first sub-word is (0, 1, 0, ..., 0), with 81 zeros after the 1. Similarly, if the second sub-word matches the first example phoneme, its one-hot vector is (1, 0, 0, ..., 0), with 82 zeros after the 1.
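The one-hot insertion feature just described can be sketched directly; the indices follow the example in the text, and the vector length of 83 is the stated number of example phonemes for Chinese.

```python
# Sketch of the one-hot insertion feature: each sub-word maps to a vector
# of length 83 with a single 1 at the index of its matching example phoneme.
NUM_EXAMPLE_PHONEMES = 83

def one_hot(index: int, size: int = NUM_EXAMPLE_PHONEMES):
    vec = [0] * size
    vec[index] = 1
    return vec

first = one_hot(1)    # matches the 2nd example phoneme: (0, 1, 0, ..., 0)
second = one_hot(0)   # matches the 1st example phoneme: (1, 0, 0, ..., 0)
assert first[:3] == [0, 1, 0] and first[2:] == [0] * 81
assert second[0] == 1 and second[1:] == [0] * 82
```

Concatenating one such vector per sub-word yields the insertion feature of the whole wake word.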
Note that the insertion features of the sub-words can be obtained by the wake-word recognition network during the preceding wake-up recognition process; that is, before this embodiment is executed, wake-up recognition can also be performed by the wake-word recognition network using the decision procedure based on the acoustic likelihood ratio. While wake-up recognition is performed by the recognition network, the insertion features of the sub-words can be obtained at the same time. For example, if the voice data is "hello, Xiaofei", then after it passes through the wake-word path corresponding to "hello, Xiaofei" in the recognition network, the resulting path confidence will be higher than the confidences obtained via other wake-word paths, and the insertion features of the sub-words can be obtained from this.
In the method provided by this embodiment, at least one of the five kinds of information is input into the wake decision network, which outputs the wake decision result. Because the wake decision network can make the wake decision for any user-defined wake word, without relying on a fixed preset threshold, the wake-up success rate can be improved and the wake-up process applies to a wider range of scenarios.
Based on the above, as an optional embodiment, the way the probability that the acoustic features corresponding to any sub-word belong to each example phoneme is determined is not specifically limited by this embodiment; it includes but is not limited to: computing, for each frame in the sub-word, the probability that the frame belongs to each example phoneme, obtaining a probability value sequence for each frame; and normalizing the per-frame probability value sequences according to the total number of frames in the sub-word, obtaining the probability that the acoustic features corresponding to the sub-word belong to each example phoneme.
For example, suppose a sub-word comprises two frames. For the acoustic features of the first frame, the probability that they belong to each example phoneme can be determined; with 83 example phonemes, 83 probability values are obtained for the first frame, written A = (0.01, 0.02, ..., 0.3). Similarly, for the acoustic features of the second frame, the probabilities can be written B = (0.03, 0.04, ..., 0.1). Because the sub-word comprises 2 frames in total, the per-frame probability value sequences can be normalized, i.e., the two sequences are added and averaged, yielding the probability that the sub-word's acoustic features belong to each example phoneme: C = (0.02, 0.03, ..., 0.2).
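The normalization step can be sketched with the A, B, and C values from this example; only the first, second, and last of the 83 entries are shown, and the values are the illustrative ones from the text.

```python
# Sketch of frame-level normalization: per-frame probability vectors over
# the example phonemes are added element-wise and divided by the sub-word's
# total frame count.
def normalize_frames(frame_probs):
    n = len(frame_probs)
    dim = len(frame_probs[0])
    return [sum(f[d] for f in frame_probs) / n for d in range(dim)]

A = [0.01, 0.02, 0.3]   # first frame (first, second, and last of 83 values)
B = [0.03, 0.04, 0.1]   # second frame
C = normalize_frames([A, B])
assert all(abs(c - e) < 1e-9 for c, e in zip(C, [0.02, 0.03, 0.2]))
```

The same function applies unchanged to sub-words spanning any number of frames.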
In the method provided by this embodiment, the probability that each frame of acoustic features in a sub-word belongs to each example phoneme is computed to obtain a probability value sequence for each frame; the per-frame sequences are then normalized according to the sub-word's total frame count to obtain the probability that the sub-word's acoustic features belong to each example phoneme. Because the wake decision network can make the wake decision for any user-defined wake word, without relying on a fixed preset threshold, the wake-up success rate can be improved and the wake-up process applies to a wider range of scenarios.
Based on the above, as an optional embodiment, a sub-word is a state, a monophone, a triphone, or a syllable. Each speech unit in the wake word may include multiple phonemes, and each phoneme may include multiple states; alternatively, each speech unit may include multiple triphones, and each triphone may include multiple states.
Based on the above, as an optional embodiment, before the acoustic features are input into the wake decision network to output the wake decision result, whether the wake word meets a preset condition can also be determined. The way this is judged is not specifically limited by this embodiment; it includes but is not limited to: obtaining the acoustic likelihood ratio of the wake word and judging whether the ratio meets the preset condition. Correspondingly, the way the acoustic features are input into the wake decision network to output the wake decision result is not specifically limited either; it includes but is not limited to: if the acoustic likelihood ratio meets the preset condition, inputting the acoustic features into the wake decision network and outputting the wake decision result.
Based on the above, as an optional embodiment, the preset condition is that the acoustic likelihood ratio is less than a preset threshold, or lies in a preset threshold interval.
When the preset condition is that the acoustic likelihood ratio is less than the preset threshold, then in actual implementation, if the ratio is less than the threshold, the wake decision process is executed; if the ratio is greater than the threshold, a successful wake-up can be determined directly. Of course, the preset condition can also be that the acoustic likelihood ratio is greater than the preset threshold; in that case, if the ratio is greater than the threshold, the wake decision process is executed, and if the ratio is less than the threshold, a wake-up failure can be determined directly. When the preset condition is that the ratio is greater than the threshold, this amounts to implementing wake-up through two confidence passes.
When the preset condition is that the acoustic likelihood ratio lies in a preset interval (a, b): if the actually computed acoustic likelihood ratio of the wake word falls within (a, b), the wake decision process is executed; if it falls below the interval, a wake-up failure can be determined; if it falls above the interval, a successful wake-up can be determined.
In the method provided by this embodiment, the acoustic likelihood ratio of the wake word is obtained and judged against the preset condition; if the condition is met, the acoustic features are input into the wake decision network, which outputs the wake decision result. Because the acoustic likelihood ratio of the wake word is determined first and the wake decision is executed afterwards, this two-pass decision process can improve both the wake-up success rate and precision.
Note that all the optional embodiments above can be combined in any way to form optional embodiments of the present invention, which will not be repeated here.
Based on the above, an embodiment of the present invention provides a voice wake-up device for performing the voice wake-up method in the above method embodiments. Referring to Fig. 4, the device includes an obtaining module 401 and an output module 402, wherein:
the obtaining module 401 is configured to obtain the acoustic features of the wake word from voice data; and
the output module 402 is configured to input the acoustic features into the wake decision network and output the wake decision result, wherein the wake decision result indicates whether the wake-up succeeded, and the wake decision network is trained on sample acoustic features.
Based on the above, as an optional embodiment, the acoustic features include at least one of the following five kinds of information: the wake-word scores of the sub-words in the wake word; the non-wake-word scores of the sub-words; the frame counts corresponding to the sub-words; the score distributions of the acoustic features corresponding to the sub-words; and the insertion features of the sub-words.
Based on the above, as an optional embodiment, the acoustic features include the score distributions of the acoustic features corresponding to the sub-words; correspondingly, the device further includes:
a determining module, configured to determine, for any sub-word in the wake word, the probability that the acoustic features corresponding to the sub-word belong to each example phoneme, and to take those probability values as the score distribution of the acoustic features corresponding to the sub-word.
Based on the above, as an optional embodiment, the determining module is configured to compute, for each frame of acoustic features in the sub-word, the probability that the frame belongs to each example phoneme, obtaining a probability value sequence for each frame; and to normalize the per-frame probability value sequences according to the total number of frames in the sub-word, obtaining the probability that the acoustic features corresponding to the sub-word belong to each example phoneme.
Based on the above, as an optional embodiment, a sub-word is a state, a monophone, a triphone, or a syllable.
Based on the above, as an optional embodiment, the device further includes:
a judgment module, configured to obtain the acoustic likelihood ratio of the wake word and judge whether it meets the preset condition; correspondingly, the output module 402 is configured to, when the acoustic likelihood ratio meets the preset condition, input the acoustic features into the wake decision network and output the wake decision result.
Based on the content of the above embodiment, as an optional embodiment, the preset condition is that the acoustic likelihood ratio is less than a preset threshold or falls within a preset threshold interval.
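A minimal sketch of this first-stage gate, under the assumption that "meets the preset condition" means either falling below a threshold or lying inside an interval; the threshold and interval values used below are invented for illustration:

```python
from typing import Optional, Tuple

def passes_gate(likelihood_ratio: float,
                threshold: Optional[float] = None,
                interval: Optional[Tuple[float, float]] = None) -> bool:
    """Return True if the wake word's acoustic likelihood ratio meets the
    preset condition, i.e. only then are the acoustic features forwarded
    to the wake-up decision network."""
    if threshold is not None and likelihood_ratio < threshold:
        return True
    if interval is not None:
        lo, hi = interval
        return lo <= likelihood_ratio <= hi
    return False
```

Gating in this way keeps the (more expensive) decision network out of the path for utterances whose first-stage score already rules them in or out.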
Based on the content of the above embodiment, as an optional embodiment, the wake-up decision network is an encoder-decoder model.
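A toy sketch of an encoder-decoder style decision: the encoder summarizes the per-sub-word feature vectors into one context vector, and a decoder maps that context to a wake probability. This is a deliberately tiny stand-in (mean-pooling plus a logistic layer) for the trained encoder-decoder model the patent describes; all weights and dimensions are made up:

```python
import math
from typing import List

def encode(subword_features: List[List[float]]) -> List[float]:
    """Encoder stand-in: mean-pool the sub-word feature vectors into one
    fixed-size context vector."""
    dim = len(subword_features[0])
    n = len(subword_features)
    return [sum(f[i] for f in subword_features) / n for i in range(dim)]

def decode(context: List[float], weights: List[float], bias: float) -> float:
    """Decoder stand-in: a single logistic layer producing P(wake-up succeeds)."""
    z = sum(w * c for w, c in zip(weights, context)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Two sub-words, each with a 3-dimensional (illustrative) feature vector.
features = [[0.9, 0.1, 0.8], [0.7, 0.2, 0.6]]
ctx = encode(features)
p_wake = decode(ctx, weights=[1.0, -1.0, 1.0], bias=0.0)
woken = p_wake > 0.5
```

A real implementation would learn the encoder and decoder jointly from sample acoustic features rather than use fixed weights, which is what lets the same network handle arbitrary user-defined wake words without a per-word threshold.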
The device provided in this embodiment of the present invention obtains the acoustic features of the wake word in voice data, inputs the acoustic features into the wake-up decision network, and outputs a wake-up decision result that indicates whether the wake-up is successful. Because a wake-up decision can be made through the decision network for any user-defined wake word, without relying on a fixed preset threshold, the wake-up success rate can be improved and the wake-up process is applicable to a wider range of scenarios.
Fig. 5 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Fig. 5, the electronic device may include a processor 510, a communications interface 520, a memory 530, and a communication bus 540, where the processor 510, the communications interface 520, and the memory 530 communicate with one another through the communication bus 540. The processor 510 can call logical instructions in the memory 530 to execute the following method: obtaining the acoustic features of the wake word in voice data; and inputting the acoustic features into the wake-up decision network and outputting a wake-up decision result, where the decision result indicates whether the wake-up is successful, the wake-up decision network is obtained by training on sample acoustic features, and the wake-up decision network is used to perform a confidence judgment on the wake word.
In addition, the logical instructions in the memory 530 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, an electronic device, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present invention further provides a non-transient computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the methods provided by the embodiments described above, for example: obtaining the acoustic features of the wake word in voice data; and inputting the acoustic features into the wake-up decision network and outputting a wake-up decision result, where the decision result indicates whether the wake-up is successful, the wake-up decision network is obtained by training on sample acoustic features, and the wake-up decision network is used to perform a confidence judgment on the wake word.
The apparatus embodiments described above are merely exemplary. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on this understanding, the above technical solution, in essence, or the part contributing to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are merely intended to illustrate, rather than to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of the technical features can be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (11)
1. A voice wake-up method, characterized by comprising:
obtaining acoustic features of a wake word in voice data;
inputting the acoustic features into a wake-up decision network, and outputting a wake-up decision result, wherein the wake-up decision result is used to indicate whether the wake-up is successful, the wake-up decision network is obtained by training on sample acoustic features, and the wake-up decision network is used to perform a confidence judgment on the wake word.
2. The method according to claim 1, characterized in that the acoustic features include at least one of the following five kinds of information: the wake-word score of a sub-word in the wake word, the non-wake-word score of the sub-word in the wake word, the number of frames corresponding to the sub-word in the wake word, the score distribution of the acoustic features corresponding to the sub-word in the wake word, and the embedding feature of the sub-word in the wake word.
3. The method according to claim 2, characterized in that the acoustic features include the score distribution of the acoustic features corresponding to the sub-word in the wake word; correspondingly, before the inputting of the acoustic features into the wake-up decision network and the outputting of the wake-up decision result, the method further comprises:
for any sub-word in the wake word, determining the probability that the acoustic features corresponding to that sub-word belong to each example phoneme, and using these probabilities as the score distribution of the acoustic features corresponding to that sub-word.
4. The method according to claim 3, characterized in that the determining of the probability that the acoustic features corresponding to the sub-word belong to each example phoneme comprises:
calculating the probability that each frame of acoustic features in the sub-word belongs to each example phoneme, obtaining a probability sequence corresponding to each frame;
normalizing the per-frame probability sequences according to the total number of frames contained in the sub-word, obtaining the probability that the acoustic features corresponding to the sub-word belong to each example phoneme.
5. The method according to any one of claims 2 to 4, characterized in that the sub-word is a state, a monophone, a triphone, or a syllable.
6. The method according to claim 1, characterized in that before the inputting of the acoustic features into the wake-up decision network and the outputting of the wake-up decision result, the method further comprises:
obtaining the acoustic likelihood ratio of the wake word, and judging whether the acoustic likelihood ratio meets a preset condition;
correspondingly, the inputting of the acoustic features into the wake-up decision network and the outputting of the wake-up decision result comprises:
if the acoustic likelihood ratio meets the preset condition, inputting the acoustic features into the wake-up decision network, and outputting the wake-up decision result.
7. The method according to claim 6, characterized in that the preset condition is that the acoustic likelihood ratio is less than a preset threshold or falls within a preset threshold interval.
8. The method according to claim 1, characterized in that the wake-up decision network is an encoder-decoder model.
9. A voice wake-up device, characterized by comprising:
an obtaining module, configured to obtain acoustic features of a wake word in voice data;
an output module, configured to input the acoustic features into a wake-up decision network and output a wake-up decision result, wherein the wake-up decision result is used to indicate whether the wake-up is successful, the wake-up decision network is obtained by training on sample acoustic features, and the wake-up decision network is used to perform a confidence judgment on the wake word.
10. An electronic device, characterized by comprising:
at least one processor; and
at least one memory communicatively connected to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor calls the program instructions to execute the method according to any one of claims 1 to 8.
11. A non-transient computer-readable storage medium, characterized in that the non-transient computer-readable storage medium stores computer instructions, and the computer instructions cause the computer to execute the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811184504.8A CN109273007B (en) | 2018-10-11 | 2018-10-11 | Voice wake-up method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109273007A true CN109273007A (en) | 2019-01-25 |
CN109273007B CN109273007B (en) | 2022-05-17 |
Family
ID=65196523
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811184504.8A Active CN109273007B (en) | 2018-10-11 | 2018-10-11 | Voice wake-up method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109273007B (en) |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140214416A1 (en) * | 2013-01-30 | 2014-07-31 | Tencent Technology (Shenzhen) Company Limited | Method and system for recognizing speech commands |
CN105096939A (en) * | 2015-07-08 | 2015-11-25 | 百度在线网络技术(北京)有限公司 | Voice wake-up method and device |
CN105575395A (en) * | 2014-10-14 | 2016-05-11 | 中兴通讯股份有限公司 | Voice wake-up method and apparatus, terminal, and processing method thereof |
CN106910496A (en) * | 2017-02-28 | 2017-06-30 | 广东美的制冷设备有限公司 | Intelligent electrical appliance control and device |
CN107123417A (en) * | 2017-05-16 | 2017-09-01 | 上海交通大学 | Optimization method and system are waken up based on the customized voice that distinctive is trained |
CN107134279A (en) * | 2017-06-30 | 2017-09-05 | 百度在线网络技术(北京)有限公司 | A kind of voice awakening method, device, terminal and storage medium |
CN107369439A (en) * | 2017-07-31 | 2017-11-21 | 北京捷通华声科技股份有限公司 | A kind of voice awakening method and device |
CN107644638A (en) * | 2017-10-17 | 2018-01-30 | 北京智能管家科技有限公司 | Audio recognition method, device, terminal and computer-readable recording medium |
CN107767861A (en) * | 2016-08-22 | 2018-03-06 | 科大讯飞股份有限公司 | voice awakening method, system and intelligent terminal |
CN107871506A (en) * | 2017-11-15 | 2018-04-03 | 北京云知声信息技术有限公司 | The awakening method and device of speech identifying function |
CN107871499A (en) * | 2017-10-27 | 2018-04-03 | 珠海市杰理科技股份有限公司 | Audio recognition method, system, computer equipment and computer-readable recording medium |
CN108198548A (en) * | 2018-01-25 | 2018-06-22 | 苏州奇梦者网络科技有限公司 | A kind of voice awakening method and its system |
CN108281137A (en) * | 2017-01-03 | 2018-07-13 | 中国科学院声学研究所 | A kind of universal phonetic under whole tone element frame wakes up recognition methods and system |
CN108335696A (en) * | 2018-02-09 | 2018-07-27 | 百度在线网络技术(北京)有限公司 | Voice awakening method and device |
CN108564941A (en) * | 2018-03-22 | 2018-09-21 | 腾讯科技(深圳)有限公司 | Audio recognition method, device, equipment and storage medium |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111862963B (en) * | 2019-04-12 | 2024-05-10 | 阿里巴巴集团控股有限公司 | Voice wakeup method, device and equipment |
CN111862963A (en) * | 2019-04-12 | 2020-10-30 | 阿里巴巴集团控股有限公司 | Voice wake-up method, device and equipment |
CN110310628A (en) * | 2019-06-27 | 2019-10-08 | 百度在线网络技术(北京)有限公司 | Wake up optimization method, device, equipment and the storage medium of model |
CN110473536A (en) * | 2019-08-20 | 2019-11-19 | 北京声智科技有限公司 | A kind of awakening method, device and smart machine |
CN110473539B (en) * | 2019-08-28 | 2021-11-09 | 思必驰科技股份有限公司 | Method and device for improving voice awakening performance |
CN110473539A (en) * | 2019-08-28 | 2019-11-19 | 苏州思必驰信息科技有限公司 | Promote the method and apparatus that voice wakes up performance |
CN111243604A (en) * | 2020-01-13 | 2020-06-05 | 苏州思必驰信息科技有限公司 | Training method for speaker recognition neural network model supporting multiple awakening words, speaker recognition method and system |
CN111489737A (en) * | 2020-04-13 | 2020-08-04 | 深圳市友杰智新科技有限公司 | Voice command recognition method and device, storage medium and computer equipment |
CN111489737B (en) * | 2020-04-13 | 2020-11-10 | 深圳市友杰智新科技有限公司 | Voice command recognition method and device, storage medium and computer equipment |
CN111696562B (en) * | 2020-04-29 | 2022-08-19 | 华为技术有限公司 | Voice wake-up method, device and storage medium |
CN111696562A (en) * | 2020-04-29 | 2020-09-22 | 华为技术有限公司 | Voice wake-up method, device and storage medium |
CN112164395A (en) * | 2020-09-18 | 2021-01-01 | 北京百度网讯科技有限公司 | Vehicle-mounted voice starting method and device, electronic equipment and storage medium |
CN112669818A (en) * | 2020-12-08 | 2021-04-16 | 北京地平线机器人技术研发有限公司 | Voice wake-up method and device, readable storage medium and electronic equipment |
CN112669818B (en) * | 2020-12-08 | 2022-12-02 | 北京地平线机器人技术研发有限公司 | Voice wake-up method and device, readable storage medium and electronic equipment |
CN113327618A (en) * | 2021-05-17 | 2021-08-31 | 西安讯飞超脑信息科技有限公司 | Voiceprint distinguishing method and device, computer equipment and storage medium |
CN113327617A (en) * | 2021-05-17 | 2021-08-31 | 西安讯飞超脑信息科技有限公司 | Voiceprint distinguishing method and device, computer equipment and storage medium |
CN113327617B (en) * | 2021-05-17 | 2024-04-19 | 西安讯飞超脑信息科技有限公司 | Voiceprint discrimination method, voiceprint discrimination device, computer device and storage medium |
CN113327618B (en) * | 2021-05-17 | 2024-04-19 | 西安讯飞超脑信息科技有限公司 | Voiceprint discrimination method, voiceprint discrimination device, computer device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109273007B (en) | 2022-05-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109273007A (en) | Voice awakening method and device | |
CN110838289B (en) | Wake-up word detection method, device, equipment and medium based on artificial intelligence | |
CN109036384B (en) | Audio recognition method and device | |
CN108831439B (en) | Voice recognition method, device, equipment and system | |
CN107767863B (en) | Voice awakening method and system and intelligent terminal | |
CN108615525B (en) | Voice recognition method and device | |
CN108899013B (en) | Voice search method and device and voice recognition system | |
CN110534099A (en) | Voice wakes up processing method, device, storage medium and electronic equipment | |
CN110428820B (en) | Chinese and English mixed speech recognition method and device | |
CN108564940A (en) | Audio recognition method, server and computer readable storage medium | |
CN106940998A (en) | A kind of execution method and device of setting operation | |
CN110085261A (en) | A kind of pronunciation correction method, apparatus, equipment and computer readable storage medium | |
CN110570873B (en) | Voiceprint wake-up method and device, computer equipment and storage medium | |
CN105845133A (en) | Voice signal processing method and apparatus | |
CN110298463A (en) | Meeting room preordering method, device, equipment and storage medium based on speech recognition | |
WO2022010471A1 (en) | Identification and utilization of misrecognitions in automatic speech recognition | |
CN113096647B (en) | Voice model training method and device and electronic equipment | |
CN109741735A (en) | The acquisition methods and device of a kind of modeling method, acoustic model | |
CN109074809B (en) | Information processing apparatus, information processing method, and computer-readable storage medium | |
CN110853669A (en) | Audio identification method, device and equipment | |
CN112669818B (en) | Voice wake-up method and device, readable storage medium and electronic equipment | |
CN114171002A (en) | Voice recognition method and device, electronic equipment and storage medium | |
CN111179941B (en) | Intelligent device awakening method, registration method and device | |
CN111640423B (en) | Word boundary estimation method and device and electronic equipment | |
CN113409768A (en) | Pronunciation detection method, pronunciation detection device and computer readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 2019-06-17. Applicant after: Xi'an Xunfei Super Brain Information Technology Co., Ltd., Yunhui Valley D Block 101, No. 156 Tiangu Eighth Road, Software New City, Xi'an High-tech Zone, Xi'an, Shaanxi Province, 710000. Applicant before: iFlytek Co., Ltd., 666 Wangjiang West Road, Hefei High-tech Development Zone, Anhui, 230088. |
GR01 | Patent grant | ||