CN106448663A - Voice wakeup method and voice interaction device - Google Patents
- Publication number
- CN106448663A CN201610901706.4A
- Authority
- CN
- China
- Prior art keywords
- voice
- signal
- acoustic model
- wake
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Circuit For Audible Band Transducer (AREA)
- Telephone Function (AREA)
Abstract
The present invention provides a voice wake-up method and a voice interaction device. The method includes: receiving a voice input signal; determining, according to a first acoustic model, a first similarity between the voice input signal and a preset wake-up voice signal, and judging whether the first similarity exceeds a first preset threshold; if it does, determining, according to a second acoustic model, a second similarity between the voice input signal and the preset wake-up voice signal, and judging whether the second similarity exceeds a second preset threshold; and if it does, waking up a voice interaction function, wherein the accuracy of the second acoustic model is higher than that of the first acoustic model. The voice wake-up method and voice interaction device provided by the embodiments of the invention combine low power consumption with a low false wake-up rate.
Description
Technical field
The embodiments of the present invention relate to the technical field of speech recognition, and in particular to a voice wake-up method and a voice interaction device.
Background
With the rapid development of speech recognition technology, voice interaction is being applied in more and more scenarios: smart TVs, in-vehicle systems, smart homes, and intelligent robots are all major application scenarios. At the same time, as user-experience requirements for human-computer interaction keep rising, voice interaction is no longer limited to near-field speech (within 50 cm). With multi-microphone techniques, far-field voice interaction at a distance of 3 to 5 meters can now be achieved.
Far-field voice interaction, however, raises a problem: when to start capturing audio and begin recognition. There are currently two technical schemes. In the first, a low-power chip listens continuously through the microphone array, performs the corresponding signal processing (signal enhancement, noise suppression, echo cancellation), and then runs speech recognition to judge whether the user has said the wake-up word; if so, it notifies the main module, which starts capturing audio and performing speech recognition. In the second, the front-end module performs only signal processing, while the main module captures audio continuously and runs speech recognition to judge whether the user has said the wake-up word. Both schemes have drawbacks. In the former, the front-end processing module must be low-power, so its recognition performance is relatively poor and its false wake-up rate is relatively high. In the latter, the main chip module must run at full speed all the time, so power consumption is high, and because more is demanded of the main chip module, the cost of the scheme is also higher. At present there is no scheme that balances power consumption against the false wake-up rate.
Summary of the invention
The embodiments of the present invention provide a voice wake-up method and a voice interaction device, to solve the problem that the prior art cannot balance power consumption against the false wake-up rate.
A first aspect of the embodiments of the present invention provides a voice wake-up method, the method including:
receiving a voice input signal;
determining, according to a first acoustic model, a first similarity between the voice input signal and a preset wake-up voice signal, and judging whether the first similarity exceeds a first preset threshold;
if it does, determining, according to a second acoustic model, a second similarity between the voice input signal and the preset wake-up voice signal, and judging whether the second similarity exceeds a second preset threshold, wherein the accuracy of the second acoustic model is higher than that of the first acoustic model;
if it does, waking up the voice interaction function.
A second aspect of the embodiments of the present invention provides a voice interaction device, the device including:
a receiving module, configured to receive a voice input signal;
a first determining module, configured to determine, according to a first acoustic model, a first similarity between the voice input signal and a preset wake-up voice signal, and to judge whether the first similarity exceeds a first preset threshold;
a second determining module, configured to, when the first similarity exceeds the first preset threshold, determine, according to a second acoustic model, a second similarity between the voice input signal and the preset wake-up voice signal, and to judge whether the second similarity exceeds a second preset threshold, wherein the accuracy of the second acoustic model is higher than that of the first acoustic model;
a wake-up module, configured to wake up the voice interaction function when the second similarity exceeds the second preset threshold.
In the embodiments of the present invention, a first acoustic model of lower accuracy first performs a preliminary wake-up recognition on the voice input signal. When the similarity between the voice input signal and the preset wake-up voice signal is found to exceed the first preset threshold, a second acoustic model of higher accuracy performs a second wake-up recognition on the voice input signal, and the result of the second recognition determines whether the voice interaction function is woken up. Because the first recognition uses the lower-accuracy acoustic model, its power consumption is low. Only when the first recognition passes, that is, when the similarity between the voice input signal and the preset wake-up voice signal exceeds the first preset threshold, is the higher-accuracy second acoustic model enabled for the second wake-up recognition. Combining a lower-accuracy acoustic model with a higher-accuracy one in this way avoids the low recognition accuracy and high false wake-up rate of using the lower-accuracy model alone, and also avoids the high power consumption of using the higher-accuracy model alone, thereby achieving both low power consumption and a low false wake-up rate.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe them are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the voice wake-up method provided by an embodiment of the present invention;
Fig. 2 is an architecture diagram of the voice interaction device provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the voice interaction device provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The terms "including" and "having" in the description and claims of this specification, and any variations of them, are intended to cover non-exclusive inclusion. For example, a process containing a series of steps, or a device containing a series of structures, is not necessarily limited to the steps or structures explicitly listed, but may include other steps or structures that are not explicitly listed or that are inherent to such processes or devices.
Fig. 1 is a schematic flowchart of the voice wake-up method provided by an embodiment of the present invention. The method can be executed by a voice interaction device that has a voice interaction function, such as a smart TV, an in-vehicle system, a smart home device, or an intelligent robot. As shown in Fig. 1, the method provided by this embodiment includes the following steps:
Step S101: receive a voice input signal.
In practice, the voice interaction device can receive a voice signal input by a user or a terminal device through a microphone array mounted on it. After receiving the voice signal, it applies delay equalization to ensure that the received signal is complete, so that no part of the speech is missed and the wake-up judgment is not affected.
Further, once the complete voice signal has been obtained, it is preprocessed to yield what this embodiment calls the "voice input signal". Specifically, preprocessing includes at least noise suppression, echo cancellation, and sound enhancement of the voice signal. These operations are similar to speech processing in the prior art and are not described again here.
Step S102: according to the first acoustic model, determine the first similarity between the voice input signal and the preset wake-up voice signal, and judge whether the first similarity exceeds the first preset threshold. If it does not, end this wake-up operation; if it does, execute step S103.
The first preset threshold can be set by the user according to actual needs, or it can be a default set by the terminal device; the embodiments of the present invention do not limit this.
Specifically, the voice wake-up method provided in this embodiment includes two judgment passes. The first pass can be executed by a DSP module. In the first pass, a feature signal is first extracted from the voice input signal obtained in step S101. For example, the feature signal can be obtained by extracting the Mel-frequency cepstral coefficients (MFCCs) of the voice input signal. This process is the same as in the prior art and is not described again here.
Further, in practice a simple acoustic model can be built into the DSP module. This acoustic model decodes the feature signal obtained above, and a maximum likelihood calculation is used to judge the similarity between the feature signal and the wake-up voice signal. The basic principle is to compare each feature point in the feature signal with the corresponding feature point of the preset wake-up voice signal in the acoustic model, and then combine all the points into a single maximum likelihood value. The formula is:
[formula not reproduced in the source text]
where x_i is the sample value of the i-th feature point in the feature signal, μ is the value in the model, and θ is the maximum likelihood value to be calculated. The similarity between the current voice input signal and the preset wake-up voice signal is calculated from this maximum likelihood value. When the calculated similarity exceeds the first preset threshold, the second wake-up judgment is started; otherwise the wake-up operation ends. In this embodiment, the first wake-up judgment that the DSP module performs on the voice input signal is similar to the prior art and is not described again here.
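By way of illustration only, and because the formula itself is not reproduced in this text, the following Python sketch shows one plausible reading of the first-pass calculation. It is an assumption, not the formula of the invention: each feature point x_i is scored against a model value μ_i under a Gaussian with a fixed, hypothetical spread `sigma`, the per-point log-likelihoods are summed, and the sum is mapped to a similarity in (0, 1]. The function names, the mapping, and the sample values are all invented for the example; the 0.70 threshold echoes the 70% figure used later in the description.

```python
import math

def gaussian_log_likelihood(features, means, sigma=1.0):
    """Sum of per-point Gaussian log-likelihoods of `features` given model `means`."""
    if len(features) != len(means):
        raise ValueError("feature and model lengths differ")
    norm = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(norm - (x - mu) ** 2 / (2.0 * sigma ** 2)
               for x, mu in zip(features, means))

def similarity(features, means, sigma=1.0):
    """Map the log-likelihood to (0, 1]; 1.0 means the features equal the model values.

    This normalisation is a hypothetical choice made for the sketch.
    """
    best = gaussian_log_likelihood(means, means, sigma)
    actual = gaussian_log_likelihood(features, means, sigma)
    return math.exp((actual - best) / len(means))

FIRST_THRESHOLD = 0.70  # example value only

template = [0.2, 0.5, 0.9, 0.4]      # hypothetical model values (mu)
close = [0.25, 0.45, 0.85, 0.5]      # input resembling the wake-up word
far = [2.0, -1.0, 3.0, -2.0]         # unrelated input

for name, feats in (("close", close), ("far", far)):
    passed = similarity(feats, template) > FIRST_THRESHOLD
    print(name, "passes first judgment:", passed)
```

The calculation is static in the sense used by the description: each point is scored on its own, with no dependence on the points before or after it, which is what keeps the first pass cheap.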
It should be noted that because the first wake-up judgment uses a relatively simple acoustic model, the demands on the DSP module are low and its power consumption is low.
Of course, the above is only an example and is not the sole limitation of the present invention. For example, in practice a windowed DTW (dynamic time warping) method can also be used to calculate the similarity of two stretches of speech, but its biggest problem is that differences in speaking style can seriously degrade the recognition rate.
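For reference, a minimal Python sketch of dynamic time warping is given below. It is the textbook form of the algorithm on one-dimensional sequences, without any window constraint, and is not taken from the patent; the function name `dtw_distance` is an assumption. A smaller distance means the two sequences are more alike once differences in speaking rate are warped away.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two 1-D sequences using absolute difference."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# The same contour spoken twice as slowly still aligns at zero cost.
print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))
```

DTW absorbs tempo differences but compares raw values, so a speaker whose features are shifted or shaped differently still scores as distant, which is consistent with the weakness noted above.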
Step S103: according to the second acoustic model, determine the second similarity between the voice input signal and the preset wake-up voice signal, and judge whether the second similarity exceeds the second preset threshold. If it does, wake up the voice interaction function; otherwise do not wake up. The accuracy of the second acoustic model is higher than that of the first acoustic model.
In this embodiment, the second wake-up judgment can be executed by a main chip processing module. After the first wake-up judgment, if the similarity between the voice input signal and the preset wake-up voice signal exceeds the first preset threshold, the main chip processing module is activated. The main chip processing module then obtains the feature signal from the DSP module and, using its built-in higher-accuracy acoustic model (the second acoustic model) and that feature signal, determines the second similarity between the voice input signal and the preset wake-up voice signal. The calculated second similarity is then compared with the second preset threshold: when the second similarity exceeds the second preset threshold, the voice interaction function is woken up; otherwise it is not.
It should be noted that until the DSP module determines that the similarity between the voice input signal and the preset wake-up voice signal exceeds the first preset threshold, the main chip processing module remains inactive, that is, in a low-power working state or a dormant state. When the DSP module determines that the similarity exceeds the first preset threshold, it sends the feature signal corresponding to the voice signal to the main chip processing module and thereby activates it.
Specifically, in this embodiment the second wake-up judgment uses a different method from the first. The second judgment uses a more complex similarity decoding algorithm, such as Viterbi, a dynamic programming algorithm that can account for how each state of the speech content relates to the states before and after it. The first judgment is a static similarity calculation that only computes the maximum likelihood value at each sample point. The two acoustic models also differ: the one in the DSP module is a very simple acoustic model that is cheap to compute, while the one in the main chip processing module is a more complex acoustic model with higher precision.
For example, suppose the wake-up word in the wake-up speech is "Vidaa, Vidaa". The calculation in the DSP module can be thought of as decomposing this stretch of speech into 256 sample points and then using the maximum likelihood algorithm to compare, across these 256 points, how well the values in the acoustic model match the captured voice input signal. This is a static calculation. For instance, the rule may be that once this probability reaches 70%, the user is considered to have possibly said "Vidaa Vidaa".
The second wake-up judgment then starts. The main chip processing module feeds the voice input signal and the wake-up voice signal into a trained HMM acoustic model of high accuracy and high robustness, and uses the Viterbi algorithm to calculate the similarity between the voice input signal and the wake-up voice signal. This is a dynamic programming algorithm that calculates, for each point in the voice signal, the transition probabilities to the preceding and following pronunciation units. When a person speaks, the pronunciation of each word is continuous, which is determined by the vocal cords, so the articulation characteristics of each pinyin syllable or phoneme determine the transition probabilities between neighboring points. This part is computationally heavier and also much more accurate. Therefore, if the similarity calculated by Viterbi exceeds the second preset threshold (for example 90%), the user is considered to have really said "Vidaa Vidaa". Of course, the above is only an example and is not the sole limitation of the present invention.
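To make the role of transition probabilities concrete, the following Python sketch runs the standard Viterbi algorithm, in the log domain, over a toy two-state discrete HMM. It is a generic illustration rather than the decoder of the invention: the states `s1` and `s2`, the symbols `a` and `b`, and every probability are invented, and the source does not say how the resulting path score would be converted into a percentage similarity.

```python
import math

def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a discrete HMM."""
    # best[s] = log-probability of the most likely path that ends in state s
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    back = []
    for obs in observations[1:]:
        prev_best, best, pointers = best, {}, {}
        for s in states:
            # Most likely predecessor of s, weighing the transition into s.
            p = max(states, key=lambda r: prev_best[r] + log_trans[r][s])
            best[s] = prev_best[p] + log_trans[p][s] + log_emit[s][obs]
            pointers[s] = p
        back.append(pointers)
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path

# Hypothetical left-to-right model: s1 may stay or move to s2; s2 never goes back.
states = ["s1", "s2"]
log_start = {"s1": safe_log(1.0), "s2": safe_log(0.0)}
log_trans = {
    "s1": {"s1": safe_log(0.6), "s2": safe_log(0.4)},
    "s2": {"s1": safe_log(0.0), "s2": safe_log(1.0)},
}
log_emit = {
    "s1": {"a": safe_log(0.9), "b": safe_log(0.1)},
    "s2": {"a": safe_log(0.2), "b": safe_log(0.8)},
}

score, path = viterbi(["a", "a", "b", "b"], states, log_start, log_trans, log_emit)
print(path, math.exp(score))
```

Unlike the static first pass, every step here depends on the best path into the previous step, which is why this pass costs more and is better suited to the main chip processing module.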
It should be noted that in this embodiment the purpose of the second wake-up recognition is to recognize the voice input signal more accurately and so avoid false wake-ups. Therefore, in practice the second preset threshold should be set greater than or equal to the first preset threshold.
In this embodiment, a first acoustic model of lower accuracy first performs a preliminary wake-up recognition on the voice input signal. When the similarity between the voice input signal and the preset wake-up voice signal is found to exceed the first preset threshold, a second acoustic model of higher accuracy performs a second wake-up recognition on the voice input signal, and the result of the second recognition determines whether the voice interaction function is woken up. Because the first recognition uses the lower-accuracy acoustic model, its power consumption is low. Only when the first recognition passes, that is, when the similarity between the voice input signal and the preset wake-up voice signal exceeds the first preset threshold, is the higher-accuracy second acoustic model enabled for the second wake-up recognition. Combining a lower-accuracy acoustic model with a higher-accuracy one in this way avoids the low recognition accuracy and high false wake-up rate of using the lower-accuracy model alone, and also avoids the high power consumption of using the higher-accuracy model alone, thereby achieving both low power consumption and a low false wake-up rate.
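The decision logic of steps S101 to S103 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the class name `TwoStageWakeup`, its fields, and the two toy scoring functions are hypothetical stand-ins for the first and second acoustic models, and the thresholds reuse the 70% and 90% example values from the description.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Scorer = Callable[[Sequence[float]], float]

@dataclass
class TwoStageWakeup:
    first_model: Scorer       # cheap, lower-accuracy scorer (hypothetical)
    second_model: Scorer      # costly, higher-accuracy scorer (hypothetical)
    first_threshold: float
    second_threshold: float
    second_stage_runs: int = 0  # how often the costly model was actually used

    def __post_init__(self):
        # The description recommends second threshold >= first threshold.
        if self.second_threshold < self.first_threshold:
            raise ValueError("second threshold should not be below the first")

    def should_wake(self, features):
        # Step S102: if the cheap model is not convinced, stop here.
        if self.first_model(features) <= self.first_threshold:
            return False
        # Step S103: only now pay for the accurate model.
        self.second_stage_runs += 1
        return self.second_model(features) > self.second_threshold

def coarse(features):
    """Toy first model: mean of the per-point scores."""
    return sum(features) / len(features)

def strict(features):
    """Toy second model: the worst per-point score."""
    return min(features)

cascade = TwoStageWakeup(coarse, strict, first_threshold=0.70, second_threshold=0.90)
results = [cascade.should_wake(f) for f in ([0.1, 0.2],
                                            [0.6, 0.95, 0.9],
                                            [0.95, 0.95, 0.97])]
print(results, cascade.second_stage_runs)
```

The counter makes the power argument visible: of three inputs, only the two that pass the cheap check ever reach the costly model.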
Fig. 2 is an architecture diagram of the voice interaction device provided by an embodiment of the present invention. As shown in Fig. 2, the voice interaction device includes a DSP module and a main chip processing module. A relatively simple acoustic model (that is, the lower-accuracy acoustic model) is built into the DSP module, and an acoustic model of higher accuracy and robustness is built into the main chip processing module. When the main chip processing module has not been triggered by the DSP module, it is in a low-power working state or a dormant state. Preferably, it is in the dormant state when not triggered, which minimizes the power consumption of the main chip.
In practice, after the microphone array receives a voice input signal, the DSP module uses endpoint detection (voice activity detection, VAD) to determine whether a voice signal has been input. For example, the existing algorithm based on short-time energy and short-time zero-crossing rate can be used; it is applied in this embodiment in the same way as in the prior art and is not described again here. After endpoint detection is complete, delay equalization is needed to ensure that the voice input signal is complete. Before signal processing is applied to the voice input signal, this stretch of voice input signal must be saved in full so that it can be sent to a cloud server for recognition. Signal processing includes at least noise suppression, echo cancellation, and sound enhancement. In practice, noise suppression can be based on a combination of multiple filters. Echo cancellation and sound enhancement are performed in the same way as in the prior art and are not described again here.
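As a rough illustration of endpoint detection with short-time energy and short-time zero-crossing rate, the following Python sketch classifies a single frame. The decision rule and both thresholds are assumptions chosen for the example, not values from the patent: a frame counts as speech when its energy is high enough and its zero-crossing rate is low enough to rule out hiss-like noise. A practical detector would also track the noise floor and smooth decisions across frames.

```python
import math

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_speech(frame, energy_threshold=0.01, zcr_ceiling=0.5):
    """Simplified rule with hypothetical thresholds: loud and not hiss-like."""
    return (short_time_energy(frame) > energy_threshold
            and zero_crossing_rate(frame) < zcr_ceiling)

# 10 ms frames at an assumed 16 kHz sampling rate.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(160)]
hiss = [0.5, -0.5] * 80  # sign flips on every sample

print(is_speech(silence), is_speech(tone), is_speech(hiss))
```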
Further, after the above signal processing is complete, a feature signal is first extracted from the voice input signal. The extracted feature signal is then decoded according to the simple acoustic model in the DSP module, and the similarity between the feature signal and the preset wake-up voice signal is calculated. When the calculated similarity exceeds the first preset threshold, the main chip processing module is triggered to perform a further wake-up judgment; otherwise this wake-up operation is abandoned. It should be noted that the DSP module only makes a preliminary wake-up judgment with a simple acoustic model, so it only needs to operate in a low-power working environment.
Further, when the main chip processing module is triggered, it can obtain, through the data interface between it and the DSP module, the feature signal that the DSP module obtained during the first wake-up judgment, and perform the second wake-up recognition on the voice input signal using its built-in higher-accuracy acoustic model and that feature signal. The method by which the main chip processing module performs the second wake-up recognition here is the same as the second wake-up recognition method in the embodiment shown in Fig. 1 and is not described again here.
The architecture shown in Fig. 2 uses the fast, low-power front-end DSP module to perform the preliminary wake-up recognition on the voice input signal, and also uses the DSP module's computing resources to perform feature extraction once, which saves computing resources for the second wake-up recognition in the main chip processing module. Until it receives a trigger signal from the DSP module, the main chip processing module runs in low-power mode. Once triggered, it uses its own large storage and computing resources, together with the feature signal sent by the DSP module, to perform wake-up recognition on the voice input signal quickly and efficiently. The architecture as a whole can therefore combine low power consumption with high accuracy.
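The hand-off described for Fig. 2 can be modeled as two cooperating objects. The following Python sketch is illustrative only: the class names `DspFrontEnd` and `MainChip`, the `ChipState` enum, and the toy feature and scoring functions are all hypothetical, and returning to the dormant state after a rejected candidate is an assumption that the description does not state. The sketch shows the two properties the architecture relies on: features are extracted once, by the DSP module, and the main chip leaves the dormant state only when triggered.

```python
from enum import Enum

class ChipState(Enum):
    DORMANT = "dormant"
    ACTIVE = "active"

class MainChip:
    """Holds the higher-accuracy scorer; idle until the DSP triggers it."""
    def __init__(self, score_fn, threshold):
        self.state = ChipState.DORMANT
        self.triggers = 0
        self._score = score_fn
        self._threshold = threshold

    def on_trigger(self, features):
        # Features arrive ready-made from the DSP; nothing is re-extracted here.
        self.triggers += 1
        self.state = ChipState.ACTIVE
        woke = self._score(features) > self._threshold
        if not woke:
            self.state = ChipState.DORMANT  # assumption: sleep again after a rejection
        return woke

class DspFrontEnd:
    """Always-on front end: extracts features and runs the cheap first check."""
    def __init__(self, extract_fn, score_fn, threshold, main_chip):
        self._extract = extract_fn
        self._score = score_fn
        self._threshold = threshold
        self._main_chip = main_chip

    def on_audio(self, samples):
        features = self._extract(samples)
        if self._score(features) <= self._threshold:
            return False  # main chip is never disturbed
        return self._main_chip.on_trigger(features)

def toy_features(samples):
    return [abs(s) for s in samples]

def coarse_score(features):
    return sum(features) / len(features)

def fine_score(features):
    return min(features)

main_chip = MainChip(fine_score, threshold=0.90)
dsp = DspFrontEnd(toy_features, coarse_score, threshold=0.70, main_chip=main_chip)

print(dsp.on_audio([0.1, -0.2]), main_chip.state)
print(dsp.on_audio([0.95, -0.97, 0.99]), main_chip.state)
```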
Fig. 3 is a schematic structural diagram of the voice interaction device provided by an embodiment of the present invention. As shown in Fig. 3, the device provided by this embodiment includes:
a receiving module 11, configured to receive a voice input signal;
a first determining module 12, configured to determine, according to a first acoustic model, a first similarity between the voice input signal and a preset wake-up voice signal, and to judge whether the first similarity exceeds a first preset threshold;
a second determining module 13, configured to, when the first similarity exceeds the first preset threshold, determine, according to a second acoustic model, a second similarity between the voice input signal and the preset wake-up voice signal, and to judge whether the second similarity exceeds a second preset threshold, wherein the accuracy of the second acoustic model is higher than that of the first acoustic model;
a wake-up module 14, configured to wake up the voice interaction function when the second similarity exceeds the second preset threshold.
The second preset threshold is greater than or equal to the first preset threshold.
The first determining module 12 includes:
an acquisition submodule 121, configured to extract a feature signal from the voice input signal;
a first determining submodule 122, configured to determine, according to the first acoustic model and the feature signal, a first maximum likelihood value between the feature signal and the preset wake-up voice signal, and to determine, according to the first maximum likelihood value, the first similarity between the voice input signal and the preset wake-up voice signal.
The second determining module 13 includes:
a second determining submodule 131, configured to determine, according to the second acoustic model, a first transition probability between each pronunciation unit in the feature signal and the pronunciation units before or after it, and a second transition probability between the corresponding pronunciation unit in the wake-up voice signal and the pronunciation units before or after it, and to determine, according to the first transition probability and the second transition probability, the second similarity between the feature signal and the wake-up voice signal.
The voice interaction device provided by this embodiment can be used to execute the method shown in Fig. 1. Its specific manner of execution and its beneficial effects are similar to those of the embodiment shown in Fig. 1 and are not described again here.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A voice wake-up method, characterized by including:
receiving a voice input signal;
determining, according to a first acoustic model, a first similarity between the voice input signal and a preset wake-up voice signal, and judging whether the first similarity exceeds a first preset threshold;
if it does, determining, according to a second acoustic model, a second similarity between the voice input signal and the preset wake-up voice signal, and judging whether the second similarity exceeds a second preset threshold, wherein the accuracy of the second acoustic model is higher than that of the first acoustic model;
if it does, waking up the voice interaction function.
2. The method according to claim 1, characterized in that the second preset threshold is greater than the first preset threshold.
3. The method according to claim 2, characterized in that determining, according to the first acoustic model, the first similarity between the voice input signal and the preset wake-up voice signal comprises:
extracting a characteristic signal from the voice input signal;
determining, according to the first acoustic model and the characteristic signal, a first maximum likelihood value between the characteristic signal and the preset wake-up voice signal;
determining, according to the first maximum likelihood value, the first similarity between the voice input signal and the preset wake-up voice signal.
4. The method according to claim 3, characterized in that, when the first similarity exceeds the first preset threshold, determining, according to the second acoustic model, the second similarity between the voice input signal and the preset wake-up voice signal comprises:
determining, according to the second acoustic model, a first transition probability between a pronunciation unit in the characteristic signal and the pronunciation unit before or after it, and a second transition probability between a pronunciation unit in the corresponding wake-up voice signal and the pronunciation unit before or after it;
determining, according to the first transition probability and the second transition probability, the second similarity between the characteristic signal and the wake-up voice signal.
5. The method according to any one of claims 1 to 4, characterized in that the first acoustic model is arranged in a DSP module and the second acoustic model is arranged in a main-chip processing module.
6. A voice interaction device, characterized by comprising:
a receiving module, configured to receive a voice input signal;
a first determining module, configured to determine, according to a first acoustic model, a first similarity between the voice input signal and a preset wake-up voice signal, and to judge whether the first similarity exceeds a first preset threshold;
a second determining module, configured to determine, when the first similarity exceeds the first preset threshold and according to a second acoustic model, a second similarity between the voice input signal and the preset wake-up voice signal, and to judge whether the second similarity exceeds a second preset threshold, wherein the accuracy of the second acoustic model is higher than the accuracy of the first acoustic model;
a wake-up module, configured to wake up a voice interaction function when the second similarity exceeds the second preset threshold.
7. The device according to claim 6, characterized in that the second preset threshold is greater than the first preset threshold.
8. The device according to claim 7, characterized in that the first determining module comprises:
an acquisition submodule, configured to extract a characteristic signal from the voice input signal;
a first determination submodule, configured to determine, according to the first acoustic model and the characteristic signal, a first maximum likelihood value between the characteristic signal and the preset wake-up voice signal, and
to determine, according to the first maximum likelihood value, the first similarity between the voice input signal and the preset wake-up voice signal.
9. The device according to claim 8, characterized in that the second determining module comprises:
a second determination submodule, configured to determine, according to the second acoustic model, a first transition probability between a pronunciation unit in the characteristic signal and the pronunciation unit before and/or after it, and a second transition probability between a pronunciation unit in the corresponding wake-up voice signal and the pronunciation unit before and/or after it, and
to determine, according to the first transition probability and the second transition probability, the second similarity between the characteristic signal and the wake-up voice signal.
10. The device according to any one of claims 6 to 9, characterized in that the first acoustic model is arranged in a DSP module and the second acoustic model is arranged in a main-chip processing module.
Priority Applications (1)
Application Number | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201610901706.4A | CN106448663B (en) | 2016-10-17 | 2016-10-17 | Voice awakening method and voice interaction device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106448663A (en) | 2017-02-22 |
CN106448663B (en) | 2020-10-23 |
Family
ID=58174603
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201610901706.4A | Active | CN106448663B (en) | 2016-10-17 | 2016-10-17 | Voice awakening method and voice interaction device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106448663B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120316879A1 (en) * | 2008-05-28 | 2012-12-13 | Koreapowervoice Co., Ltd. | System for detecting speech interval and recognizing continous speech in a noisy environment through real-time recognition of call commands |
CN103811003A (en) * | 2012-11-13 | 2014-05-21 | 联想(北京)有限公司 | Voice recognition method and electronic equipment |
CN104143326A (en) * | 2013-12-03 | 2014-11-12 | 腾讯科技(深圳)有限公司 | Voice command recognition method and device |
CN104599667A (en) * | 2015-01-16 | 2015-05-06 | 联想(北京)有限公司 | Information processing method and electronic device |
CN105575395A (en) * | 2014-10-14 | 2016-05-11 | 中兴通讯股份有限公司 | Voice wake-up method and apparatus, terminal, and processing method thereof |
US20160253995A1 (en) * | 2013-11-14 | 2016-09-01 | Huawei Technologies Co., Ltd. | Voice recognition method, voice recognition device, and electronic device |
Cited By (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10347252B2 (en) | 2017-05-02 | 2019-07-09 | Realtek Semiconductor Corp. | Electronic device with wake on voice function and operation method thereof |
TWI643123B (en) * | 2017-05-02 | 2018-12-01 | 瑞昱半導體股份有限公司 | Electronic device having wake on voice function and operating method thereof |
CN108877788A (en) * | 2017-05-08 | 2018-11-23 | 瑞昱半导体股份有限公司 | Electronic device and its operating method with voice arousal function |
US11276402B2 (en) | 2017-05-08 | 2022-03-15 | Cloudminds Robotics Co., Ltd. | Method for waking up robot and robot thereof |
WO2018205083A1 (en) * | 2017-05-08 | 2018-11-15 | 深圳前海达闼云端智能科技有限公司 | Robot wakeup method and device, and robot |
CN107239897A (en) * | 2017-05-31 | 2017-10-10 | 中南大学 | A kind of personality occupation type method of testing and system |
CN107396158A (en) * | 2017-08-21 | 2017-11-24 | 深圳创维-Rgb电子有限公司 | A kind of acoustic control interactive device, acoustic control exchange method and television set |
CN107464565A (en) * | 2017-09-20 | 2017-12-12 | 百度在线网络技术(北京)有限公司 | A kind of far field voice awakening method and equipment |
CN107464565B (en) * | 2017-09-20 | 2020-08-04 | 百度在线网络技术(北京)有限公司 | Far-field voice awakening method and device |
CN107680591A (en) * | 2017-09-21 | 2018-02-09 | 百度在线网络技术(北京)有限公司 | Voice interactive method, device and its equipment based on car-mounted terminal |
CN107742516B (en) * | 2017-09-29 | 2020-11-17 | 上海望潮数据科技有限公司 | Intelligent recognition method, robot and computer readable storage medium |
CN107742516A (en) * | 2017-09-29 | 2018-02-27 | 上海与德通讯技术有限公司 | Intelligent identification Method, robot and computer-readable recording medium |
CN107622770B (en) * | 2017-09-30 | 2021-03-16 | 百度在线网络技术(北京)有限公司 | Voice wake-up method and device |
CN107622770A (en) * | 2017-09-30 | 2018-01-23 | 百度在线网络技术(北京)有限公司 | voice awakening method and device |
CN107895573A (en) * | 2017-11-15 | 2018-04-10 | 百度在线网络技术(北京)有限公司 | Method and device for identification information |
US10803861B2 (en) | 2017-11-15 | 2020-10-13 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for identifying information |
CN107895573B (en) * | 2017-11-15 | 2021-08-24 | 百度在线网络技术(北京)有限公司 | Method and device for identifying information |
CN108122563B (en) * | 2017-12-19 | 2021-03-30 | 北京声智科技有限公司 | Method for improving voice awakening rate and correcting DOA |
CN108122563A (en) * | 2017-12-19 | 2018-06-05 | 北京声智科技有限公司 | Improve voice wake-up rate and the method for correcting DOA |
CN108198548A (en) * | 2018-01-25 | 2018-06-22 | 苏州奇梦者网络科技有限公司 | A kind of voice awakening method and its system |
US11222623B2 (en) | 2018-01-31 | 2022-01-11 | Tencent Technology (Shenzhen) Company Limited | Speech keyword recognition method and apparatus, computer-readable storage medium, and computer device |
CN110444193A (en) * | 2018-01-31 | 2019-11-12 | 腾讯科技(深圳)有限公司 | The recognition methods of voice keyword and device |
US11450312B2 (en) | 2018-03-22 | 2022-09-20 | Tencent Technology (Shenzhen) Company Limited | Speech recognition method, apparatus, and device, and storage medium |
WO2019179285A1 (en) * | 2018-03-22 | 2019-09-26 | 腾讯科技(深圳)有限公司 | Speech recognition method, apparatus and device, and storage medium |
CN108831477A (en) * | 2018-06-14 | 2018-11-16 | 出门问问信息科技有限公司 | A kind of audio recognition method, device, equipment and storage medium |
CN109215647A (en) * | 2018-08-30 | 2019-01-15 | 出门问问信息科技有限公司 | Voice awakening method, electronic equipment and non-transient computer readable storage medium |
CN109065046A (en) * | 2018-08-30 | 2018-12-21 | 出门问问信息科技有限公司 | Method, apparatus, electronic equipment and the computer readable storage medium that voice wakes up |
CN110890087A (en) * | 2018-09-10 | 2020-03-17 | 北京嘉楠捷思信息技术有限公司 | Voice recognition method and device based on cosine similarity |
CN109036428A (en) * | 2018-10-31 | 2018-12-18 | 广东小天才科技有限公司 | A kind of voice wake-up device, method and computer readable storage medium |
CN112740321A (en) * | 2018-11-20 | 2021-04-30 | 深圳市欢太科技有限公司 | Method and device for waking up equipment, storage medium and electronic equipment |
CN109360550A (en) * | 2018-12-07 | 2019-02-19 | 上海智臻智能网络科技股份有限公司 | Test method, device, equipment and the storage medium of voice interactive system |
CN109785825A (en) * | 2018-12-29 | 2019-05-21 | 广东长虹日电科技有限公司 | A kind of algorithm and storage medium, the electric appliance using it of speech recognition |
CN109979438A (en) * | 2019-04-04 | 2019-07-05 | Oppo广东移动通信有限公司 | Voice awakening method and electronic equipment |
CN110534099A (en) * | 2019-09-03 | 2019-12-03 | 腾讯科技(深圳)有限公司 | Voice wakes up processing method, device, storage medium and electronic equipment |
CN110570873B (en) * | 2019-09-12 | 2022-08-05 | Oppo广东移动通信有限公司 | Voiceprint wake-up method and device, computer equipment and storage medium |
CN110570873A (en) * | 2019-09-12 | 2019-12-13 | Oppo广东移动通信有限公司 | voiceprint wake-up method and device, computer equipment and storage medium |
CN110534102B (en) * | 2019-09-19 | 2020-10-30 | 北京声智科技有限公司 | Voice wake-up method, device, equipment and medium |
CN110534102A (en) * | 2019-09-19 | 2019-12-03 | 北京声智科技有限公司 | A kind of voice awakening method, device, equipment and medium |
CN110706691A (en) * | 2019-10-12 | 2020-01-17 | 出门问问信息科技有限公司 | Voice verification method and device, electronic equipment and computer readable storage medium |
CN110890093A (en) * | 2019-11-22 | 2020-03-17 | 腾讯科技(深圳)有限公司 | Intelligent device awakening method and device based on artificial intelligence |
CN110890093B (en) * | 2019-11-22 | 2024-02-09 | 腾讯科技(深圳)有限公司 | Intelligent equipment awakening method and device based on artificial intelligence |
CN111161714A (en) * | 2019-12-25 | 2020-05-15 | 联想(北京)有限公司 | Voice information processing method, electronic equipment and storage medium |
CN111831201A (en) * | 2020-05-25 | 2020-10-27 | 中国人民解放军陆军军医大学第二附属医院 | Human-computer interaction system and method for automatically detecting bone marrow cell morphology |
CN112259085A (en) * | 2020-09-28 | 2021-01-22 | 上海声瀚信息科技有限公司 | Two-stage voice awakening algorithm based on model fusion framework |
CN112885353A (en) * | 2021-01-26 | 2021-06-01 | 维沃移动通信有限公司 | Voice wake-up method and device and electronic equipment |
CN113256937A (en) * | 2021-07-07 | 2021-08-13 | 常州分音塔科技有限公司 | Intelligent home nursing method and system based on intelligent detection of audio event |
CN113593561A (en) * | 2021-07-29 | 2021-11-02 | 普强时代(珠海横琴)信息技术有限公司 | Ultra-low power consumption awakening method and device based on multi-stage trigger mechanism |
CN113611304A (en) * | 2021-08-30 | 2021-11-05 | 深圳鱼亮科技有限公司 | Noise reduction mixing system and method based on large-screen voice awakening recognition |
CN113611304B (en) * | 2021-08-30 | 2024-02-06 | 深圳鱼亮科技有限公司 | Large-screen voice awakening recognition noise reduction mixing system and method |
CN113947855A (en) * | 2021-09-18 | 2022-01-18 | 中标慧安信息技术股份有限公司 | Intelligent building personnel safety alarm system based on voice recognition |
CN117012206A (en) * | 2023-10-07 | 2023-11-07 | 山东省智能机器人应用技术研究院 | Man-machine voice interaction system |
CN117012206B (en) * | 2023-10-07 | 2024-01-16 | 山东省智能机器人应用技术研究院 | Man-machine voice interaction system |
Also Published As
Publication number | Publication date |
---|---|
CN106448663B (en) | 2020-10-23 |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |