CN105374352A - Voice activation method and system - Google Patents
- Publication number: CN105374352A (application CN201410418850.3A)
- Authority
- CN
- China
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention relates to a voice activation method. The method comprises the steps of: building an acoustic model and, on top of it, a decoding network space; selecting, according to the noise environment grade, a matching voice activity detection (VAD) configuration and cutting the input speech stream into speech segments; extracting acoustic features from each segment; decoding the features in the decoding network space to obtain the recognized phonemes; choosing, from the measures that can characterize the reliability of a pronunciation unit, several measures as confidence measures of the recognized phonemes and computing them; and applying a two-stage decision, comprising a pre-decision and a second decision, to these confidence measures and outputting the final recognition result. The method overcomes the drawbacks of key-press device activation, achieves a good activation effect, and makes speech recognition devices more convenient to use.
Description
Technical field
The invention belongs to the technical field of speech recognition; specifically, it relates to a voice activation method and system.
Background technology
Current speech recognition technology suffers a serious drop in accuracy under noise and natural spontaneous speech, so in daily life interaction based on continuous speech recognition is hard to realize. The common workaround is to switch the recognition device on with a key press, so that the user records speech in a relatively quiet state, guaranteeing good recognition performance and completing the human-machine interaction.
Key-press activation, however, is inconvenient in several ways. First, it requires the user to be within arm's reach of the device, which is difficult for users who are far away or have limited mobility. Second, in the dark the user may not be able to find the button. Third, it is unsuitable when both hands are occupied, for example while driving. In short, the dependence on key-press activation limits the promotion and application of speech recognition technology.
Voice activation techniques offer a way to overcome these drawbacks and advance human-machine interaction. Reference [1] ("Wake-Up-Word Speech Recognition", Veton Kepuska (2011), in Speech Technologies, Ivo Ipsic (Ed.), ISBN 978-953-307-996-7) describes a voice activation algorithm built on a speech recognition framework. That algorithm does not account for the varied ambient noise of real deployments: it performs well in a relatively quiet laboratory environment, but its activation performance may degrade severely in environments with strong background noise. Moreover, [1] relies solely on a classifier for the confidence decision, so its accuracy depends entirely on the classifier's training samples; poorly chosen training samples directly hurt activation performance.
Summary of the invention
The object of the invention is to overcome the drawbacks of key-press activation of speech recognition devices by providing voice activation as a brand-new device start-up mode, making speech recognition devices more convenient to use.
To this end, the invention provides a voice activation method, comprising:
Building an acoustic model, and building a decoding network space on top of the acoustic model;
Selecting, according to the noise environment grade, a matching voice activity detection (VAD) configuration and cutting the input speech stream into speech segments; extracting acoustic features from each segment; decoding the features in the decoding network space to obtain the recognized phonemes; choosing, from the measures that can characterize the reliability of a pronunciation unit, several measures as confidence measures of the recognized phonemes and computing them; applying a two-stage decision, comprising a pre-decision and a second decision, to these confidence measures, and outputting the final recognition result.
In the above scheme, building the decoding network space comprises: connecting the garbage phonemes of the phone set in parallel into a looping garbage-phoneme sub-network; stringing the phonemes of the specified activation word in sequence into an activation-word phone string; attaching the garbage-phoneme sub-network at the head and tail of the phone string; and adding a direct arc between the head and tail sub-networks that bypasses the phone string.
In the above scheme, the noise environment grades are: high-noise, medium-noise, and quiet; the grade is determined from the sound pressure level of the ambient noise.
In the above scheme, the confidence measures of the recognized phonemes comprise: the normalized phoneme duration, the time-normalized phoneme log-likelihood, the phoneme log posterior probability, the number of states whose duration is a single frame, the minimum syllable duration, and the total duration of the recognized speech.
In the above scheme, the pre-decision comprises:
If the second-smallest phoneme log posterior probability over all recognized phonemes is below a first threshold, the input is immediately rejected as a non-activation word; if the number of phonemes whose log posterior probability is below -1 exceeds a second threshold, it is rejected; if the number of states whose duration is a single frame exceeds a third threshold, it is rejected; if the minimum syllable duration is at or below a fourth threshold, it is rejected; and if the total duration of the recognized speech is shorter than 6 frames, or longer than 15 frames, per recognized phoneme, it is rejected. The first, second, third, and fourth thresholds are preferably obtained from experience and statistics.
In the above scheme, the second decision is implemented with a classifier, which is a linear classifier, a Gaussian mixture model classifier, or a support vector machine classifier.
In addition, the invention provides a voice activation system, comprising:
A VAD module, which selects the VAD configuration matching the noise environment grade and cuts the collected continuous speech stream into speech segments;
A feature extraction module, which extracts acoustic features from the speech segments;
An acoustic model, which describes the feature distribution of each pronunciation unit in acoustic space;
A decoder module, which builds the decoding network space on top of the acoustic model, runs Viterbi decoding on the segment features, and finds the best phoneme path in the network as the recognized path; all non-garbage phonemes on that path are the recognized phonemes;
A confidence computation module, which chooses, from the measures that can characterize the reliability of a pronunciation unit, several measures as confidence measures of the recognized phonemes and computes them;
A two-stage decision module, which applies a pre-decision and a second decision to the confidence measures and outputs the final recognition result.
The advantages of the invention are:
1. By grading the noise environment, the system is robust in noisy conditions;
2. The purpose-built decoding network space removes the adverse effect that real-world background noise has on recognition performance;
3. The two-stage decision on the recognition result drives the error rate down to a minimum, giving an excellent activation effect;
4. The system has broad application prospects in interactive smart home appliances, wearables, and similar devices.
Accompanying drawing explanation
Fig. 1 is a schematic of how the decoding network space of the invention is constructed;
Fig. 2 is a block diagram of the voice activation system of the invention.
Embodiment
The invention is described in further detail below with reference to the drawings and a specific embodiment.
The voice activation method provided by the invention comprises the following steps:
Step 1) Build the acoustic model.
The phone set comprises 65 toneless Chinese phonemes, 15 garbage phonemes (fillers), the sil phoneme for silence, and the sp phoneme for short pauses. Each phoneme is expanded by context into triphones, and each triphone is a sequence of three states. The 15 garbage phonemes are obtained statistically: according to the confusability and correlation between phonemes, all phonemes are clustered into similarity classes, and each similarity class serves as one garbage phoneme.
States that share the same central phoneme and position but differ in context are clustered with a decision tree, yielding 3970 states, i.e. 3970 units, each described by a Gaussian mixture model (GMM) with 8 components. The acoustic model is formed from the phone set and these 3970 units.
Step 2) Build the decoding network space on top of the acoustic model.
Referring to Fig. 1, the decoding network space is built as follows: the 15 garbage phonemes from step 1) are connected in parallel into a looping garbage-phoneme sub-network; the phonemes of the specified activation word are strung in sequence into an activation-word phone string; the garbage-phoneme sub-network is attached at the head and tail of the phone string; and a direct arc between the head and tail sub-networks bypasses the phone string.
This decoding network space can accurately force-align five classes of speech segments: the activation word alone, the activation word preceded by garbage speech, the activation word followed by garbage speech, the activation word with garbage speech on both sides, and pure garbage speech. These five classes cover all possible inputs.
For example, with the specified activation word "ni hao kong tiao" ("hello air-conditioner"), the concatenated phone string is "n-i-h-ao-k-ong-t-iao".
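As an illustration, the topology of Fig. 1 can be sketched as a small directed graph. This is a hypothetical sketch: the node names, filler labels, and arc-set representation are invented for illustration and are not taken from the patent.

```python
# Sketch of the decoding-network topology: a looping garbage-phoneme
# sub-network before and after the activation-word phone string, plus a
# direct arc that lets pure-garbage utterances skip the activation word.
GARBAGE = [f"filler{i}" for i in range(15)]                 # 15 garbage phonemes
WAKE_PHONES = ["n", "i", "h", "ao", "k", "ong", "t", "iao"]  # "ni hao kong tiao"

def build_network(wake_phones, garbage):
    """Return a set of directed arcs (src, dst) over named nodes."""
    arcs = set()
    # head garbage sub-network: enter, loop among fillers, exit to the wake word
    for g in garbage:
        arcs.add(("START", f"head:{g}"))
        arcs.add((f"head:{g}", "WAKE_ENTRY"))
        for g2 in garbage:                 # loop: any filler may follow any filler
            arcs.add((f"head:{g}", f"head:{g2}"))
    # the activation-word phone string, connected in sequence
    prev = "WAKE_ENTRY"
    for p in wake_phones:
        arcs.add((prev, f"wake:{p}"))
        prev = f"wake:{p}"
    arcs.add((prev, "WAKE_EXIT"))
    # tail garbage sub-network, mirroring the head
    for g in garbage:
        arcs.add(("WAKE_EXIT", f"tail:{g}"))
        arcs.add((f"tail:{g}", "END"))
        for g2 in garbage:
            arcs.add((f"tail:{g}", f"tail:{g2}"))
    # direct bridge so the head and tail sub-networks bypass the wake word
    arcs.add(("WAKE_ENTRY", "WAKE_EXIT"))
    return arcs

net = build_network(WAKE_PHONES, GARBAGE)
```

The bypass arc is what lets the same network force-align all five segment classes, from the activation word alone down to pure garbage speech.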
Step 3) Select, according to the noise environment grade, a matching VAD (voice activity detection) configuration and cut the input speech stream into speech segments; extract acoustic features from each segment; decode the features in the decoding network space to obtain the recognized phonemes; choose, from the measures that can characterize the reliability of a pronunciation unit, several measures as confidence measures of the recognized phonemes and compute them; apply a two-stage decision, comprising a pre-decision and a second decision, and output the final recognition result.
In the above scheme, step 3) further comprises:
Step 301) Select, according to the noise environment grade, a matching VAD configuration and cut the input speech stream into speech segments.
The noise environment is divided into three grades: high-noise, medium-noise, and quiet. The grade is determined from the sound pressure level of the ambient noise, computed as:
Lp = 20 * lg(p / p0)
where Lp is the sound pressure level in decibels, p is the sound pressure, and p0 is the reference sound pressure, p0 = 2 × 10^-5 Pa in air.
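The sound pressure level formula translates directly into code; a minimal sketch, with an illustrative example pressure:

```python
import math

def sound_pressure_level(p, p0=2e-5):
    """Lp = 20 * lg(p / p0) in decibels; p0 = 2e-5 Pa is the reference pressure in air."""
    return 20 * math.log10(p / p0)

# A sound pressure of 0.02 Pa corresponds to 60 dB SPL.
level = sound_pressure_level(0.02)
```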
The VAD configuration matching the noise environment grade is then used to cut the continuous input stream into short speech segments. The goal of the cutting is to break the stream at the speaker's pauses, i.e. to keep one stretch of continuous speech inside a single segment as far as possible. Grade-specific VAD configurations keep the cutting insensitive to fluctuations in the ambient noise, yielding accurate segments and reducing truncation of complete utterances.
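The idea can be sketched with a minimal energy-based segmenter. The patent only states that different VAD parameters are chosen per noise grade; the per-grade thresholds and the hangover mechanism below are illustrative assumptions, not the patent's actual parameters.

```python
# Minimal energy-based VAD sketch: frames above a threshold are speech, and up
# to `hangover` silent frames are tolerated before a segment is closed, so a
# brief pause does not split one utterance into fragments.
def segment(frame_energies, threshold, hangover=3):
    """Cut per-frame energies into half-open (start, end) speech segments."""
    segments, start, silence = [], None, 0
    for i, e in enumerate(frame_energies):
        if e >= threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence > hangover:
                segments.append((start, i - silence + 1))
                start, silence = None, 0
    if start is not None:
        segments.append((start, len(frame_energies) - silence))
    return segments

# A noisier grade would get a higher threshold so fluctuations of the noise
# floor are not mistaken for speech (values are illustrative).
THRESHOLDS = {"quiet": 0.1, "medium": 0.3, "high": 0.6}
```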
Step 302) Extract acoustic features from each speech segment.
Speech is captured at an 8 kHz sampling rate and framed with a 25 ms window and a 10 ms shift. Twelve PLP (perceptual linear prediction) coefficients plus one energy term are extracted as the static features, and first- and second-order differences are appended to give a 39-dimensional dynamic feature vector. HLDA (heteroscedastic linear discriminant analysis) is applied to the static and dynamic features to improve their discriminability.
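The frame and dimension bookkeeping implied by these settings can be checked with a short sketch; the constants follow the numbers quoted above.

```python
# Front-end bookkeeping: 8 kHz sampling, 25 ms window, 10 ms shift,
# 12 PLP coefficients + 1 energy, expanded with deltas to 39 dimensions.
SAMPLE_RATE = 8000
WIN = int(0.025 * SAMPLE_RATE)    # 200 samples per 25 ms window
HOP = int(0.010 * SAMPLE_RATE)    # 80 samples per 10 ms shift

def num_frames(n_samples):
    """Number of full analysis windows in a signal of n_samples."""
    if n_samples < WIN:
        return 0
    return 1 + (n_samples - WIN) // HOP

STATIC_DIM = 12 + 1               # 12 PLP coefficients + 1 energy term
FEATURE_DIM = STATIC_DIM * 3      # static + delta + delta-delta = 39
```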
Step 303) Decode the features in the decoding network space to obtain the recognized phonemes.
Decoding uses the Viterbi algorithm to find the best phoneme path in the decoding network space as the recognized path; all phonemes on the best path other than fillers are the recognized phonemes. If all recognized phonemes are fillers, the input is immediately rejected as a non-activation word and the method proceeds to step 305-3); otherwise it proceeds to step 304).
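Viterbi decoding itself can be sketched on a toy model. This two-state example is illustrative only; the actual decoder runs over triphone GMM states in the network of Fig. 1.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely state path for `obs`, using log-probabilities throughout."""
    V = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # best predecessor for state s at time t
            prev = max(states, key=lambda q: V[t - 1][q] + log_trans[q][s])
            V[t][s] = V[t - 1][prev] + log_trans[prev][s] + log_emit[s][obs[t]]
            back[t][s] = prev
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Toy two-state model: state "A" mostly emits 'a', state "B" mostly emits 'b'.
lg = math.log
states = ["A", "B"]
log_start = {"A": lg(0.5), "B": lg(0.5)}
log_trans = {"A": {"A": lg(0.8), "B": lg(0.2)},
             "B": {"A": lg(0.2), "B": lg(0.8)}}
log_emit = {"A": {"a": lg(0.9), "b": lg(0.1)},
            "B": {"a": lg(0.1), "b": lg(0.9)}}
path = viterbi(["a", "a", "b"], states, log_start, log_trans, log_emit)
```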
Step 304) Compute the confidence measures of the recognized phonemes.
The confidence measures are: the normalized phoneme duration, the time-normalized phoneme log-likelihood, the phoneme log posterior probability, the number of states whose duration is a single frame, the minimum syllable duration, and the total duration of the recognized speech.
The normalized phoneme duration is computed as:
dur_NOR(p_i) = dur(p_i) / ((1/S) * Σ_{j=1..S} dur(p_j))
where p_i is the i-th recognized phoneme, dur_NOR(p_i) is its normalized duration, dur(p_i) is its duration, and S is the total number of recognized phonemes.
The time-normalized phoneme log-likelihood is computed as:
LL_nor(p_i) = ln P(O|p_i) / dur(p_i)
where LL_nor(p_i) is the time-normalized log-likelihood of the i-th recognized phoneme and P(O|p_i) is its likelihood; ln P(O|p_i) is available directly from the decoding result.
The phoneme log posterior probability is computed as:
GOP(p_i) = ln( P(O|p_i) / Σ_{q∈Q} P(O|q) )
where GOP(p_i) is the log posterior probability of the i-th recognized phoneme, Σ_{q∈Q} P(O|q) is the sum of likelihoods over all phonemes, and Q is the phone set from step 1).
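The three scored measures can be sketched in code. Since the patent's original equation images did not survive, the exact normalizations below are reconstructions from the variable definitions, not the patent's verbatim formulas.

```python
import math

def normalized_duration(durations, i):
    """dur_NOR(p_i): duration of phoneme i relative to the mean phoneme duration."""
    mean = sum(durations) / len(durations)
    return durations[i] / mean

def time_normalized_loglik(log_likelihood, duration):
    """LL_nor(p_i): log-likelihood divided by duration, a per-frame score."""
    return log_likelihood / duration

def gop(loglik_i, logliks_all_phones):
    """GOP(p_i): log of phoneme i's likelihood over the summed likelihoods
    of every phoneme in the phone set Q."""
    total = sum(math.exp(ll) for ll in logliks_all_phones)
    return loglik_i - math.log(total)
```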
Step 305) Apply the two-stage decision to the confidence measures of the recognized phonemes and output the final recognition result. This comprises:
Step 305-1) Apply the pre-decision to the confidence measures; if the result is a non-activation word, proceed to 305-3); otherwise proceed to 305-2).
The pre-decision comprises:
If the second-smallest phoneme log posterior probability over all recognized phonemes is below the first threshold, the input is immediately rejected as a non-activation word; the first threshold is preferably obtained from experience and statistics, and is -4.0 in this embodiment.
If the number of phonemes whose log posterior probability is below -1 exceeds the second threshold, the input is rejected; the second threshold is likewise empirical, and is 4 in this embodiment.
If the number of states whose duration is a single frame exceeds the third threshold, the input is rejected; the third threshold is likewise empirical, and is 12 in this embodiment.
If the minimum syllable duration is at or below the fourth threshold, the input is rejected; the fourth threshold is likewise empirical, and is 6 in this embodiment.
If the total duration of the recognized speech is shorter than 6 frames, or longer than 15 frames, per recognized phoneme, the input is rejected.
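The five rejection rules combine into a single pre-decision function. The default thresholds are the embodiment's quoted values (-4.0, 4, 12, 6); the argument names are illustrative.

```python
def pre_judge(gops, one_frame_states, min_syllable_frames, total_frames, n_phones,
              t1=-4.0, t2=4, t3=12, t4=6):
    """Return True if the utterance is rejected outright as a non-activation word."""
    if sorted(gops)[1] < t1:                     # second-smallest posterior too low
        return True
    if sum(1 for g in gops if g < -1) > t2:      # too many weak phonemes
        return True
    if one_frame_states > t3:                    # too many single-frame states
        return True
    if min_syllable_frames <= t4:                # shortest syllable too short
        return True
    if not (6 * n_phones <= total_frames <= 15 * n_phones):  # implausible length
        return True
    return False
```

Anything the pre-decision does not reject is passed on to the second-stage classifier.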
Step 305-2) Apply the second decision to the confidence vectors of the recognized utterances that the pre-decision did not reject.
The confidence vector of a recognized utterance is formed by concatenating, in phoneme order, the confidence measures of each phoneme; its dimension is the number of recognized phonemes times the number of confidence measures.
Taking the recognized utterance "ni hao kong tiao" as an example: each Chinese character consists of two phonemes, an initial and a final, so the confidence vector of "ni hao kong tiao" has 6 * 8 = 48 dimensions.
The second decision is implemented with a classifier: a linear classifier, a Gaussian mixture model classifier, or a support vector machine (SVM) classifier. This embodiment uses an SVM classifier.
Before the second decision, the SVM classifier is first trained on equal numbers of positive and negative samples; positive samples are speech segments containing the specified activation word, and negative samples are segments that do not.
The second decision then feeds the confidence vector of the recognized utterance into the SVM classifier, whose output is 1 or 2, where 1 denotes the activation word and 2 a non-activation word.
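A sketch of the second stage using scikit-learn's SVC stands in for the embodiment's SVM. The synthetic Gaussian "confidence vectors" below are placeholders for real 48-dimensional vectors from the confidence module; everything except the 48-dim shape and the 1/2 labelling is an assumption.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
DIM = 48                                       # 8 phonemes x 6 confidence measures
X_pos = rng.normal(loc=1.0, size=(50, DIM))    # stand-ins for activation-word clips
X_neg = rng.normal(loc=-1.0, size=(50, DIM))   # stand-ins for other clips
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 50 + [2] * 50)              # 1 = activation word, 2 = not

clf = SVC(kernel="rbf").fit(X, y)              # equal positive/negative samples
decision = clf.predict(rng.normal(loc=1.0, size=(1, DIM)))[0]
```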
Step 305-3) Output the final recognition result.
Referring to Fig. 2, the invention also provides a voice activation system, comprising:
A VAD (voice activity detection) module, which selects the VAD configuration matching the noise environment grade and cuts the collected continuous speech stream into speech segments;
A feature extraction module, which extracts acoustic features from the speech segments;
An acoustic model, which describes the feature distribution of each pronunciation unit in acoustic space;
A decoder module, which builds the decoding network space on top of the acoustic model, runs Viterbi decoding on the segment features, and finds the best phoneme path in the network as the recognized path; all non-garbage phonemes on that path are the recognized phonemes;
A confidence computation module, which chooses several measures that characterize the reliability of a pronunciation unit as confidence measures of the recognized phonemes and computes them;
A two-stage decision module, which applies a pre-decision and a second decision to the confidence measures and outputs the final recognition result.
In this embodiment the specified activation word is "ni hao kong tiao". The acoustic model is trained on 150 hours of read speech recorded in a relatively quiet environment. To evaluate the activation rate in real scenes, read speech from 10 speakers was recorded, each uttering the activation word 20 times, in four scene types: quiet, echo, noise, and echo plus noise. To evaluate false alarms in the same four scenes, 24 hours of speech from 10 speakers were recorded. The performance of the voice activation system is shown in Table 1:
Table 1

| | Quiet | Echo | Noise | Echo + noise |
|---|---|---|---|---|
| Activation rate | 91.3% | 89.5% | 80.2% | 75.1% |
| False alarms | 0 | 1/hour | 2/hour | 2.6/hour |
Claims (7)
1. A voice activation method, comprising:
Building an acoustic model, and building a decoding network space on top of the acoustic model;
Selecting, according to the noise environment grade, a matching voice activity detection (VAD) configuration and cutting the input speech stream into speech segments; extracting acoustic features from each segment; decoding the features in the decoding network space to obtain the recognized phonemes; choosing, from the measures that can characterize the reliability of a pronunciation unit, several measures as confidence measures of the recognized phonemes and computing them; applying a two-stage decision, comprising a pre-decision and a second decision, to these confidence measures, and outputting the final recognition result.
2. The voice activation method of claim 1, wherein building the decoding network space comprises: connecting the garbage phonemes of the phone set in parallel into a looping garbage-phoneme sub-network; stringing the phonemes of the specified activation word in sequence into an activation-word phone string; attaching the garbage-phoneme sub-network at the head and tail of the phone string; and adding a direct arc between the head and tail sub-networks that bypasses the phone string.
3. The voice activation method of claim 1, wherein the noise environment grades are: high-noise, medium-noise, and quiet; the grade is determined from the sound pressure level of the ambient noise.
4. The voice activation method of claim 1, wherein the confidence measures of the recognized phonemes comprise: the normalized phoneme duration, the time-normalized phoneme log-likelihood, the phoneme log posterior probability, the number of states whose duration is a single frame, the minimum syllable duration, and the total duration of the recognized speech.
5. The voice activation method of claim 4, wherein the pre-decision comprises:
If the second-smallest phoneme log posterior probability over all recognized phonemes is below a first threshold, immediately rejecting the input as a non-activation word; if the number of phonemes whose log posterior probability is below -1 exceeds a second threshold, rejecting it; if the number of states whose duration is a single frame exceeds a third threshold, rejecting it; if the minimum syllable duration is at or below a fourth threshold, rejecting it; and if the total duration of the recognized speech is shorter than 6 frames, or longer than 15 frames, per recognized phoneme, rejecting it; the first, second, third, and fourth thresholds being preferably obtained from experience and statistics.
6. The voice activation method of claim 1, wherein the second decision is implemented with a classifier, the classifier being a linear classifier, a Gaussian mixture model classifier, or a support vector machine classifier.
7. A voice activation system, comprising:
A VAD module, which selects the VAD configuration matching the noise environment grade and cuts the collected continuous speech stream into speech segments;
A feature extraction module, which extracts acoustic features from the speech segments;
An acoustic model, which describes the feature distribution of each pronunciation unit in acoustic space;
A decoder module, which builds the decoding network space on top of the acoustic model, runs Viterbi decoding on the segment features, and finds the best phoneme path in the network as the recognized path, all non-garbage phonemes on that path being the recognized phonemes;
A confidence computation module, which chooses several measures that characterize the reliability of a pronunciation unit as confidence measures of the recognized phonemes and computes them;
A two-stage decision module, which applies a pre-decision and a second decision to the confidence measures and outputs the final recognition result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410418850.3A CN105374352B (en) | 2014-08-22 | 2014-08-22 | Voice activation method and system
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410418850.3A CN105374352B (en) | 2014-08-22 | 2014-08-22 | Voice activation method and system
Publications (2)
Publication Number | Publication Date |
---|---|
CN105374352A true CN105374352A (en) | 2016-03-02 |
CN105374352B CN105374352B (en) | 2019-06-18 |
Family
ID=55376483
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410418850.3A Expired - Fee Related CN105374352B (en) | 2014-08-22 | 2014-08-22 | Voice activation method and system
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105374352B (en) |
2014-08-22: Application CN201410418850.3A filed in China; granted as CN105374352B; current status: not active (Expired - Fee Related)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101739869A (en) * | 2008-11-19 | 2010-06-16 | 中国科学院自动化研究所 | Priori knowledge-based pronunciation evaluation and diagnosis system |
CN102044243A (en) * | 2009-10-15 | 2011-05-04 | 华为技术有限公司 | Method and device for voice activity detection (VAD) and encoder |
CN102982811A (en) * | 2012-11-24 | 2013-03-20 | 安徽科大讯飞信息科技股份有限公司 | Voice endpoint detection method based on real-time decoding |
US20140214416A1 (en) * | 2013-01-30 | 2014-07-31 | Tencent Technology (Shenzhen) Company Limited | Method and system for recognizing speech commands |
CN103810996A (en) * | 2014-02-21 | 2014-05-21 | 北京凌声芯语音科技有限公司 | Processing method, device and system for voice to be tested |
Non-Patent Citations (1)
Title |
---|
Veton Kepuska: "Wake-Up-Word Speech Recognition", Speech Technologies *
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107767863A (en) * | 2016-08-22 | 2018-03-06 | 科大讯飞股份有限公司 | Voice wake-up method, system and intelligent terminal |
CN107767861A (en) * | 2016-08-22 | 2018-03-06 | 科大讯飞股份有限公司 | Voice wake-up method, system and intelligent terminal |
CN107919116B (en) * | 2016-10-11 | 2019-09-13 | 芋头科技(杭州)有限公司 | Voice activation detection method and device |
CN107919116A (en) * | 2016-10-11 | 2018-04-17 | 芋头科技(杭州)有限公司 | Voice activation detection method and device |
WO2018068649A1 (en) * | 2016-10-11 | 2018-04-19 | 芋头科技(杭州)有限公司 | Method and device for detecting voice activation |
CN106484147A (en) * | 2016-10-21 | 2017-03-08 | 深圳仝安技术有限公司 | Method and device for a new voice activation system |
CN108231089B (en) * | 2016-12-09 | 2020-11-03 | 百度在线网络技术(北京)有限公司 | Speech processing method and device based on artificial intelligence |
CN108231089A (en) * | 2016-12-09 | 2018-06-29 | 百度在线网络技术(北京)有限公司 | Speech processing method and device based on artificial intelligence |
CN108281137A (en) * | 2017-01-03 | 2018-07-13 | 中国科学院声学研究所 | Universal voice wake-up recognition method and system under a whole-phoneme framework |
CN110111779A (en) * | 2018-01-29 | 2019-08-09 | 阿里巴巴集团控股有限公司 | Grammar model generation method and device, and speech recognition method and device |
CN110111779B (en) * | 2018-01-29 | 2023-12-26 | 阿里巴巴集团控股有限公司 | Grammar model generation method and device and voice recognition method and device |
CN108831459A (en) * | 2018-05-30 | 2018-11-16 | 出门问问信息科技有限公司 | Speech recognition method and device |
CN110364142B (en) * | 2019-06-28 | 2022-03-25 | 腾讯科技(深圳)有限公司 | Speech phoneme recognition method and device, storage medium and electronic device |
CN110364142A (en) * | 2019-06-28 | 2019-10-22 | 腾讯科技(深圳)有限公司 | Speech phoneme recognition method and device, storage medium and electronic device |
CN110992929A (en) * | 2019-11-26 | 2020-04-10 | 苏宁云计算有限公司 | Voice keyword detection method, device and system based on neural network |
CN111009234A (en) * | 2019-12-25 | 2020-04-14 | 上海忆益信息科技有限公司 | Voice conversion method, device and equipment |
CN111429901A (en) * | 2020-03-16 | 2020-07-17 | 云知声智能科技股份有限公司 | IoT chip-oriented multi-stage voice intelligent awakening method and system |
CN111653276A (en) * | 2020-06-22 | 2020-09-11 | 四川长虹电器股份有限公司 | Voice awakening system and method |
CN112652306A (en) * | 2020-12-29 | 2021-04-13 | 珠海市杰理科技股份有限公司 | Voice wake-up method and device, computer equipment and storage medium |
CN112652306B (en) * | 2020-12-29 | 2023-10-03 | 珠海市杰理科技股份有限公司 | Voice wakeup method, voice wakeup device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN105374352B (en) | 2019-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105374352A (en) | Voice activation method and system | |
CN105529028B (en) | Speech analysis method and apparatus | |
CN110364143B (en) | Voice awakening method and device and intelligent electronic equipment | |
CN103928023B (en) | Speech assessment method and system | |
CN101930735B (en) | Speech emotion recognition equipment and speech emotion recognition method | |
CN102800314B (en) | English sentence recognizing and evaluating system with feedback guidance and method | |
CN103971678B (en) | Keyword spotting method and apparatus | |
CN108922541B (en) | Multi-dimensional characteristic parameter voiceprint recognition method based on DTW and GMM models | |
CN101118745B (en) | Confidence degree quick acquiring method in speech identification system | |
US20170154640A1 (en) | Method and electronic device for voice recognition based on dynamic voice model selection | |
CN103177733B (en) | Method and system for evaluating the pronunciation quality of Standard Chinese rhotacized (erhua) syllables | |
CN108281137A (en) | Universal voice wake-up recognition method and system under a whole-phoneme framework | |
CN105632486A (en) | Voice wake-up method and device of intelligent hardware | |
CN104050965A (en) | English phonetic pronunciation quality evaluation system with emotion recognition function and method thereof | |
Ferrer et al. | A prosody-based approach to end-of-utterance detection that does not require speech recognition | |
CN107329996A (en) | Chatbot system and chat method based on a fuzzy neural network | |
CN104464724A (en) | Speaker recognition method for deliberately disguised voices | |
CN101751919A (en) | Automatic stress detection method for spoken Chinese | |
Levitan et al. | Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection. | |
CN101645269A (en) | Language recognition system and method | |
CN106782508A (en) | Speech audio segmentation method and device | |
CN106548775A (en) | Speech recognition method and system | |
CN106875943A (en) | Speech recognition system for big data analysis | |
CN102237083A (en) | Portable interpretation system based on WinCE platform and language recognition method thereof | |
CN110019741A (en) | Answer matching method, device and equipment for a question-answering system, and readable storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20190618 |