CN105374352A - Voice activation method and system - Google Patents

Voice activation method and system

Info

Publication number
CN105374352A
CN105374352A (application CN201410418850.3A); granted as CN105374352B
Authority
CN
China
Prior art keywords
phoneme
voice
decision
speech
confidence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410418850.3A
Other languages
Chinese (zh)
Other versions
CN105374352B (en
Inventor
葛凤培
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Original Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS and Beijing Kexin Technology Co Ltd
Priority to CN201410418850.3A
Publication of CN105374352A
Application granted
Publication of CN105374352B
Status: Expired - Fee Related

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention relates to a voice activation method. The method comprises the following steps: an acoustic model is built, and a decoding network space is constructed on the basis of the acoustic model; a voice activity detection (VAD) configuration matching the ambient noise level is selected, and the input speech stream is cut into speech segments; acoustic features are extracted from each segment; the features are fed into the decoding network space for decoding and recognition to obtain the recognized phonemes; from the set of measures that can characterize the reliability of a pronunciation unit, several are selected as confidence measures of the recognized phonemes and computed; a two-stage decision, comprising a pre-decision and a second decision, is applied to these confidence measures, and the final recognition result is output. The method overcomes the drawbacks of starting a device by pressing a button, achieves a good activation effect, and makes speech recognition devices more convenient for people to use.

Description

Voice activation method and system
Technical field
The invention belongs to the technical field of speech recognition, and in particular relates to a voice activation method and system.
Background art
When current speech recognition technology is affected by factors such as noise and spontaneous speech, its recognition accuracy drops severely, so interaction based on continuous speech recognition is difficult to realize in daily life. The common workaround is to switch the speech recognition device on with a button press: the user can then record speech in a relatively quiet environment, which ensures good recognition performance and completes the human-machine interaction.
Starting the device with a button is inconvenient for the user in several ways. First, it requires the user to be within arm's reach of the device, which is difficult for users who are far from the device or have limited mobility. Second, in a dark environment the user may not easily find the button. Third, a button is unsuitable when both hands are occupied, for example while the user is driving. In summary, the dependence on button activation limits the promotion and application of speech recognition technology.
Voice activation technology provides a way to overcome these drawbacks and helps advance the application and development of human-machine interaction. Document [1] ("Wake-Up-Word Speech Recognition", Veton Kepuska (2011), Speech Technologies, Ivo Ipsic (Ed.), ISBN 978-953-307-996-7) describes a voice activation algorithm based on a speech recognition framework. That algorithm does not account for the various ambient noises of real application environments: it activates well in a relatively quiet laboratory, but its performance may degrade severely in environments with strong background noise. Moreover, Document [1] uses only a classifier for the confidence decision, so its accuracy depends entirely on the classifier's training samples; an ill-chosen training set directly harms activation performance.
Summary of the invention
The object of the invention is to overcome the various drawbacks of switching a speech recognition device on by button press, and to provide voice activation as a brand-new device start-up mode, thereby making speech recognition devices more convenient for people to use.
To achieve this goal, the invention provides a voice activation method, comprising:
establishing an acoustic model, and building a decoding network space on the basis of the acoustic model;
selecting the voice activity detection (VAD) configuration corresponding to the ambient noise level, and cutting the input speech stream into speech segments; extracting acoustic features from the segments; feeding the features into the decoding network space for decoding and recognition to obtain the recognized phonemes; selecting, from the measures that can characterize the reliability of a pronunciation unit, several measures as confidence measures of the recognized phonemes, and computing them; applying a two-stage decision, comprising a pre-decision and a second decision, to the confidence measures of the recognized phonemes, and outputting the final recognition result.
In the above scheme, building the decoding network space comprises: connecting the garbage phonemes of the phone set in parallel into a looping garbage-phoneme sub-network; chaining the phonemes of the designated activation word in order into an activation-word phone string; and attaching the garbage-phoneme sub-network at both the head and the tail of the phone string, the head and tail sub-networks also being connected directly, bypassing the phone string.
In the above scheme, the ambient noise levels are: high-noise environment, medium-noise environment, and quiet environment; the level is classified according to the sound pressure level of the ambient noise.
In the above scheme, the confidence measures of the recognized phonemes comprise: the normalized phoneme duration, the time-normalized phoneme log-likelihood, the phoneme log posterior probability, the number of states whose duration is a single frame, the minimum syllable duration, and the total duration of the recognized speech.
In the above scheme, the pre-decision comprises:
if the second-smallest phoneme log posterior probability over all recognized phonemes is below a first threshold, directly judging the utterance not to be the activation word; if the number of recognized phonemes whose log posterior probability is below -1 exceeds a second threshold, directly judging it not to be the activation word; if the number of states lasting only one frame exceeds a third threshold, directly judging it not to be the activation word; if the minimum syllable duration is at most a fourth threshold, directly judging it not to be the activation word; if the total duration of the recognized speech is shorter than 6 frames per recognized phoneme or longer than 15 frames per recognized phoneme, directly judging it not to be the activation word. The first, second, third, and fourth thresholds are preferably obtained from experience and statistics.
In the above scheme, the second decision is implemented with a classifier, the classifier being a linear classifier, a Gaussian mixture model classifier, or a support vector machine classifier.
In addition, the invention provides a voice activation system, the system comprising:
a VAD module, configured to select the VAD configuration corresponding to the ambient noise level and cut the captured continuous speech stream into speech segments;
a feature extraction module, configured to extract acoustic features from the speech segments;
an acoustic model, configured to describe the feature distribution of each pronunciation unit in the acoustic space;
a decoder module, configured to build the decoding network space on the basis of the acoustic model, perform Viterbi decoding on the features of a speech segment, and find the best phoneme path in the decoding network space as the recognition path, all non-garbage phonemes on the best path being the recognized phonemes;
a confidence computation module, configured to select several measures characterizing pronunciation-unit reliability as confidence measures of the recognized phonemes and compute them;
a two-stage decision module, configured to apply the two-stage decision, comprising the pre-decision and the second decision, to the confidence measures of the recognized phonemes and output the final recognition result.
The advantages of the invention are:
1. by handling noise environments in classes, the voice activation system provided by the invention is robust in noisy environments;
2. the purpose-built decoding network space eliminates the adverse effect that real-world background noise has on speech recognition performance;
3. the two-stage decision on the recognition result minimizes the recognition error rate and achieves an excellent activation effect;
4. the voice activation system provided by the invention has broad application prospects in interactive smart home appliances, wearable devices, and the like.
Brief description of the drawings
Fig. 1 is a schematic diagram of how the decoding network space of the invention is built;
Fig. 2 is a block diagram of the voice activation system of the invention.
Detailed description of the embodiments
The invention is described further below with reference to the drawings and a specific embodiment.
The voice activation method provided by the invention comprises the following steps:
Step 1) establish the acoustic model.
The phone set comprises 65 toneless Chinese phonemes, 15 garbage (filler) phonemes, the sil phoneme representing silence, and the sp phoneme representing a short pause. Each phoneme is expanded by its context into triphones, and each triphone is modelled as a sequence of three states. The 15 garbage phonemes are obtained statistically: according to the confusability and correlation between phonemes, all phonemes are clustered into several similarity classes, and each similarity class serves as one garbage phoneme.
Using decision trees, states with the same central phoneme and same position but different contexts are clustered, yielding 3970 states, i.e. 3970 units; each unit is described by a Gaussian mixture model (GMM) with 8 components. The acoustic model is built from the phone set and these 3970 units.
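As a rough illustration of how one of the 3970 GMM-described units scores a feature frame, the sketch below evaluates an 8-component diagonal-covariance mixture in the log domain. The parameter values are random placeholders, not the trained model of the embodiment.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one feature frame under a diagonal-covariance GMM.

    x:         (D,) feature vector (e.g. the 39-dim features of step 302)
    weights:   (M,) mixture weights summing to 1 (M = 8 in the embodiment)
    means:     (M, D) component means
    variances: (M, D) diagonal covariances
    """
    D = x.shape[0]
    # Per-component Gaussian log-densities.
    log_det = np.sum(np.log(variances), axis=1)            # (M,)
    maha = np.sum((x - means) ** 2 / variances, axis=1)    # (M,)
    log_comp = -0.5 * (D * np.log(2 * np.pi) + log_det + maha)
    # Weighted log-sum-exp over the mixture components.
    a = np.log(weights) + log_comp
    m = a.max()
    return m + np.log(np.sum(np.exp(a - m)))

rng = np.random.default_rng(0)
M, D = 8, 39
w = np.full(M, 1.0 / M)
mu = rng.normal(size=(M, D))
var = np.ones((M, D))
ll = gmm_log_likelihood(np.zeros(D), w, mu, var)
```

The log-sum-exp trick keeps the mixture sum numerically stable even when individual component densities underflow.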
Step 2) build the decoding network space on the basis of the acoustic model.
Referring to Fig. 1, the decoding network space is built as follows: the 15 garbage phonemes of step 1) are connected in parallel into a looping garbage-phoneme sub-network; the phonemes of the designated activation word are chained in order into the activation-word phone string; the garbage-phoneme sub-network is then attached at both the head and the tail of the phone string, and the head and tail sub-networks are also connected directly, bypassing the phone string.
The decoding network space built in this way can accurately force-align five classes of speech segments: the activation word alone, the activation word preceded by garbage speech, the activation word followed by garbage speech, the activation word with garbage speech on both sides, and pure garbage speech. These five classes cover all possible speech to be recognized.
For example, if the designated activation word is "ni hao kong tiao" ("hello, air conditioner"), the chained activation-word phone string is "n-i-h-ao-k-ong-t-iao".
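The topology of Fig. 1 can be sketched as a plain adjacency list: a looping garbage sub-network before and after the chained keyword phones, plus a direct head-to-tail bypass for pure-garbage paths. The node names and data structure below are illustrative, not the patent's implementation.

```python
# Build the decoding-network topology of Fig. 1 as an edge list.
GARBAGE = [f"filler{k}" for k in range(15)]          # 15 garbage phonemes
KEYWORD = "n-i-h-ao-k-ong-t-iao".split("-")          # "ni hao kong tiao"

def build_network(keyword, garbage):
    edges = []
    # Head garbage loop: enter any filler, loop back, or move on.
    for g in garbage:
        edges.append(("start", f"head:{g}"))
        edges.append((f"head:{g}", "start"))         # loop back
        edges.append((f"head:{g}", "kw:0"))          # into the keyword
    edges.append(("start", "kw:0"))                  # keyword with no leading garbage
    # Activation-word phone string, chained in order.
    for i in range(len(keyword) - 1):
        edges.append((f"kw:{i}", f"kw:{i+1}"))
    edges.append((f"kw:{len(keyword)-1}", "tail"))
    # Tail garbage loop.
    for g in garbage:
        edges.append(("tail", f"tail:{g}"))
        edges.append((f"tail:{g}", "tail"))          # loop back
        edges.append((f"tail:{g}", "end"))
    edges.append(("tail", "end"))
    edges.append(("start", "tail"))                  # bypass: pure garbage speech
    return edges

net = build_network(KEYWORD, GARBAGE)
```

The bypass edge is what lets the network absorb the fifth segment class (pure garbage speech) without forcing a pass through the keyword phones.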
Step 3) select the VAD (voice activity detection) configuration corresponding to the ambient noise level and cut the input speech stream into speech segments; extract acoustic features from the segments; feed the features into the decoding network space for decoding and recognition to obtain the recognized phonemes; select, from the measures that can characterize pronunciation-unit reliability, several measures as confidence measures of the recognized phonemes and compute them; apply the two-stage decision, comprising the pre-decision and the second decision, to the confidence measures, and output the final recognition result.
In the above scheme, step 3) further comprises:
Step 301) select the VAD configuration corresponding to the ambient noise level and cut the input speech stream into speech segments.
The noise environment is divided into three levels: high-noise, medium-noise, and quiet. The level is classified by the sound pressure level of the ambient noise, computed as:
Lp = 20 * lg(p / p0)
where Lp is the sound pressure level in decibels, p is the sound pressure, and p0 is the reference sound pressure, p0 = 2×10^-5 Pa in air.
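The sound-pressure-level formula above transcribes directly to code:

```python
import math

def sound_pressure_level(p, p0=2e-5):
    """Sound pressure level in dB: Lp = 20*lg(p/p0), with p0 = 2e-5 Pa in air."""
    return 20.0 * math.log10(p / p0)

# A pressure of 0.02 Pa corresponds to 60 dB SPL: 20*lg(0.02/2e-5) = 20*lg(1000).
lp = sound_pressure_level(0.02)
```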
The noise levels are then separated by thresholds on these sound pressure levels.
According to the selected VAD configuration, the continuous input speech stream is cut into short speech segments. The goal of the cutting is to break at pauses in the speaker's speech, i.e. to keep each stretch of continuous speech inside a single segment as far as possible. Using a noise-level-specific VAD configuration keeps the segmentation stable under fluctuating ambient noise, yielding accurate segments and reducing truncation of complete utterances.
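A minimal sketch of noise-level-dependent VAD configuration selection is given below. The patent's classification table and parameter values are not reproduced in this text, so the dB boundaries, parameter names, and configuration values here are all hypothetical placeholders.

```python
# Hypothetical VAD configurations per noise level; the energy thresholds and
# hangover lengths are illustrative placeholders, not the patent's values.
VAD_CONFIGS = {
    "quiet":  {"energy_threshold_db": 30, "min_silence_frames": 30},
    "medium": {"energy_threshold_db": 45, "min_silence_frames": 40},
    "high":   {"energy_threshold_db": 60, "min_silence_frames": 50},
}

def select_vad_config(spl_db):
    """Map an ambient sound pressure level (dB) to a noise class and VAD config.
    The 40/65 dB class boundaries are assumptions for illustration."""
    if spl_db < 40:
        level = "quiet"
    elif spl_db < 65:
        level = "medium"
    else:
        level = "high"
    return level, VAD_CONFIGS[level]

level, cfg = select_vad_config(55.0)
```

Noisier environments get a higher energy threshold and a longer silence hangover, so that noise fluctuations do not split one utterance into several segments.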
Step 302) extract the acoustic features of the speech segments.
Speech is captured at an 8 kHz sampling rate and framed with a 25 ms window and a 10 ms shift. 12 PLP (perceptual linear prediction) coefficients plus 1 energy term are extracted as the static features; first- and second-order difference parameters are appended, giving a 39-dimensional vector as the dynamic features. An HLDA (heteroscedastic linear discriminant analysis) transform is applied to the static and dynamic features to improve their discriminative power.
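PLP extraction itself is too involved for a short sketch, but the difference-parameter expansion from 13 static dimensions (12 PLP + 1 energy) to 39 total can be shown with the standard regression formula. The regression window width of 2 is an assumption; the patent does not state it.

```python
import numpy as np

def add_deltas(static, window=2):
    """Append first- and second-order regression deltas to static features.

    static: (T, 13) array of per-frame static features (12 PLP + 1 energy).
    Returns (T, 39), matching the 39-dim features of step 302.
    Edge frames are replicated before applying the regression formula.
    """
    def delta(feat):
        T = feat.shape[0]
        denom = 2 * sum(k * k for k in range(1, window + 1))
        padded = np.concatenate([np.repeat(feat[:1], window, axis=0),
                                 feat,
                                 np.repeat(feat[-1:], window, axis=0)])
        out = np.zeros_like(feat)
        for t in range(T):
            c = t + window
            out[t] = sum(k * (padded[c + k] - padded[c - k])
                         for k in range(1, window + 1)) / denom
        return out

    d1 = delta(static)        # first-order differences
    d2 = delta(d1)            # second-order differences
    return np.concatenate([static, d1, d2], axis=1)

feats = add_deltas(np.random.default_rng(1).normal(size=(100, 13)))
```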
Step 303) feed the features into the decoding network space for decoding and recognition, obtaining the recognized phonemes.
Decoding uses the Viterbi algorithm to find the best phoneme path in the decoding network space as the recognition path; all phonemes on the best path other than fillers are the recognized phonemes. If all phonemes on the path are fillers, the speech is directly judged not to be the activation word and the method proceeds to step 305-3); otherwise it proceeds to step 304).
Step 304) compute the confidence measures of the recognized phonemes.
The confidence measures are: the normalized phoneme duration, the time-normalized phoneme log-likelihood, the phoneme log posterior probability, the number of states whose duration is a single frame, the minimum syllable duration, and the total duration of the recognized speech.
The normalized phoneme duration is computed as:

$$\mathrm{dur}_{NOR}(p_i) = \frac{\mathrm{dur}(p_i)}{\sum_{j=0}^{S}\mathrm{dur}(p_j)}$$

where $p_i$ is the $i$-th recognized phoneme, $\mathrm{dur}_{NOR}(p_i)$ is its normalized duration, $\mathrm{dur}(p_i)$ is its duration, and $S$ is the total number of phonemes contained in the recognized speech.
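The normalization transcribes directly; the per-phoneme frame counts below are illustrative values, not measured data.

```python
def normalized_durations(durations):
    """dur_NOR(p_i) = dur(p_i) / sum of all recognized-phoneme durations."""
    total = float(sum(durations))
    return [d / total for d in durations]

# Illustrative frame counts for the 8 phonemes of "n-i-h-ao-k-ong-t-iao".
durs = [8, 10, 6, 12, 7, 11, 6, 15]
norm = normalized_durations(durs)
```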
The time-normalized phoneme log-likelihood is computed as:

$$LL_{Nor}(p_i) = \frac{\ln P(O \mid p_i)}{\mathrm{dur}(p_i)}$$

where $LL_{Nor}(p_i)$ is the time-normalized log-likelihood of the $i$-th recognized phoneme and $P(O \mid p_i)$ is the likelihood of the $i$-th recognized phoneme; $\ln P(O \mid p_i)$ is available from the ordinary decoding result.
The phoneme log posterior probability is computed as:

$$GOP(p_i) = \frac{\ln P(p_i \mid O)}{\mathrm{dur}(p_i)} = \ln\!\left(\frac{P(O \mid p_i)\,P(p_i)}{\sum_{q \in Q} P(O \mid q)\,P(q)}\right)\Big/\,\mathrm{dur}(p_i) \approx \ln\!\left(\frac{P(O \mid p_i)}{\sum_{q \in Q} P(O \mid q)}\right)\Big/\,\mathrm{dur}(p_i)$$

where $GOP(p_i)$ is the log posterior probability of the $i$-th recognized phoneme, $\sum_{q \in Q} P(O \mid q)$ is the sum of the likelihoods of all phonemes in the phone set, and $Q$ is the phone set of step 1).
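The two likelihood-based measures can be sketched as follows; the denominator sum of the GOP is evaluated with log-sum-exp since only log-likelihoods are available from the decoder. The example values are illustrative.

```python
import math

def time_normalized_ll(log_likelihood, dur):
    """LL_Nor(p_i): phoneme log-likelihood divided by its duration in frames."""
    return log_likelihood / dur

def gop(log_likelihood_i, log_likelihoods_all, dur):
    """GOP(p_i) ~= [ln P(O|p_i) - ln sum_q P(O|q)] / dur(p_i).

    log_likelihood_i:    ln P(O|p_i) for the recognized phoneme
    log_likelihoods_all: ln P(O|q) for every phoneme q in the phone set Q
    """
    m = max(log_likelihoods_all)
    log_denom = m + math.log(sum(math.exp(v - m) for v in log_likelihoods_all))
    return (log_likelihood_i - log_denom) / dur

# Illustrative values: the recognized phoneme scores best among 4 candidates,
# so its GOP is close to 0 (the maximum possible value).
lls = [-120.0, -150.0, -160.0, -170.0]
g = gop(lls[0], lls, dur=10)
```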
Step 305) apply the two-stage decision to the confidence measures of the recognized phonemes and output the final recognition result. This comprises:
Step 305-1) apply the pre-decision to the confidence measures of the recognized phonemes; if the result is "not the activation word", proceed to 305-3); otherwise proceed to 305-2).
The pre-decision comprises:
if the second-smallest phoneme log posterior probability over all recognized phonemes is below a first threshold, the utterance is directly judged not to be the activation word; the first threshold is preferably obtained from experience and statistics and is set to -4.0 in this embodiment;
if the number of recognized phonemes whose log posterior probability is below -1 exceeds a second threshold, it is directly judged not to be the activation word; the second threshold is preferably obtained from experience and statistics and is set to 4 in this embodiment;
if the number of states lasting only one frame exceeds a third threshold, it is directly judged not to be the activation word; the third threshold is preferably obtained from experience and statistics and is set to 12 in this embodiment;
if the minimum syllable duration is at most a fourth threshold, it is directly judged not to be the activation word; the fourth threshold is preferably obtained from experience and statistics and is set to 6 in this embodiment;
if the total duration of the recognized speech is shorter than 6 frames per recognized phoneme or longer than 15 frames per recognized phoneme, it is directly judged not to be the activation word.
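The five pre-decision rules, with the embodiment's threshold values, transcribe to a single rejection function. The example inputs are illustrative.

```python
def pre_decision(gops, one_frame_states, min_syllable_dur, total_dur, n_phones,
                 t1=-4.0, t2=4, t3=12, t4=6):
    """Return True when the segment is rejected (judged not to be the
    activation word) without invoking the second, classifier-based decision.
    Durations are in frames; gops are per-phoneme log posterior values."""
    if sorted(gops)[1] < t1:
        return True               # rule 1: second-smallest GOP too low
    if sum(1 for g in gops if g < -1) > t2:
        return True               # rule 2: too many low-posterior phonemes
    if one_frame_states > t3:
        return True               # rule 3: too many one-frame states
    if min_syllable_dur <= t4:
        return True               # rule 4: shortest syllable too short
    if total_dur < n_phones * 6 or total_dur > n_phones * 15:
        return True               # rule 5: implausible total duration
    return False

rejected = pre_decision(gops=[-0.5] * 8, one_frame_states=2,
                        min_syllable_dur=12, total_dur=80, n_phones=8)
```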
Step 305-2) apply the second decision to the confidence vector of any recognized speech that the pre-decision did not directly reject.
The confidence vector of the recognized speech is formed by concatenating the confidence measures of each recognized phoneme in phoneme order; its dimension is the number of recognized phonemes times the number of confidence measures.
Taking the recognized speech "ni hao kong tiao" as an example, each Chinese character consists of two phonemes (an initial and a final), so the confidence vector of "ni hao kong tiao" has 6*8 = 48 dimensions.
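Assembling the 48-dimensional vector is a plain concatenation of per-phoneme measures in phoneme order; the measure values below are placeholders.

```python
def confidence_vector(per_phoneme_measures):
    """Concatenate the 6 confidence measures of each phoneme, in phoneme
    order, into one flat vector of dimension n_phonemes * 6."""
    vec = []
    for measures in per_phoneme_measures:   # one 6-tuple per phoneme
        assert len(measures) == 6
        vec.extend(measures)
    return vec

# 8 phonemes of "ni hao kong tiao", 6 placeholder measures each:
# (norm. duration, norm. log-likelihood, GOP, one-frame states,
#  min syllable duration, total duration).
phones = "n i h ao k ong t iao".split()
records = [(0.12, -3.1, -0.4, 0, 9, 75) for _ in phones]
vec = confidence_vector(records)
```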
The second decision is implemented with a classifier: a linear classifier, a Gaussian mixture model classifier, or a support vector machine (SVM) classifier. This embodiment uses an SVM classifier.
Before the second decision, an SVM classifier is first trained with equal numbers of positive and negative samples; the positive samples are speech segments containing the designated activation word, and the negative samples are speech segments that do not contain it.
The second decision then comprises: feeding the confidence vector of the recognized speech into the SVM classifier, whose output is 1 or 2, where 1 denotes the activation word and 2 denotes a non-activation word.
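A sketch of the second decision using scikit-learn's SVC as a stand-in for the patent's unspecified SVM implementation. The training data here is synthetic; in practice the positive and negative samples would be confidence vectors of activation-word and non-activation-word segments.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
dim = 48                                      # 8 phonemes x 6 measures
pos = rng.normal(loc=+1.0, size=(100, dim))   # label 1: activation word
neg = rng.normal(loc=-1.0, size=(100, dim))   # label 2: not the activation word
X = np.vstack([pos, neg])
y = np.array([1] * 100 + [2] * 100)           # equal positive/negative samples

clf = SVC(kernel="rbf").fit(X, y)

def second_decision(conf_vec):
    """Return 1 (activation word) or 2 (not) for one confidence vector."""
    return int(clf.predict(conf_vec.reshape(1, -1))[0])

result = second_decision(rng.normal(loc=+1.0, size=dim))
```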
Step 305-3) output the final recognition result.
Referring to Fig. 2, the invention also provides a voice activation system, comprising:
a VAD (voice activity detection) module, configured to select the VAD configuration corresponding to the ambient noise level and cut the captured continuous speech stream into speech segments;
a feature extraction module, configured to extract acoustic features from the speech segments;
an acoustic model, configured to describe the feature distribution of each pronunciation unit in the acoustic space;
a decoder module, configured to build the decoding network space on the basis of the acoustic model, perform Viterbi decoding on the features of a speech segment, and find the best phoneme path in the decoding network space as the recognition path, all non-garbage phonemes on the best path being the recognized phonemes;
a confidence computation module, configured to select several measures characterizing pronunciation-unit reliability as confidence measures of the recognized phonemes and compute them;
a two-stage decision module, configured to apply the two-stage decision, comprising the pre-decision and the second decision, to the confidence measures of the recognized phonemes and output the final recognition result.
In this embodiment the designated activation word is "ni hao kong tiao", and the acoustic model is trained on 150 hours of read speech recorded in a relatively quiet environment. To evaluate the activation rate in real scenes, read speech from 10 speakers was recorded under four scene types: quiet, echo, noise, and echo+noise, with each speaker reading the activation word 20 times. To evaluate false alarms under the same four scenes, 24 hours of read speech from the 10 speakers were recorded. The performance of the voice activation system of the invention is shown in Table 1:
Table 1

                  Quiet    Echo      Noise     Echo+noise
Activation rate   91.3%    89.5%     80.2%     75.1%
False alarms      0        1/hour    2/hour    2.6/hour

Claims (7)

1. A voice activation method, comprising:
establishing an acoustic model, and building a decoding network space on the basis of the acoustic model;
selecting the voice activity detection (VAD) configuration corresponding to the ambient noise level, and cutting the input speech stream into speech segments; extracting acoustic features from the segments; feeding the features into the decoding network space for decoding and recognition to obtain the recognized phonemes; selecting, from the measures that can characterize the reliability of a pronunciation unit, several measures as confidence measures of the recognized phonemes, and computing them; applying a two-stage decision, comprising a pre-decision and a second decision, to the confidence measures of the recognized phonemes, and outputting the final recognition result.
2. The voice activation method according to claim 1, characterized in that building the decoding network space comprises: connecting the garbage phonemes of the phone set in parallel into a looping garbage-phoneme sub-network; chaining the phonemes of the designated activation word in order into an activation-word phone string; and attaching the garbage-phoneme sub-network at both the head and the tail of the phone string, the head and tail sub-networks also being connected directly, bypassing the phone string.
3. The voice activation method according to claim 1, characterized in that the ambient noise levels are: high-noise environment, medium-noise environment, and quiet environment; the level is classified according to the sound pressure level of the ambient noise.
4. The voice activation method according to claim 1, characterized in that the confidence measures of the recognized phonemes comprise: the normalized phoneme duration, the time-normalized phoneme log-likelihood, the phoneme log posterior probability, the number of states whose duration is a single frame, the minimum syllable duration, and the total duration of the recognized speech.
5. The voice activation method according to claim 4, characterized in that the pre-decision comprises:
if the second-smallest phoneme log posterior probability over all recognized phonemes is below a first threshold, directly judging the utterance not to be the activation word; if the number of recognized phonemes whose log posterior probability is below -1 exceeds a second threshold, directly judging it not to be the activation word; if the number of states lasting only one frame exceeds a third threshold, directly judging it not to be the activation word; if the minimum syllable duration is at most a fourth threshold, directly judging it not to be the activation word; if the total duration of the recognized speech is shorter than 6 frames per recognized phoneme or longer than 15 frames per recognized phoneme, directly judging it not to be the activation word; the first, second, third, and fourth thresholds preferably being obtained from experience and statistics.
6. The voice activation method according to claim 1, characterized in that the second decision is implemented with a classifier, the classifier being a linear classifier, a Gaussian mixture model classifier, or a support vector machine classifier.
7. A voice activation system, characterized in that the system comprises:
a VAD module, configured to select the VAD configuration corresponding to the ambient noise level and cut the captured continuous speech stream into speech segments;
a feature extraction module, configured to extract acoustic features from the speech segments;
an acoustic model, configured to describe the feature distribution of each pronunciation unit in the acoustic space;
a decoder module, configured to build the decoding network space on the basis of the acoustic model, perform Viterbi decoding on the features of a speech segment, and find the best phoneme path in the decoding network space as the recognition path, all non-garbage phonemes on the best path being the recognized phonemes;
a confidence computation module, configured to select several measures characterizing pronunciation-unit reliability as confidence measures of the recognized phonemes and compute them;
a two-stage decision module, configured to apply the pre-decision and the second decision to the confidence measures of the recognized phonemes and output the final recognition result.
CN201410418850.3A 2014-08-22 2014-08-22 Voice activation method and system Expired - Fee Related CN105374352B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410418850.3A CN105374352B (en) 2014-08-22 2014-08-22 Voice activation method and system

Publications (2)

Publication Number Publication Date
CN105374352A true CN105374352A (en) 2016-03-02
CN105374352B CN105374352B (en) 2019-06-18

Family

ID=55376483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410418850.3A Expired - Fee Related CN105374352B (en) 2014-08-22 2014-08-22 Voice activation method and system

Country Status (1)

Country Link
CN (1) CN105374352B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106484147A (en) * 2016-10-21 2017-03-08 深圳仝安技术有限公司 A kind of method and device of new activation voice system
CN107767863A (en) * 2016-08-22 2018-03-06 科大讯飞股份有限公司 voice awakening method, system and intelligent terminal
CN107767861A (en) * 2016-08-22 2018-03-06 科大讯飞股份有限公司 voice awakening method, system and intelligent terminal
CN107919116A (en) * 2016-10-11 2018-04-17 芋头科技(杭州)有限公司 A kind of voice-activation detecting method and device
CN108231089A (en) * 2016-12-09 2018-06-29 百度在线网络技术(北京)有限公司 Method of speech processing and device based on artificial intelligence
CN108281137A (en) * 2017-01-03 2018-07-13 中国科学院声学研究所 A kind of universal phonetic under whole tone element frame wakes up recognition methods and system
CN108831459A (en) * 2018-05-30 2018-11-16 出门问问信息科技有限公司 Audio recognition method and device
CN110111779A (en) * 2018-01-29 2019-08-09 阿里巴巴集团控股有限公司 Syntactic model generation method and device, audio recognition method and device
CN110364142A (en) * 2019-06-28 2019-10-22 腾讯科技(深圳)有限公司 Phoneme of speech sound recognition methods and device, storage medium and electronic device
CN110992929A (en) * 2019-11-26 2020-04-10 苏宁云计算有限公司 Voice keyword detection method, device and system based on neural network
CN111009234A (en) * 2019-12-25 2020-04-14 上海忆益信息科技有限公司 Voice conversion method, device and equipment
CN111429901A (en) * 2020-03-16 2020-07-17 云知声智能科技股份有限公司 IoT chip-oriented multi-stage voice intelligent awakening method and system
CN111653276A (en) * 2020-06-22 2020-09-11 四川长虹电器股份有限公司 Voice awakening system and method
CN112652306A (en) * 2020-12-29 2021-04-13 珠海市杰理科技股份有限公司 Voice wake-up method and device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101739869A (en) * 2008-11-19 2010-06-16 中国科学院自动化研究所 Priori knowledge-based pronunciation evaluation and diagnosis system
CN102044243A (en) * 2009-10-15 2011-05-04 华为技术有限公司 Method and device for voice activity detection (VAD) and encoder
CN102982811A (en) * 2012-11-24 2013-03-20 安徽科大讯飞信息科技股份有限公司 Voice endpoint detection method based on real-time decoding
CN103810996A (en) * 2014-02-21 2014-05-21 北京凌声芯语音科技有限公司 Processing method, device and system for voice to be tested
US20140214416A1 (en) * 2013-01-30 2014-07-31 Tencent Technology (Shenzhen) Company Limited Method and system for recognizing speech commands

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Veton Kepuska: "Wake-Up-Word Speech Recognition", Speech Technologies *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107767863A (en) * 2016-08-22 2018-03-06 iFLYTEK Co., Ltd. Voice wake-up method, system and intelligent terminal
CN107767861A (en) * 2016-08-22 2018-03-06 iFLYTEK Co., Ltd. Voice wake-up method, system and intelligent terminal
CN107919116B (en) * 2016-10-11 2019-09-13 Yutou Technology (Hangzhou) Co., Ltd. Voice activation detection method and device
CN107919116A (en) * 2016-10-11 2018-04-17 Yutou Technology (Hangzhou) Co., Ltd. Voice activation detection method and device
WO2018068649A1 (en) * 2016-10-11 2018-04-19 Yutou Technology (Hangzhou) Co., Ltd. Method and device for detecting voice activation
CN106484147A (en) * 2016-10-21 2017-03-08 Shenzhen Tongan Technology Co., Ltd. Method and device for a new voice activation system
CN108231089B (en) * 2016-12-09 2020-11-03 Baidu Online Network Technology (Beijing) Co., Ltd. Speech processing method and device based on artificial intelligence
CN108231089A (en) * 2016-12-09 2018-06-29 Baidu Online Network Technology (Beijing) Co., Ltd. Speech processing method and device based on artificial intelligence
CN108281137A (en) * 2017-01-03 2018-07-13 Institute of Acoustics, Chinese Academy of Sciences Universal voice wake-up recognition method and system under a whole-phoneme framework
CN110111779A (en) * 2018-01-29 2019-08-09 Alibaba Group Holding Ltd. Grammar model generation method and device, and speech recognition method and device
CN110111779B (en) * 2018-01-29 2023-12-26 Alibaba Group Holding Ltd. Grammar model generation method and device, and speech recognition method and device
CN108831459A (en) * 2018-05-30 2018-11-16 Mobvoi Information Technology Co., Ltd. Speech recognition method and device
CN110364142B (en) * 2019-06-28 2022-03-25 Tencent Technology (Shenzhen) Co., Ltd. Speech phoneme recognition method and device, storage medium and electronic device
CN110364142A (en) * 2019-06-28 2019-10-22 Tencent Technology (Shenzhen) Co., Ltd. Speech phoneme recognition method and device, storage medium and electronic device
CN110992929A (en) * 2019-11-26 2020-04-10 Suning Cloud Computing Co., Ltd. Voice keyword detection method, device and system based on a neural network
CN111009234A (en) * 2019-12-25 2020-04-14 Shanghai Yiyi Information Technology Co., Ltd. Voice conversion method, device and equipment
CN111429901A (en) * 2020-03-16 2020-07-17 Unisound Intelligent Technology Co., Ltd. Multi-stage intelligent voice wake-up method and system for IoT chips
CN111653276A (en) * 2020-06-22 2020-09-11 Sichuan Changhong Electric Co., Ltd. Voice wake-up system and method
CN112652306A (en) * 2020-12-29 2021-04-13 Zhuhai Jieli Technology Co., Ltd. Voice wake-up method and device, computer equipment and storage medium
CN112652306B (en) * 2020-12-29 2023-10-03 Zhuhai Jieli Technology Co., Ltd. Voice wake-up method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN105374352B (en) 2019-06-18

Similar Documents

Publication Publication Date Title
CN105374352A (en) Voice activation method and system
CN105529028B (en) Speech analysis method and apparatus
CN110364143B (en) Voice wake-up method and device, and intelligent electronic equipment
CN103928023B (en) Speech assessment method and system
CN101930735B (en) Speech emotion recognition equipment and speech emotion recognition method
CN102800314B (en) English sentence recognizing and evaluating system with feedback guidance and method
CN103971678B (en) Keyword spotting method and apparatus
CN108922541B (en) Multi-dimensional characteristic parameter voiceprint recognition method based on DTW and GMM models
CN101118745B (en) Fast confidence acquisition method in a speech recognition system
US20170154640A1 (en) Method and electronic device for voice recognition based on dynamic voice model selection
CN103177733B (en) Pronunciation quality evaluation method and system for standard Chinese retroflex ("r") suffixation
CN108281137A (en) Universal voice wake-up recognition method and system under a whole-phoneme framework
CN105632486A (en) Voice wake-up method and device of intelligent hardware
CN104050965A (en) English phonetic pronunciation quality evaluation system with emotion recognition function and method thereof
Ferrer et al. A prosody-based approach to end-of-utterance detection that does not require speech recognition
CN107329996A (en) Chatbot system and chat method based on a fuzzy neural network
CN104464724A (en) Speaker recognition method for deliberately disguised voices
CN101751919A (en) Automatic stress detection method for spoken Chinese
Levitan et al. Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection.
CN101645269A (en) Language recognition system and method
CN106782508A (en) Speech audio segmentation method and device
CN106548775A (en) Speech recognition method and system
CN106875943A (en) Speech recognition system for big data analysis
CN102237083A (en) Portable interpretation system based on WinCE platform and language recognition method thereof
CN110019741A (en) Answer matching method, device and equipment for a question answering system, and computer-readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 20190618)