CN1588535A - Automatic sound identifying treating method for embedded sound identifying system - Google Patents
- Publication number
- CN1588535A (application numbers CNA2004100667967A, CN200410066796A)
- Authority
- CN
- China
- Prior art keywords
- voice
- energy
- template
- training
- identification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Machine Translation (AREA)
Abstract
The invention is an automatic speech recognition processing method for an embedded speech recognition system, composed of a front-end processing part, a real-time recognition part, a back-end processing part, and a template training part. It adopts adaptive endpoint detection to extract voiced segments, recognizes input speech synchronously with the input, and applies a support vector machine algorithm to reject non-command speech quickly, improving recognition reliability and practicality. It trains the speech templates with multi-section vector quantization, supplemented by MCE/GPD discriminative training, to optimize the templates and improve recognition performance. The acoustic model used occupies little memory; the recognition rate of the system is effectively raised to above 95%, its computational load is small, its memory footprint is small, and its rejection rate for non-command speech is higher than 80%.
Description
Technical field
The present invention relates to an automatic speech recognition processing method, specifically an automatic speech recognition processing method for an embedded speech recognition system, in the field of intelligent information processing technology.
Background art
Applications of speech recognition technology fall into two development directions. One is large-vocabulary continuous speech recognition systems, used mainly in computer dictation machines and in voice information query services combined with the telephone network or the Internet; these systems are all realized on computer platforms. The other important direction is embedded speech recognition systems, i.e., miniaturized, portable voice products, such as dialing by voice on mobile phones, voice control of automotive equipment, intelligent toys, home remote control, and voice interaction on personal digital assistants (PDAs). These applications are mostly realized with dedicated hardware such as MCUs, DSPs, and special-purpose speech recognition chips. For mobile devices such as mobile phones in particular, speech is an ideal input method: it not only eliminates tedious keyboard input but also helps miniaturize the product. Large-vocabulary continuous speech recognition systems are generally based on the PC platform, while embedded speech recognition systems generally adopt low-power, low-cost MCU or DSP chips whose computing speed and memory capacity are very limited. An embedded system is also generally required to recognize in real time and to be small, highly reliable, power-saving, and inexpensive. These application characteristics and the finiteness of resources are the difficulties that keep present embedded speech recognition systems from practical use; therefore, so that the recognition computation does not become too complex under the premise of guaranteeing a certain recognition rate, the recognized vocabulary is mostly small or medium, namely between 10 and 100 command words.
Some existing embedded speech recognition systems are speaker-dependent; that is, before use the user must let the system learn or train on the entries to be recognized. Such recognition places no restriction on language or dialect and its recognition rate is very high, but the recording and training before use are very inconvenient. Other systems realize speaker-independent speech recognition: the word models to be recognized are trained in advance and loaded into the system's memory, so the user can use the system directly without training. Such recognition, however, applies only to the specified language and dialect, the recognizable statements are limited to those trained in advance, and the recognition rate is lower than that of speaker-dependent systems and awaits further improvement. For example, Brad's Tiny-Voice system, developed on a microcomputer, is a speaker-dependent small-vocabulary recognition system: it recognizes 16 commands; command input is indicated by a manual button; the input command length is restricted to 0.2 to 1.6 seconds; recognition takes roughly 100 milliseconds; the hardware computing unit is an HC705; and the price is about 5 US dollars. TI's speaker-independent small-vocabulary recognition system adopts HMM-model templates, recognizes 15 different commands, distinguishes male and female voices, additionally builds a grammar-layer model supporting simple grammar input, targets dialing by voice, achieves a recognition rate greater than 90%, uses TMS320C2x and TMS320C5x hardware computing units, and is relatively expensive at about 200 US dollars. The robustness of these systems is not high, performance collapses at low signal-to-noise ratios, and the recognized command sets are very small.
A search of the open literature turned up Chinese patent 99123747.1, "Training and recognition method for a voice command controller", which proposes a speech recognition processing method for embedded systems. That patent forms templates directly from compressed training speech without considering the discriminability between command templates, which hurts recognition. It adopts a probability-based recognition method whose computation is complex and ill-suited to embedded systems with strict real-time requirements. At the same time, its endpoint detection needs better adaptability to the environment, its rejection of non-command words is too simple, and its performance remains to be improved.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by proposing a low-cost automatic speech recognition processing method for embedded speech recognition systems, suited to real-time speech recognition and control in various embedded application fields. The method effectively raises the recognition rate of the system to above 95%; its computational load is small and its memory footprint is small, making it well suited to real-time operation in a hardware environment.
The invention is achieved by the following technical solution. It consists of four parts: front-end processing, real-time recognition, back-end processing, and template training. It adopts adaptive endpoint detection to extract voiced segments; recognizes the input speech synchronously with the input; applies a support vector machine algorithm to reject non-command speech quickly, improving the reliability and practicality of recognition; and trains the speech templates with multi-section vector quantization, supplemented by MCE/GPD discriminative training, to optimize the templates and improve recognition performance.
The invention is further explained below:
1. Front-end processing
This part consists of endpoint detection and feature extraction. Endpoint detection uses a speech state diagram, based on adaptive energy and speech waveform features, to detect the start and end of speech accurately. The method tracks changes of the speech energy state with a forward pass that measures short-time energy: the background average energy of the speech signal is first estimated with an adaptive average-energy method; on that basis the speech energy profile is measured, and each short-time energy value is converted into a state value via energy thresholds. According to the magnitude and duration of the energy, the whole utterance is divided into six states: initial (0), silence (1), energy rising (2), energy sustained (3), energy falling (4), and fluctuating (5); the transitions between states depend on threshold conditions. Endpoint detection is finally carried out from the logical relation between the energy thresholds and the sequence of energy-state values. Because the method considers the entire rise and fall of the speech waveform from start to end and judges against an adaptive energy, the accuracy of endpoint detection is improved and the detector gains a certain adaptability to background noise.
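For illustration, a minimal sketch of this six-state machine follows; the state names come from the text, while the concrete transition rules, the thresholds L1 < L2, and the frame-count defaults are simplifying assumptions (the embodiment below lists the actual detector parameters):

```python
# Minimal sketch of the six-state endpoint-detection state machine.
# States follow the text: 0 initial, 1 silence, 2 energy rising,
# 3 energy sustained, 4 energy falling, 5 fluctuating. The transition
# rules and frame counts are illustrative assumptions.
INITIAL, SILENCE, RISING, SUSTAINED, FALLING, FLUCTUATING = range(6)

def detect_endpoints(energies, L1, L2, min_word=10, word_gap=8):
    """energies: per-frame log2 energy. Returns (start, end) frames or None."""
    state, start, low_run = INITIAL, None, 0
    for t, e in enumerate(energies):
        if state in (INITIAL, SILENCE):
            if e > L1:                        # energy begins to rise
                state, start = RISING, t
        elif state in (RISING, FLUCTUATING):
            state = SUSTAINED if e > L2 else (SILENCE if e < L1 else FLUCTUATING)
        elif state == SUSTAINED:
            if e < L2:
                state, low_run = FALLING, 0
        elif state == FALLING:
            low_run = low_run + 1 if e < L1 else 0
            if low_run >= word_gap:           # energy stayed low: speech ended
                if t - start >= min_word:
                    return start, t - word_gap
                state, start = SILENCE, None  # too short: burst interference
    return None                               # no complete utterance found
```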
2. Real-time recognition
The recognizer adopts an improved DTW algorithm: the weights used in the classical DTW algorithm are revised so that the extension direction of the path stays close to the diagonal. After the weights are redefined, the weight sum of a path is no longer determined solely by the end-point coordinates; therefore, in the weight comparison during path extension, the accumulated distance must be normalized by the weight sum along the path so that the comparison is independent of the path taken. At the same time, considering the uncertainty of endpoint detection, the end point of the path is relaxed, reducing recognition errors caused by inaccurate endpoint detection. The optimal weights and the relaxation range can be chosen through repeated experiments. The revised dynamic time warping algorithm further improves the recognition rate of the system in the application environment.
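As an illustration of the revised dynamic programming, the following sketch normalizes the accumulated distance by the accumulated path weight when comparing extensions and relaxes the end point of the path; the move set, the weights (2 for the diagonal move, 1 for horizontal and vertical), and the relaxation width are assumptions, not the patent's tuned values:

```python
import numpy as np

def dtw_score(feats, template, relax=2):
    """DTW distance between a feature sequence and a template, with
    path-weight normalization and a relaxed end point (a sketch)."""
    T, Q = len(feats), len(template)
    cost = np.full((T, Q), np.inf)   # accumulated weighted distance
    wsum = np.ones((T, Q))           # accumulated weight along the best path
    d = lambda i, j: np.linalg.norm(feats[i] - template[j])
    cost[0, 0] = d(0, 0)
    for i in range(T):
        for j in range(Q):
            if i == j == 0:
                continue
            best = (np.inf, 1.0)
            for di, dj, w in ((1, 1, 2.0), (1, 0, 1.0), (0, 1, 1.0)):
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0 or not np.isfinite(cost[pi, pj]):
                    continue
                c, s = cost[pi, pj] + w * d(i, j), wsum[pi, pj] + w
                if c / s < best[0] / best[1]:   # compare normalized cost
                    best = (c, s)
            cost[i, j], wsum[i, j] = best
    # relaxed end point: allow the path to end a few template frames early
    ends = [cost[T-1, Q-1-k] / wsum[T-1, Q-1-k] for k in range(min(relax + 1, Q))]
    return min(ends)
```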
3. Back-end processing mainly comprises rejection of non-command speech. The calculation results of recognition are used directly to realize the rejection function, so the computation is simple and does not affect the real-time performance of recognition. It is characterized by directly using the recognition scores of the top N candidate words in the speech recognition result and adopting a Support Vector Machine (SVM) to realize fast rejection. The algorithm exploits the strong generalization ability of statistical learning theory on classification problems and further improves performance without any increase in computation; it is superior to traditional neural-network methods based on the single-layer perceptron (SLP) or the multi-layer perceptron (MLP).
4. Template training adopts a Multi-Section Vector Quantization (MSVQ) method based on dynamic programming: training sentences belonging to the same class are first divided into several sections in time according to a dynamic programming algorithm, and a standard VQ codebook is then generated in each section with the LBG method. The MSVQ template covers the speech features of all speakers in the training set and keeps the temporal structure of speech, so it is highly representative and gives a higher recognition rate. The template also has some characteristics of a CDHMM template while greatly reducing the template size and raising recognition speed; with its good recognition performance it suits resource-constrained embedded recognition systems. On the basis of the MSVQ template, and targeted at the DTW recognition technique adopted for recognition, an MCE/GPD discriminative training algorithm is used to improve the separating ability of the templates from the angle of Minimum Classification Error (MCE); after the discriminative training, more optimized templates are obtained and the recognition rate is markedly improved.
The invention is implemented on a 16-bit fixed-point DSP, the TMS320C5402, as a low-cost portable unit that can work standalone as a simple voice-control device or be applied conveniently in various embedded application fields. Compared with existing embedded speech recognition systems: the acoustic model used by the invention occupies little memory, each template needing only 96 x 16 bits, i.e., 192 bytes, which helps extend the command-set capacity; template training uses a discriminative training method that considers the separating ability of the templates from the angle of minimizing the classification error rate (MCE), rather than describing the training data as accurately as possible, effectively raising the recognition rate of the system; recognition proceeds synchronously with speech input, guaranteeing real-time recognition, with a recognition rate above 95%; the front-end endpoint-detection algorithm works from the logical relation between energy thresholds and the energy-state sequence, so its computational load is small and its memory footprint is small, well suited to real-time operation in a hardware environment; and the back end effectively rejects words or sounds outside the command set without affecting real-time recognition, with a rejection rate higher than 80%.
Description of drawings
Fig. 1 is a schematic diagram of the invention
Fig. 2 is a schematic diagram of the endpoint detection algorithm
Fig. 3 is a schematic diagram of the discriminative training
Fig. 4 is a schematic diagram of the system hardware structure
Embodiment
The embodiments of the invention are described in detail below in conjunction with the figures:
As shown in Fig. 4, the embedded speech recognition core comprises a DSP unit for computation and control; FlashROM for storing the program and the speech recognition templates; an A/D converter and a microphone for speech input; and a programmable logic device (CPLD) for decoding and output control. Legend: MIC: microphone; A/D: analog-to-digital converter; DSP: digital signal processor; RAM: random access memory; FlashROM: flash memory; CPLD: complex programmable logic device.
The speech processing flow of the invention divides into four parts (front-end processing, real-time recognition, back-end processing, and template training), described below in conjunction with Fig. 1:
1. Front-end processing:
(1) The speech signal is sampled by an A/D (analog-to-digital) converter, and the sampled speech is pre-emphasized and divided into windowed frames. The sampling frequency is 8 kHz, and samples are stored as 16-bit data.
(2) Endpoint detection is carried out to obtain the speech data: after the start of speech is detected, the following steps are executed until the end of speech is detected; otherwise the system keeps detecting the starting point of the speech signal. According to the magnitude and duration of the energy, the whole utterance is divided into six states: initial (0), silence (1), energy rising (2), energy sustained (3), energy falling (4), and fluctuating (5); the transitions between states depend on threshold conditions. When frames are found to be in a voiced state, the start of sound is marked; the frames that follow enter signal processing and recognition, while the preceding frames are discarded entirely because they are useless. When the dwell time in the "falling" state reaches a certain length, speech is judged to have ended. The threshold coefficients marked in Fig. 2 are used to tune the endpoint-detection performance; different parameter settings yield different performance. The parameters are explained as follows:
E: energy; the base-2 logarithm of a frame's energy
L1: energy threshold 1, taken as the adaptive average energy + 232
L2: energy threshold 2, taken as the adaptive average energy + 432
BackgroundFrame: the number of frames over which the background average energy is estimated
Artifact: the maximum number of interference-energy frames (lip smacks, breath intakes, and teeth clicks, for example, all produce burst interference)
WordGap: the minimum number of frames between two sound segments
MinWord: the minimum number of frames in a sound segment
MaxWord: the maximum number of frames in a sound segment
(3) The feature parameters of the speech signal are extracted, namely the LPCC (linear prediction cepstral coefficient) features.
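The patent does not spell out the LPCC computation itself; the following sketch shows one common route from a pre-emphasized, windowed frame to LPC cepstral coefficients (autocorrelation, the Levinson-Durbin recursion, then the standard LPC-to-cepstrum recursion), with the order p = 12 assumed:

```python
import numpy as np

def lpcc(frame, p=12):
    """LPC cepstral coefficients of one pre-emphasized, windowed frame.
    One common textbook convention; not necessarily the patent's exact one."""
    # autocorrelation r[0..p]
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(p + 1)])
    # Levinson-Durbin recursion for the LPC coefficients a[1..p]
    a, e = np.zeros(p + 1), r[0] + 1e-10      # small bias guards against /0
    for i in range(1, p + 1):
        k_i = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / e
        a_new = a.copy()
        a_new[i] = k_i
        a_new[1:i] = a[1:i] - k_i * a[i - 1:0:-1]
        a, e = a_new, e * (1.0 - k_i * k_i)
    # LPC-to-cepstrum recursion: c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}
    c = np.zeros(p + 1)
    for n in range(1, p + 1):
        c[n] = a[n] + sum((k / n) * c[k] * a[n - k] for k in range(1, n))
    return c[1:]
```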
2. Real-time recognition:
(1) The speech features obtained in the previous step are matched against all command templates with the DTW computation.
(2) The DTW results of the 10 best-matching candidate commands are saved, and the best-matching template is taken as the recognition result.
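Continuing the DTW sketch given earlier, matching the extracted features against all command templates and keeping the ten best candidates for the back end could look like this; the `templates` dictionary is hypothetical:

```python
def recognize(feats, templates):
    """templates: hypothetical dict mapping command name -> template array.
    Scores every template with dtw_score() from the sketch above and keeps
    the 10 best (smallest normalized distance) for the rejection step."""
    scored = sorted((dtw_score(feats, tpl), name) for name, tpl in templates.items())
    top10 = scored[:10]
    best_score, best_name = top10[0]
    return best_name, [score for score, _ in top10]
```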
3. Verification of the recognition result
(1) Verification of the recognition result is realized with Support Vector Machine (SVM) theory:
Suppose there are training data (x_1, y_1), ..., (x_M, y_M), where the x_i ∈ R^n, i = 1, 2, ..., M, are training samples and y_i ∈ {+1, −1} indicates to which of the two classes the vector belongs. A support vector function separating the two classes is then obtained by solving the problem

max_α Σ_{i=1}^{M} α_i − (1/2) Σ_{i=1}^{M} Σ_{j=1}^{M} α_i α_j y_i y_j K(x_i, x_j), subject to 0 ≤ α_i ≤ C and Σ_{i=1}^{M} α_i y_i = 0,

where C > 0 is a constant controlling the degree of penalty. Each Lagrange multiplier α_i corresponds to a training sample x_i; the training samples with α_i > 0 are called "support vectors". The support vector machine classification function finally obtained is

f(x) = sgn( Σ_{i=1}^{M} α_i y_i K(x_i, x) + b ).
(2) For each recognition result, let q_1, q_2, ..., q_10 be the recognition scores of the top 10 candidate words, arranged in ascending order. The scores are normalized to give the normalized recognition scores d_1, ..., d_10, and the corresponding normalized first-order differences are

d_i' = d_{i+1} − d_i, i = 1, ..., 9.

The feature vector {d_1, ..., d_10, d_1', ..., d_9'} formed from these values is used as the input of the support vector machine (SVM), and the result y = f_SVM(x) of the SVM classification function is computed.
(3) The output y = f_SVM(x) ∈ [−1, 1] of the SVM classification function determines, by its sign, the class of the current recognition result (command or non-command); the system thereby judges quickly whether the recognition result is a command word and rejects speech that does not belong to the command set. The SVM is obtained from a training set before recognition, and the data in the training set are obtained by the method described above.
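A sketch of this rejection step under the above definitions, using scikit-learn's SVC as a stand-in SVM implementation (the patent names no library), with the normalization by the largest score being an assumed form since the exact normalization formula is not reproduced in this text:

```python
import numpy as np
from sklearn.svm import SVC   # stand-in SVM implementation (an assumption)

def rejection_features(q):
    """q: recognition scores of the 10 best candidates of one recognition."""
    q = np.sort(np.asarray(q, dtype=float))   # ascending order, as in the text
    d = q / q[-1]                             # normalized scores d_1..d_10 (assumed form)
    return np.concatenate([d, np.diff(d)])    # plus differences d'_1..d'_9: 19 dims

svm = SVC(kernel="rbf", C=10.0)               # C > 0 controls the penalty

def train(score_lists, labels):
    """labels: +1 for command utterances, -1 for non-command utterances."""
    svm.fit(np.array([rejection_features(q) for q in score_lists]), labels)

def accept(q):
    """The sign of the decision function separates command from non-command."""
    return svm.decision_function([rejection_features(q)])[0] > 0
```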
4. Template training
(1) The original templates are trained with the Multi-Section Vector Quantization (MSVQ) method. Let a speech signal of frame length T be represented by a feature vector sequence X = {x_1, x_2, ..., x_T}. MSVQ divides the sentence evenly into sections in chronological order, then uses the LBG method to generate a standard VQ codebook for each section from that section's data; here the invention takes the mean (centroid) of all vectors in a section as that section's codebook.
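As a minimal sketch of this step, the following divides each training utterance of a word evenly into K sections in time and takes the centroid of the vectors in each section as that section's single-entry codebook; K = 8 and the assumption that every utterance has at least K frames are illustrative, and a fuller implementation would run the LBG algorithm for multi-entry codebooks:

```python
import numpy as np

def msvq_template(utterances, K=8):
    """utterances: list of (T_i x S) feature arrays of the same word, T_i >= K.
    Returns a (K x S) template: one centroid per time section."""
    sections = [[] for _ in range(K)]
    for X in utterances:
        bounds = np.linspace(0, len(X), K + 1).astype(int)  # even segmentation
        for k in range(K):
            sections[k].append(X[bounds[k]:bounds[k + 1]])
    # centroid (mean) of all vectors falling in section k, over all utterances
    return np.stack([np.mean(np.vstack(s), axis=0) for s in sections])
```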
(2) In combination with the MSVQ codebook, the templates are retrained with a DTW-recognition-based Generalized Probabilistic Descent discriminative training algorithm (MCE/GPD); the training flow is shown in Fig. 3.
Given a training sentence set X = {x_1, x_2, ..., x_N}, each x_i belongs to one of M words C_i, i = 1, 2, ..., M. Each sentence consists of P_i frames, every frame an S-dimensional speech feature vector, usually composed of cepstral coefficients. Each command word is represented by one reference template. The reference template set is Λ = {λ_i} = {(R_i, W_i), i = 1, 2, ..., M}, where R_i is the cepstral-coefficient sequence of template i and W_i is a discriminative weighting function used to adjust the template's distance score. The goal of the invention is to discriminatively train the reference template set Λ on the training set X according to the GPD algorithm so that the recognition error rate reaches its minimum.
(2.1) Define the distance between a training sentence x and the reference template r_j of word C_j as the discriminant function

g_j(x; Λ) = Σ_q w_q^j · δ_{p_q q}^j,

where w_q^j is the discriminative weight of word C_j's reference template, and δ_{p_q q}^j is, on the optimal path obtained after the DTW match, the distance between frame q of word C_j's reference template and the corresponding frame p_q of x. The Euclidean distance is adopted here:

δ_{p_q q}^j = || x_{p_q} − r_q^j ||.

From the above definitions one obtains a continuous discriminant function g_k(x; Λ) on which gradient operations can be performed.
(2.2) Define the misclassification measure, embedding the recognition result in it:

d_k(x) = g_k(x; Λ) − [ (1/(M−1)) Σ_{j≠k} g_j(x; Λ)^{−η} ]^{−1/η},

where η is a positive real number.
(2.3) The cost function is defined as the sigmoid

l_k(d_k) = 1 / (1 + e^{−γ d_k}), γ > 0,

which approximates the recognition error rate well.
(2.4) The GPD algorithm adjusts the reference template parameters adaptively so that the cost function reaches its minimum. Given a training sentence x belonging to word C_k, the reference template parameters are adjusted along the negative gradient of the cost function; for j = k the update takes the form

r^k(t+1) = r^k(t) − ε_t · v_k · ∇_{r^k} d_k(x),

where

v_k = l_k(d_k)(1 − l_k(d_k))    (11)

is the derivative factor of the sigmoid cost. Here t denotes the t-th iteration, T is the maximum number of iterations, and the learning step ε_t decreases with t (for example ε_t = ε_0(1 − t/T)), with ε_0 a small positive number. A convergent value is generally reached after a few tens of iterations. The discriminative training that minimizes the classification error rate, realized by gradient descent, yields the optimized command templates.
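Putting steps (2.1) through (2.4) together, a sketch of one GPD iteration might look as follows, assuming Euclidean frame distances, a fixed DTW alignment supplied by a hypothetical `align` function, and an update of the correct-class template only (the j = k case above); the values of η, γ, and the step ε_t are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gpd_step(x, k, templates, align, eps_t, eta=2.0, gamma=1.0):
    """One MCE/GPD update for training utterance x of word k.
    templates: list of (Q x S) arrays, modified in place.
    align(x, r): hypothetical helper returning, for each template frame q,
    the index p_q of the DTW-aligned input frame."""
    paths = [align(x, r) for r in templates]
    # discriminant g_j: mean aligned Euclidean distance to template j
    g = np.array([np.mean([np.linalg.norm(x[p] - r[q])
                           for q, p in enumerate(path)])
                  for r, path in zip(templates, paths)])
    # misclassification measure d_k (standard MCE form)
    others = np.delete(g, k)
    d_k = g[k] - np.mean(others ** (-eta)) ** (-1.0 / eta)
    l = sigmoid(gamma * d_k)            # smooth loss approximating an error
    v_k = l * (1.0 - l)                 # eq. (11): sigmoid derivative factor
    # gradient step on the correct template: pull frames toward aligned input
    r, path = templates[k], paths[k]
    for q, p in enumerate(path):
        diff = r[q] - x[p]
        norm = np.linalg.norm(diff)
        if norm > 0:
            r[q] -= eps_t * gamma * v_k * diff / (norm * len(path))
    return l
```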
Claims (5)
1. An automatic speech recognition processing method for an embedded speech recognition system, characterized in that the method consists of four parts: front-end processing, real-time recognition, back-end processing, and template training; adopts adaptive endpoint detection to extract voiced segments; recognizes the input speech synchronously with the input; applies a support vector machine algorithm to reject non-command speech quickly, improving the reliability and practicality of recognition; and trains the speech templates with multi-section vector quantization, supplemented by MCE/GPD discriminative training, to optimize the speech templates and improve recognition performance.
2. The automatic speech recognition processing method for an embedded speech recognition system of claim 1, characterized in that the front-end processing is specifically as follows:
it consists of endpoint detection and feature extraction; endpoint detection, based on adaptive energy and speech waveform features, uses a speech state diagram to detect the start and end of speech accurately; the endpoint detection method tracks changes of the speech energy state with a forward pass that measures short-time energy: the background average energy of the speech signal is first estimated with an adaptive average-energy method, the speech energy profile is measured on that basis, and each short-time energy value is converted into a state value via energy thresholds; according to the magnitude and duration of the energy, the whole utterance is divided into six states, namely initial, silence, energy rising, energy sustained, energy falling, and fluctuating, with the transitions between states depending on threshold conditions; endpoint detection is finally carried out from the logical relation between the energy thresholds and the sequence of energy-state values.
3. The automatic speech recognition processing method for an embedded speech recognition system of claim 1, characterized in that the real-time recognition is specifically as follows:
the recognizer adopts an improved DTW algorithm that revises the weights used in the classical DTW algorithm and constrains the extension direction of the path to stay close to the diagonal; after the weights are redefined, in the weight comparison during path extension the accumulated distance must be normalized by the weight sum along the path so that the comparison is independent of the path taken; at the same time, considering the uncertainty of endpoint detection, the end point of the path is relaxed, reducing recognition errors caused by inaccurate endpoint detection.
4. The automatic speech recognition processing method for an embedded speech recognition system of claim 1, characterized in that the back-end processing is specifically as follows:
it comprises rejection of non-command speech, directly using the recognition scores of the top N candidate words in the speech recognition result and adopting a support vector machine to realize fast rejection.
5. The automatic speech recognition processing method for an embedded speech recognition system of claim 1, characterized in that the template training is specifically as follows: template training adopts a multi-section vector quantization method based on dynamic programming; training sentences belonging to the same class are first divided into several sections in time according to a dynamic programming algorithm, and a standard VQ codebook is then generated in each section with the LBG method; the MSVQ template covers the speech features of all speakers in the training set and keeps the temporal structure of speech; on the basis of the MSVQ template, and targeted at the DTW recognition technique adopted for recognition, an MCE/GPD discriminative training algorithm improves the separating ability of the templates from the angle of minimum classification error; after the discriminative training, more optimized templates are obtained.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2004100667967A CN1300763C (en) | 2004-09-29 | 2004-09-29 | Automatic sound identifying treating method for embedded sound identifying system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1588535A (en) | 2005-03-02
CN1300763C (en) | 2007-02-14
Family
ID=34604097
Family Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---|
CNB2004100667967A Expired - Fee Related CN1300763C (en) | 2004-09-29 | 2004-09-29 | Automatic sound identifying treating method for embedded sound identifying system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1300763C (en) |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5440662A (en) * | 1992-12-11 | 1995-08-08 | At&T Corp. | Keyword/non-keyword classification in isolated word speech recognition |
US5613037A (en) * | 1993-12-21 | 1997-03-18 | Lucent Technologies Inc. | Rejection of non-digit strings for connected digit speech recognition |
CN1101025C (en) * | 1999-11-19 | 2003-02-05 | 清华大学 | Phonetic command controller |
CN1141696C (en) * | 2000-03-31 | 2004-03-10 | 清华大学 | Non-particular human speech recognition and prompt method based on special speech recognition chip |
JP3911246B2 (en) * | 2003-03-04 | 2007-05-09 | 株式会社国際電気通信基礎技術研究所 | Speech recognition apparatus and computer program |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101339765B (en) * | 2007-07-04 | 2011-04-13 | 黎自奋 | National language single tone recognizing method |
CN101267362B (en) * | 2008-05-16 | 2010-11-17 | 亿阳信通股份有限公司 | A dynamic identification method and its device for normal fluctuation range of performance normal value |
CN101894548A (en) * | 2010-06-23 | 2010-11-24 | 清华大学 | Modeling method and modeling device for language identification |
CN101894548B (en) * | 2010-06-23 | 2012-07-04 | 清华大学 | Modeling method and modeling device for language identification |
CN102810311A (en) * | 2011-06-01 | 2012-12-05 | 株式会社理光 | Speaker estimation method and speaker estimation equipment |
CN102810311B (en) * | 2011-06-01 | 2014-12-03 | 株式会社理光 | Speaker estimation method and speaker estimation equipment |
CN102543075A (en) * | 2012-01-12 | 2012-07-04 | 东北石油大学 | Speaker VQ-SVM (Vector Quantization-Support Vector Machine) parallel identification system based on virtual instrument technology |
CN103971685A (en) * | 2013-01-30 | 2014-08-06 | 腾讯科技(深圳)有限公司 | Method and system for recognizing voice commands |
CN103971685B (en) * | 2013-01-30 | 2015-06-10 | 腾讯科技(深圳)有限公司 | Method and system for recognizing voice commands |
US9805715B2 (en) | 2013-01-30 | 2017-10-31 | Tencent Technology (Shenzhen) Company Limited | Method and system for recognizing speech commands using background and foreground acoustic models |
WO2017096778A1 (en) * | 2015-12-11 | 2017-06-15 | 百度在线网络技术(北京)有限公司 | Speech recognition method and device |
CN105489222A (en) * | 2015-12-11 | 2016-04-13 | 百度在线网络技术(北京)有限公司 | Speech recognition method and device |
CN105489222B (en) * | 2015-12-11 | 2018-03-09 | 百度在线网络技术(北京)有限公司 | Audio recognition method and device |
US10685647B2 (en) | 2015-12-11 | 2020-06-16 | Baidu Online Network Technology (Beijing) Co., Ltd. | Speech recognition method and device |
CN108074562A (en) * | 2016-11-11 | 2018-05-25 | 株式会社东芝 | Speech recognition equipment, audio recognition method and storage medium |
CN108074562B (en) * | 2016-11-11 | 2021-12-03 | 株式会社东芝 | Speech recognition apparatus, speech recognition method, and storage medium |
CN110234472A (en) * | 2017-01-30 | 2019-09-13 | 阿克托梅德股份有限公司 | For generating the surgical assistant system and method for the control signal of the robot kinematics moved in a manner of the motor control of voice control surgical assistant system |
CN110234472B (en) * | 2017-01-30 | 2023-01-06 | 阿克托梅德股份有限公司 | Operation assisting system and method |
CN107799126A (en) * | 2017-10-16 | 2018-03-13 | 深圳狗尾草智能科技有限公司 | Sound end detecting method and device based on Supervised machine learning |
CN107799126B (en) * | 2017-10-16 | 2020-10-16 | 苏州狗尾草智能科技有限公司 | Voice endpoint detection method and device based on supervised machine learning |
CN108281147A (en) * | 2018-03-31 | 2018-07-13 | 南京火零信息科技有限公司 | Voiceprint recognition system based on LPCC and ADTW |
CN110136749A (en) * | 2019-06-14 | 2019-08-16 | 苏州思必驰信息科技有限公司 | The relevant end-to-end speech end-point detecting method of speaker and device |
CN112259101A (en) * | 2020-10-19 | 2021-01-22 | 腾讯科技(深圳)有限公司 | Voice keyword recognition method and device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN1300763C (en) | 2007-02-14 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C14 | Grant of patent or utility model | |
| GR01 | Patent grant | |
| C17 | Cessation of patent right | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20070214; Termination date: 20091029 |