CN102789779A - Speech recognition system and recognition method thereof - Google Patents


Info

Publication number
CN102789779A
Authority
CN
China
Prior art keywords
voice
grouping
module
vector
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201210242311XA
Other languages
Chinese (zh)
Inventor
Zhang Jing (张晶)
Qin Benzhuo (覃本灼)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Foreign Studies
Original Assignee
Guangdong University of Foreign Studies
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Foreign Studies filed Critical Guangdong University of Foreign Studies
Priority to CN201210242311XA priority Critical patent/CN102789779A/en
Publication of CN102789779A publication Critical patent/CN102789779A/en
Pending legal-status Critical Current

Abstract

The invention discloses a speech recognition system and a recognition method thereof. The system comprises a speech acquisition module, a speech preprocessing module, a speech feature extraction module, a grouping judgment module and a speech recognition module; the grouping judgment module groups speech by clustering. The speech acquisition module is connected to the speech preprocessing module, the speech preprocessing module to the speech feature extraction module, the speech feature extraction module to the grouping judgment module, and the grouping judgment module to the speech recognition module. The grouping judgment module comprises a grouping judgment unit and at least two grouping models; the speech feature extraction module is connected to the grouping judgment unit, the grouping judgment unit is connected to each of the grouping models, and the grouping models are connected to the speech recognition module.

Description

Speech recognition system and recognition method thereof
Technical field
The present invention relates to the field of speech recognition technology, and in particular to a system that implements a local speech recognition function on the Android operating system. The invention further relates to a speech recognition method for this system.
Background technology
To implement speech recognition in an embedded operating system, the input speech is usually preprocessed, characteristic parameters are extracted, pattern matching is performed, and the result is output. Pattern matching conventionally uses the discrete hidden Markov model (DHMM); Zhang Weiqing's "Research on Speech Recognition Algorithms" gives a detailed treatment of the HMM. A hidden Markov model (HMM) is usually described by five elements, comprising two state sets and three probability matrices, and can be written compactly as the triple λ = (A, B, π). The HMM extends the standard Markov model by adding a set of observable states and the probabilistic relation between the observable and hidden states. In the conventional DHMM matching process every template is matched in turn, so the time consumed grows with the number of templates; when the number of words to recognize is large, real-time performance is poor.
Summary of the invention
The object of the invention is to design a speech recognition system that remains real-time when the vocabulary is large while keeping a high recognition rate. A further object of the invention is to provide the recognition method of this system.
To achieve the above objects, the invention includes the following technical features: a speech recognition system comprising a speech acquisition module, a speech preprocessing module, a speech feature extraction module, a grouping judgment module and a speech recognition module, the grouping judgment module being used to group speech by clustering. The speech acquisition module is connected to the speech preprocessing module, the speech preprocessing module to the speech feature extraction module, the speech feature extraction module to the grouping judgment module, and the grouping judgment module to the speech recognition module. The grouping judgment module comprises a grouping judgment unit and no fewer than two grouping models; the speech feature extraction module is connected to the grouping judgment unit, the grouping judgment unit is connected to each of the grouping models, and the grouping models are connected to the speech recognition module.
Said speech preprocessing module comprises a pre-emphasis unit, a frame-division unit, a windowing unit and an endpoint detection unit connected in sequence; the pre-emphasis unit is connected to the speech acquisition module and the endpoint detection unit to the speech feature extraction module.
The invention also provides a recognition method for the speech recognition system, comprising the following steps:
(1) preprocess the input speech: pre-emphasis, frame division, windowing and endpoint detection;
(2) extract MFCC features as the recognition features and generate the speech characteristic parameters;
(3) compute the Co vector of the input speech and judge its class from the Euclidean distance between the Co vector and each group's characteristic parameter;
(4) pattern-match the input speech against all the speech in its class with a conventional DHMM. The DHMM produces a matching result directly from the characteristic parameters of the input speech; the decision rule is: among all templates fed the input, the one whose model outputs the largest probability is the matching result.
Said step (3) comprises:
A. letting the MFCC characteristic parameter of speech Wn be an Nn × Mm matrix and splicing every row onto the end of the first row, so that the MFCC parameter of Wn is characterized by a row vector Co of dimension Nn × Mm;
B. running the K-means algorithm repeatedly on the Co vectors of all the speech, recording the class number of each speech under each clustering as a row vector Vn;
C. computing the mean En and standard deviation σn of each speech's row vector Vn and characterizing each speech by the product Pn of En and σn;
D. running the K-means algorithm on the vector formed by the Pn values to obtain the speech contained in each class;
E. averaging the Co vectors of all the speech in each class, the mean Fe being the group characteristic parameter.
The MFCC characteristic parameter, the Co vector and the product Pn all characterize a speech, but Pn is a single value: compared with the first two it has the lowest dimension and the smallest data volume.
The vector Vn and its mean En and standard deviation σn are intermediate parameters whose purpose is to obtain Pn; the clustering result is then drawn from Pn alone.
The group characteristic parameter Fe characterizes a class; it is used at the recognition stage to judge which class an input belongs to.
The K-means algorithm appears twice in this process: the first time for cluster analysis, to obtain the Pn that characterizes each speech; the second time only to produce the final clustering result.
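The recognition-stage flow of steps (3) and (4) can be sketched in Python. This is an illustrative sketch, not the patent's code: `match_score` is a hypothetical stand-in for the DHMM output probability of step (4), and `group_features` (the Fe vectors) and `group_templates` are assumed to have been precomputed.

```python
import numpy as np

def recognize(input_co, group_features, group_templates, match_score):
    """Judge the input's group as the one whose characteristic parameter Fe
    is at the smallest Euclidean distance from the input's Co vector, then
    pattern-match only inside that group; the template with the largest
    score (output probability) is the matching result."""
    co = np.asarray(input_co, dtype=float)
    dists = [np.linalg.norm(co - np.asarray(fe, dtype=float))
             for fe in group_features]
    group = int(np.argmin(dists))
    best = max(group_templates[group], key=lambda t: match_score(co, t))
    return group, best
```

Because only one group's templates are scored, the matching cost no longer grows with the full template count, which is the point of the grouping step.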
The invention implements a local speech recognition function on the Android operating system. By preprocessing the collected speech signal, the system is more efficient at the later recognition stage and its recognition accuracy is higher. Speech with similar acoustic features is gathered into the same group by a clustering algorithm; before an input is recognized, its group is judged first, and the pattern-matching computation is performed only within that group. Recognition accuracy is improved by adding redundant speech; after the redundancy is added the grouping model library remains small and the recognition overhead low, so both real-time performance and recognition accuracy improve greatly.
Description of drawings
Fig. 1 is the module schematic of the invention;
Fig. 2 is the flow chart of the invention;
Fig. 3 shows the generation of the Co vector;
Fig. 4 shows the redundancy-adding procedure of the invention.
Embodiment
The invention implements a local speech recognition function on the Android operating system; by preprocessing the collected speech signal, the system is more efficient and more accurate at the later recognition stage. In the conventional DHMM matching process all HMM templates are traversed and matched in turn. A hidden Markov model (HMM) is usually described by five elements, comprising two state sets and three probability matrices:
1. Hidden states S
These states satisfy the Markov property and are the actual states implied by the Markov model; they usually cannot be obtained by direct observation (for example S1, S2, S3, and so on).
2. Observable states O
Associated with the hidden states in the model and obtainable by direct observation (for example O1, O2, O3, and so on; the number of observable states need not equal the number of hidden states).
3. Initial state probability matrix π
The probabilities of the hidden states at the initial time t = 1. For example, if at t = 1 we have P(S1) = p1, P(S2) = p2 and P(S3) = p3, then π = [p1 p2 p3].
4. Hidden state transition probability matrix A
Describes the transition probabilities between the states of the HMM: Aij = P(Sj | Si), 1 ≤ i, j ≤ N, the probability that the state at time t + 1 is Sj given that the state at time t is Si.
5. Observation probability matrix B
With N hidden states and M observable states: Bij = P(Oi | Sj), 1 ≤ i ≤ M, 1 ≤ j ≤ N, the probability of observing Oi at time t given that the hidden state is Sj.
In general the model is written compactly as the triple λ = (A, B, π). The hidden Markov model extends the standard Markov model by adding the set of observable states and the probabilistic relation between the observable and hidden states. When the number of words to recognize is large, matching every template makes real-time performance poor.
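The output probability P(O | λ) that the matching step maximizes can be evaluated with the scaled forward algorithm. The following is a minimal illustrative sketch, not the patent's implementation:

```python
import numpy as np

def forward_log_prob(A, B, pi, obs):
    """log P(O | lambda) for a discrete HMM lambda = (A, B, pi) via the
    scaled forward algorithm. A[i, j] = P(S_j at t+1 | S_i at t),
    B[j, k] = P(O_k | S_j), obs is a sequence of observation-symbol indices."""
    alpha = pi * B[:, obs[0]]            # initialisation: alpha_1(i)
    log_p = 0.0
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # induction step
        s = alpha.sum()                  # scaling keeps alpha from underflowing
        log_p += np.log(s)
        alpha = alpha / s
    return log_p + np.log(alpha.sum())
```

During matching, this value is computed for every candidate model and the template whose model gives the largest value is the recognition result.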
The module schematic of the invention is shown in Fig. 1. The speech acquisition module 1 collects the input speech signal; the speech preprocessing module 2 performs pre-emphasis, frame division, windowing and endpoint detection, implemented respectively by the pre-emphasis unit 21, frame-division unit 22, windowing unit 23 and endpoint detection unit 24. The speech feature extraction module 3 then extracts features from the speech, the grouping judgment module clusters and groups the speech, and the result is output.
Each module and unit involved is described below:
1. Preprocessing
Preprocessing mainly comprises pre-emphasis, frame division, windowing and endpoint detection.
1.1 Pre-emphasis
In the pre-emphasis step the input signal is passed through a filter that shifts its energy to the appropriate frequency range.
The transfer function is: H(z) = 1 − 0.9375 z⁻¹
The resulting signal is: s̃(n) = s(n) − 0.9375 s(n − 1)
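A minimal sketch of this filter in Python, assuming a NumPy array input and s(−1) = 0:

```python
import numpy as np

def pre_emphasis(signal, alpha=0.9375):
    """Pre-emphasis filter from the description: s~(n) = s(n) - alpha*s(n-1).
    Boosts the high frequencies to compensate for the spectral tilt of speech."""
    signal = np.asarray(signal, dtype=float)
    out = np.empty_like(signal)
    out[0] = signal[0]                       # s(-1) taken as 0
    out[1:] = signal[1:] - alpha * signal[:-1]
    return out
```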
1.2 Frame division
The speech signal varies rapidly, but it is quasi-stationary over 10-20 ms, so the signal within such a relatively stable interval can be treated as a basic unit: a frame.
1.3 Windowing
To avoid the truncation error that a rectangular window introduces into the LPC coefficients, a Hamming window function is applied to each frame:
s_w(n) = s(n) · w(n), 0 ≤ n ≤ N − 1,
where: w(n) = 0.54 − 0.46 cos(2πn / (N − 1)), 0 ≤ n ≤ N − 1.
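Frame division and Hamming windowing together can be sketched as follows; the 256-sample frame length and 128-sample shift are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

def frame_and_window(signal, frame_len=256, frame_shift=128):
    """Split the signal into short frames (speech is quasi-stationary over
    10-20 ms) and apply the Hamming window
    w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), 0 <= n <= N-1, to each frame."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    window = 0.54 - 0.46 * np.cos(
        2 * np.pi * np.arange(frame_len) / (frame_len - 1))
    frames = np.empty((n_frames, frame_len))
    for i in range(n_frames):
        start = i * frame_shift
        frames[i] = signal[start:start + frame_len] * window
    return frames
```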
1.4 Endpoint detection
Endpoint detection determines whether a speech signal is present, i.e. finds the start and end points of the speech within a segment that contains it. Effective endpoint detection not only minimizes the processing time but also rejects the noise of the silent segments, giving the recognition system good performance. A common method detects the endpoints from two coefficients, the short-time energy and the short-time zero-crossing rate of the signal, computed per frame as:
Short-time energy: e(i) = Σ_{n=1..N} |x_i(n)|
Short-time zero-crossing rate: ZCR(i) = Σ_{n=1..N−1} |x_i(n) − x_i(n+1)|
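These two per-frame coefficients, and a simple threshold-based endpoint decision, can be sketched as follows. The thresholding rule is an illustrative assumption (the disclosure does not specify one), and the ZCR formula is reproduced as written above; a conventional zero-crossing rate counts sign changes instead.

```python
import numpy as np

def short_time_energy(frame):
    """e(i) = sum_n |x_i(n)|  (magnitude sum, as in the description)."""
    return np.abs(np.asarray(frame, dtype=float)).sum()

def short_time_zcr(frame):
    """ZCR(i) = sum_n |x_i(n) - x_i(n+1)|, the first-difference form used
    in the description."""
    frame = np.asarray(frame, dtype=float)
    return np.abs(frame[:-1] - frame[1:]).sum()

def detect_endpoints(frames, energy_thr, zcr_thr):
    """Mark a frame as speech when either feature exceeds its threshold;
    the first and last speech frames give the start and end points.
    Returns None when no speech frame is found."""
    speech = [short_time_energy(f) > energy_thr or short_time_zcr(f) > zcr_thr
              for f in frames]
    if not any(speech):
        return None
    return speech.index(True), len(speech) - 1 - speech[::-1].index(True)
```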
2. Characteristic parameter extraction
The MFCC characteristic parameter is adopted. Its calculation proceeds roughly as follows:
1. Apply the fast Fourier transform (FFT) to the signal to obtain its energy spectrum.
2. Multiply the energy spectrum by a bank of n triangular band-pass filters and take the logarithm of each filter's output, giving n log energies (Log Energy). The n filters are evenly spaced on the mel-frequency scale (Mel Frequency); the mel frequency relates to the ordinary frequency f by mel(f) = 2595 · log10(1 + f/700).
3. Discrete cosine transform (DCT). Apply the DCT to the n log energies E_k to obtain the mel-scale cepstrum parameters of order L, where L is usually taken as 12:
C_m = Σ_{k=1..N} cos[m · (k − 0.5) · π/N] · E_k,  m = 1, 2, ..., L
where E_k is the inner product of the k-th triangular filter with the spectral energy computed in the previous step, and N is the number of triangular filters.
4. Log energy (Log energy). The energy of a frame is itself a key speech feature, so the log energy of the frame (defined as the sum of squares of the signal in the frame, taken as a base-10 logarithm and multiplied by 10) is usually added, so that the basic feature of each frame has 13 dimensions: 1 log energy plus 12 cepstrum parameters.
5. Delta cepstrum (Delta cepstrum). Although 13 parameters have been obtained, for speech recognition delta cepstrum parameters are usually added as well, to show how the cepstrum parameters change over time. Each delta is the slope of a cepstrum parameter with respect to time, i.e. its temporal dynamics:
ΔC_m(t) = [Σ_{τ=−M..M} τ · C_m(t + τ)] / [Σ_{τ=−M..M} τ²]
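Steps 3 and 5 can be sketched directly from the two formulas above; `log_energies` is assumed to be the filter-bank output E_k of step 2, and frames beyond the ends of the sequence are clamped (an illustrative boundary choice the disclosure does not specify):

```python
import numpy as np

def mel_cepstrum(log_energies, L=12):
    """DCT of the N log filter-bank energies E_k:
    C_m = sum_{k=1..N} cos(m*(k-0.5)*pi/N) * E_k, m = 1..L."""
    E = np.asarray(log_energies, dtype=float)
    N = len(E)
    k = np.arange(1, N + 1)
    return np.array([np.sum(np.cos(m * (k - 0.5) * np.pi / N) * E)
                     for m in range(1, L + 1)])

def delta(cepstra, M=2):
    """Delta cepstrum: slope of each coefficient over time,
    dC_m(t) = sum_{tau=-M..M} tau*C_m(t+tau) / sum_{tau} tau^2."""
    C = np.asarray(cepstra, dtype=float)      # shape (T, L)
    T = len(C)
    denom = sum(tau * tau for tau in range(-M, M + 1))
    out = np.zeros_like(C)
    for t in range(T):
        for tau in range(-M, M + 1):
            out[t] += tau * C[min(max(t + tau, 0), T - 1)]
    return out / denom
```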
3. Generation of the speech groups and the group characteristic parameters
(1) Clustering the speech into groups
Let the MFCC characteristic parameter of speech Wn be an Nn × Mm matrix. If every row is spliced onto the end of the first row, Wn can be characterized by a row vector of dimension Nn × Mm, which we call the Co vector, as shown in Fig. 3.
The n Co vectors corresponding to the n speech templates are then clustered with the K-means algorithm, which yields the class of each speech and the center of each class. However, the result of K-means depends strongly on the initial cluster centers, so the centers obtained from a single run cannot serve as the group characteristic parameters. Clustering should instead be run from as many different initial centers as possible and the large set of results analyzed to draw the final clustering. Number the classes: speech belonging to the same class will then have similar average class numbers, and their class numbers will vary consistently across runs. Exploiting this, the invention analyzes the results as follows:
After m clustering runs, record the class number of each speech under each run as a row vector Vn, which characterizes the classes that speech fell into. The average class number is represented by the mean En of Vn and its variation by the standard deviation σn of Vn; the grouping behaviour of the speech can then be characterized by the product Pn of En and σn. Each speech is thus characterized by a single value Pn: speech Wn goes from being represented by a row vector of dimension Nn × Mm to being represented by one number, and the problem of clustering n speech templates reduces to clustering one n-dimensional row vector.
For data this small almost any clustering method will do, however simple. For convenience the implementation still uses the K-means algorithm to cluster this row vector.
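The Co-vector flattening and the repeated K-means analysis that yields Pn can be sketched as follows. The tiny K-means implementation, the run count and the random initialisation are illustrative assumptions, not the patent's code:

```python
import numpy as np

def co_vector(mfcc):
    """Flatten the Nn x Mm MFCC matrix row by row into a single Co row vector."""
    return np.asarray(mfcc, dtype=float).ravel()

def kmeans_labels(X, k, rng, iters=20):
    """Minimal K-means: cluster label of each row of X for one random
    initialisation of the centres."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # keep old centre if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def pn_values(co_vectors, k, runs=20, seed=0):
    """Cluster the Co vectors from 'runs' different initial centres; record
    each speech's label sequence Vn across runs, then characterise each
    speech by Pn = mean(Vn) * std(Vn) as in the description."""
    X = np.stack([np.asarray(v, dtype=float) for v in co_vectors])
    rng = np.random.default_rng(seed)
    V = np.stack([kmeans_labels(X, k, rng) for _ in range(runs)], axis=1)
    return V.mean(axis=1) * V.std(axis=1)
```

The resulting one-dimensional Pn values are what the second, final K-means run clusters.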
(2) Generating the group characteristic parameters
According to the clustering result of step (1), the Co vectors of all the speech in each class are averaged; this mean is the group characteristic parameter Fe.
(3) Compute the Euclidean distance between the input's Co vector and each group's characteristic parameter Fe and judge the input's group from the resulting distances: the group at the smallest distance is the judged class.
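Step (2) above can be sketched as:

```python
import numpy as np

def group_feature_params(co_vectors, classes):
    """For each class from the clustering result, average the Co vectors of
    all its member speech; each mean is that group's characteristic
    parameter Fe."""
    X = np.stack([np.asarray(v, dtype=float) for v in co_vectors])
    classes = np.asarray(classes)
    return {int(c): X[classes == c].mean(axis=0) for c in np.unique(classes)}
```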
Because the grouping accuracy directly affects the recognition accuracy, it must be sufficiently high. Redundant speech is added to the appropriate classes by the following method: each collected sample is input to the system and its class is judged from the group characteristic parameters. If the judgment is correct the next sample is input; otherwise the speech corresponding to the current sample is added to its correct class, as shown in Fig. 4.
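The Fig. 4 procedure can be sketched as follows, assuming (as an illustrative choice) that the grouping judgement is the nearest-Fe Euclidean rule of step (3):

```python
import numpy as np

def add_redundancy(samples, true_groups, Fe):
    """Feed each collected sample through the grouping judgement (nearest
    group feature Fe by Euclidean distance); when the judged group is
    wrong, add that sample's voice to its correct group as a redundant
    entry, so later inputs like it are grouped correctly."""
    Fe = np.asarray(Fe, dtype=float)
    redundant = {g: [] for g in range(len(Fe))}
    for sample, g_true in zip(samples, true_groups):
        judged = int(np.linalg.norm(
            Fe - np.asarray(sample, dtype=float), axis=1).argmin())
        if judged != g_true:
            redundant[g_true].append(list(sample))
    return redundant
```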
After the redundancy is added, the number of speech entries each class contains increases. Let the number of speech entries in a class be K_P, the total number of speech entries be n, and the number of groups be m; a ratio α is then formed from these quantities (the defining formula appears only as an inline image in the original). The smaller α is, the smaller the overhead of the recognition process. The prior art, by contrast, defines each coefficient as an object with inheritance relations to other objects, so the file produced when the template sequence is saved is very large, about five times the size of the template file saved by the invention.

Claims (4)

1. A speech recognition system, characterized in that it comprises a speech acquisition module, a speech preprocessing module, a speech feature extraction module, a grouping judgment module and a speech recognition module, the grouping judgment module being used to group speech by clustering; the speech acquisition module is connected to the speech preprocessing module, the speech preprocessing module to the speech feature extraction module, the speech feature extraction module to the grouping judgment module, and the grouping judgment module to the speech recognition module; the grouping judgment module comprises a grouping judgment unit and no fewer than two grouping models; the speech feature extraction module is connected to the grouping judgment unit, the grouping judgment unit is connected to each of the grouping models, and the grouping models are connected to the speech recognition module.
2. The speech recognition system according to claim 1, characterized in that said speech preprocessing module comprises a pre-emphasis unit, a frame-division unit, a windowing unit and an endpoint detection unit connected in sequence; the pre-emphasis unit is connected to the speech acquisition module and the endpoint detection unit to the speech feature extraction module.
3. A recognition method for the speech recognition system according to claim 2, characterized by comprising the following steps:
(1) preprocessing the input speech by pre-emphasis, frame division, windowing and endpoint detection;
(2) extracting MFCC features as the recognition features and generating the speech characteristic parameters;
(3) computing the Co vector of the input speech and judging its class from the Euclidean distance between the Co vector and each group's characteristic parameter;
(4) pattern-matching the input speech against all the speech in its class with a conventional DHMM.
4. The recognition method according to claim 3, characterized in that said step (3) comprises:
A. letting the MFCC characteristic parameter of speech Wn be an Nn × Mm matrix and splicing every row onto the end of the first row, so that the MFCC parameter of Wn is characterized by a row vector Co of dimension Nn × Mm;
B. running the K-means algorithm repeatedly on the Co vectors of all the speech, recording the class number of each speech under each clustering as a row vector Vn;
C. computing the mean En and standard deviation σn of each speech's row vector Vn and characterizing each speech by the product Pn of En and σn;
D. running the K-means algorithm on the vector formed by the Pn values to obtain the speech contained in each class;
E. averaging the Co vectors of all the speech in each class, the mean Fe being the group characteristic parameter.
CN201210242311XA 2012-07-12 2012-07-12 Speech recognition system and recognition method thereof Pending CN102789779A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210242311XA CN102789779A (en) 2012-07-12 2012-07-12 Speech recognition system and recognition method thereof


Publications (1)

Publication Number Publication Date
CN102789779A 2012-11-21

Family

ID=47155166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210242311XA Pending CN102789779A (en) 2012-07-12 2012-07-12 Speech recognition system and recognition method thereof

Country Status (1)

Country Link
CN (1) CN102789779A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104732972A (en) * 2015-03-12 2015-06-24 广东外语外贸大学 HMM voiceprint recognition signing-in method and system based on grouping statistics
CN105913840A (en) * 2016-06-20 2016-08-31 西可通信技术设备(河源)有限公司 Speech recognition device and mobile terminal
CN105931637A (en) * 2016-04-01 2016-09-07 金陵科技学院 User-defined instruction recognition speech photographing system
CN106448657A (en) * 2016-10-26 2017-02-22 安徽省云逸智能科技有限公司 Continuous speech recognition system for restaurant robot servant
CN106531158A (en) * 2016-11-30 2017-03-22 北京理工大学 Method and device for recognizing answer voice
CN106782550A (en) * 2016-11-28 2017-05-31 黑龙江八农垦大学 A kind of automatic speech recognition system based on dsp chip
CN107773982A (en) * 2017-10-20 2018-03-09 科大讯飞股份有限公司 Game voice interactive method and device
CN107910020A (en) * 2017-10-24 2018-04-13 深圳和而泰智能控制股份有限公司 Sound of snoring detection method, device, equipment and storage medium
CN108536304A (en) * 2018-06-25 2018-09-14 广州市锐尚展柜制作有限公司 A kind of multi-modal interactive device of smart home
CN108922543A (en) * 2018-06-11 2018-11-30 平安科技(深圳)有限公司 Model library method for building up, audio recognition method, device, equipment and medium
CN109255106A (en) * 2017-07-13 2019-01-22 Tcl集团股份有限公司 A kind of text handling method and terminal
CN116189671A (en) * 2023-04-27 2023-05-30 凌语国际文化艺术传播股份有限公司 Data mining method and system for language teaching

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007138875A1 (en) * 2006-05-31 2007-12-06 Nec Corporation Speech recognition word dictionary/language model making system, method, and program, and speech recognition system
CN102237083A (en) * 2010-04-23 2011-11-09 广东外语外贸大学 Portable interpretation system based on WinCE platform and language recognition method thereof
CN102324232A (en) * 2011-09-12 2012-01-18 辽宁工业大学 Method for recognizing sound-groove and system based on gauss hybrid models
CN102436809A (en) * 2011-10-21 2012-05-02 东南大学 Network speech recognition method in English oral language machine examination system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Chen Xiaolin: "Research on Speech Recognition Methods Based on Hidden Markov Models", China Master's Theses Full-text Database *
Gao Qinglun et al.: "Speech Recognition Technology Based on Discrete Hidden Markov Models", Journal of Hebei Academy of Sciences *


Similar Documents

Publication Publication Date Title
CN102789779A (en) Speech recognition system and recognition method thereof
CN107680582B (en) Acoustic model training method, voice recognition method, device, equipment and medium
CN102800316B (en) Optimal codebook design method for voiceprint recognition system based on nerve network
CN101980336B (en) Hidden Markov model-based vehicle sound identification method
CN101136199B (en) Voice data processing method and equipment
CN102968990B (en) Speaker identifying method and system
CN103310789B (en) A kind of sound event recognition method of the parallel model combination based on improving
Chavan et al. An overview of speech recognition using HMM
CN101226743A (en) Method for recognizing speaker based on conversion of neutral and affection sound-groove model
CN105206270A (en) Isolated digit speech recognition classification system and method combining principal component analysis (PCA) with restricted Boltzmann machine (RBM)
CN103794207A (en) Dual-mode voice identity recognition method
CN102024455A (en) Speaker recognition system and method
CN104078039A (en) Voice recognition system of domestic service robot on basis of hidden Markov model
CN103456302B (en) A kind of emotional speaker recognition method based on the synthesis of emotion GMM Model Weight
Das et al. Bangladeshi dialect recognition using Mel frequency cepstral coefficient, delta, delta-delta and Gaussian mixture model
Deshmukh et al. Speech based emotion recognition using machine learning
CN109192200A (en) A kind of audio recognition method
CN104240706A (en) Speaker recognition method based on GMM Token matching similarity correction scores
CN104732972A (en) HMM voiceprint recognition signing-in method and system based on grouping statistics
CN110827844A (en) Noise classification method based on BP network
CN112562725A (en) Mixed voice emotion classification method based on spectrogram and capsule network
Shivakumar et al. Simplified and supervised i-vector modeling for speaker age regression
Ye et al. Phoneme classification using naive bayes classifier in reconstructed phase space
Prazak et al. Speaker diarization using PLDA-based speaker clustering
Nyodu et al. Automatic identification of Arunachal language using K-nearest neighbor algorithm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20121121