CN102789779A - Speech recognition system and recognition method thereof - Google Patents
- Publication number
- CN102789779A (application numbers CN201210242311XA, CN201210242311A)
- Authority
- CN
- China
- Prior art keywords
- voice
- grouping
- module
- vector
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a speech recognition system and its recognition method. The system comprises a speech acquisition module, a speech preprocessing module, a speech feature extraction module, a grouping judgment module and a speech recognition module, where the grouping judgment module groups the speech by clustering. The speech acquisition module is connected to the speech preprocessing module, the speech preprocessing module to the speech feature extraction module, the speech feature extraction module to the grouping judgment module, and the grouping judgment module to the speech recognition module. The grouping judgment module comprises a grouping judgment unit and at least two grouping models; the speech feature extraction module is connected to the grouping judgment unit, the grouping judgment unit is connected to each of the at least two grouping models, and the grouping models are connected to the speech recognition module.
Description
Technical field
The present invention relates to the field of speech recognition technology, and in particular to a system that implements a local speech recognition function on the Android operating system. The invention further relates to the recognition method of this speech recognition system.
Background technology
Implementing speech recognition in an embedded operating system usually requires preprocessing the input speech, extracting characteristic parameters, performing pattern matching and outputting a result. Pattern matching conventionally uses the traditional DHMM model; Zhang Weiqing's "Research on Speech Recognition Algorithms" gives a detailed treatment of HMMs. A hidden Markov model (HMM) is usually described by five elements, comprising two state sets and three probability matrices, and can be represented concisely by the triple λ = (A, B, π). The HMM is in fact an extension of the standard Markov model, adding a set of observable states and the probabilistic relations between those states and the hidden states. In traditional DHMM pattern matching, every template is matched in turn, so the time consumed by matching grows with the number of templates; when the number of speech items to recognize is large, real-time performance is poor.
Summary of the invention
The objective of the invention is to design a speech recognition system that can still run in real time when the number of speech templates is large, while keeping a high recognition rate. Another object of the invention is to provide the recognition method of this speech recognition system.
To achieve the above objects, the invention includes the following technical features: a speech recognition system comprising a speech acquisition module, a speech preprocessing module, a speech feature extraction module, a grouping judgment module and a speech recognition module, where the grouping judgment module groups the speech by clustering. The speech acquisition module is connected to the speech preprocessing module, the speech preprocessing module to the speech feature extraction module, the speech feature extraction module to the grouping judgment module, and the grouping judgment module to the speech recognition module. The grouping judgment module comprises a grouping judgment unit and no fewer than two grouping models; the speech feature extraction module is connected to the grouping judgment unit, the grouping judgment unit is connected to each grouping model, and the grouping models are connected to the speech recognition module.
Said speech preprocessing module comprises, connected in sequence, a pre-emphasis unit, a framing unit, a windowing unit and an endpoint detection unit; the pre-emphasis unit is connected to the speech acquisition module, and the endpoint detection unit is connected to the speech feature extraction module.
The present invention also includes a recognition method for this speech recognition system, comprising the following steps:
(1) apply pre-emphasis, framing, windowing and endpoint detection preprocessing to the input speech;
(2) extract MFCC speech features as recognition features and generate the speech characteristic parameters;
(3) compute the Co vector of the input speech and judge its category from the Euclidean distance between the Co vector and each group's characteristic parameter;
(4) match the input speech against all speech in the judged category with a traditional DHMM model. The DHMM model produces the matching result directly from the characteristic parameters of the input speech; its decision method is: among all speech fed into the DHMM models, the one with the highest output probability is the matching result.
Said step (3) comprises:
A. let the MFCC characteristic parameter of speech Wn be an Nn × Mm matrix and splice every row after the first, so that the MFCC characteristic parameter of Wn is characterized by a row vector Co of Nn × Mm dimensions;
B. cluster the Co vectors of all speech repeatedly with the K-means algorithm and, for each speech, record the category numbers obtained in the clustering runs as a row vector Vn;
C. compute the mean En and standard deviation σn of each speech's row vector Vn, and characterize each speech by the product Pn of En and σn;
D. cluster the vector formed by the Pn values with the K-means algorithm to obtain the speech each category contains;
E. average the Co vectors of all speech in each category; this mean Fe is the grouping feature parameter.
The MFCC characteristic parameter, the Co vector and the product Pn all characterize a speech sample, but Pn is a single value, so compared with the other two its dimension is lowest and its data volume smallest.
The vector Vn, its mean En and its standard deviation σn are intermediate parameters whose purpose is to obtain the product Pn; the clustering result is then derived from Pn.
The grouping feature parameter Fe characterizes a category; it is used at the recognition stage to judge which category an input belongs to.
The K-means algorithm appears twice in this process: the first time to run the cluster analysis from which the product Pn characterizing each speech is obtained, and the second time only to produce the final clustering result.
The present invention implements a local speech recognition function on the Android operating system. By preprocessing the collected speech signal, the system becomes more efficient and more accurate at the later recognition stage. A clustering algorithm gathers speech with similar acoustic features into the same group; before an input utterance is recognized, its group is judged first, and pattern matching is then computed only within that group. Recognition accuracy is improved by adding redundant speech; even after the redundancy is added, each grouping model library stays small and the recognition overhead low, which greatly improves real-time performance and recognition accuracy.
Description of drawings
Fig. 1 is a module schematic of the present invention;
Fig. 2 is a flow chart of the present invention;
Fig. 3 shows the generation of the Co vector of the present invention;
Fig. 4 shows the redundancy-adding procedure of the present invention.
Embodiment
The present invention implements a local speech recognition function on the Android operating system; preprocessing the collected speech signal makes the system more efficient and more accurate at the later recognition stage. Traditional DHMM pattern matching traverses all HMM templates and matches each in turn. A hidden Markov model (HMM) is usually described by five elements, comprising two state sets and three probability matrices:
1. Hidden states S. These states satisfy the Markov property and are the actual states implied by the Markov model; they usually cannot be obtained by direct observation (for example S1, S2, S3, and so on).
2. Observable states O. These are associated with the hidden states in the model and can be obtained by direct observation (for example O1, O2, O3, and so on; the number of observable states need not match the number of hidden states).
3. Initial state probability matrix π. This gives the probabilities of the hidden states at the initial time t = 1; for example, if at t = 1, P(S1) = p1, P(S2) = p2 and P(S3) = p3, then π = [p1 p2 p3].
4. Hidden state transition probability matrix A. This describes the transition probabilities between the states of the HMM: Aij = P(Sj|Si), 1 ≤ i, j ≤ N, is the probability that the state at time t+1 is Sj given that the state at time t is Si.
5. Observation probability matrix B. Let N be the number of hidden states and M the number of observable states; then Bij = P(Oi|Sj), 1 ≤ i ≤ M, 1 ≤ j ≤ N, is the probability of observing Oi at time t given that the hidden state is Sj.
In general, the triple λ = (A, B, π) represents a hidden Markov model concisely. The HMM extends the standard Markov model by adding a set of observable states and the probabilistic relations between those states and the hidden states. Because every template is traversed, real-time performance is poor when the number of speech items to recognize is large.
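As a minimal illustration of the triple λ = (A, B, π), the sketch below builds a toy two-state HMM and computes P(O | λ) with the standard forward algorithm. The matrices and observation sequence are illustrative assumptions, not values from the patent, and B is stored here as N × M (state by observation) for indexing convenience.

```python
import numpy as np

# Toy HMM lambda = (A, B, pi): 2 hidden states, 2 observable symbols.
A = np.array([[0.7, 0.3],       # A[i, j] = P(S_j at t+1 | S_i at t)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],       # B[i, k] = P(O_k | S_i)
              [0.2, 0.8]])
pi = np.array([0.6, 0.4])       # hidden-state distribution at t = 1

def forward(obs):
    """Return P(obs | lambda) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]            # initialisation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # induction step
    return float(alpha.sum())            # termination

p = forward([0, 1, 0])
```

Summing P(O | λ) over every possible observation sequence of a fixed length yields 1, which is a quick sanity check on the matrices.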
The module schematic of the invention is shown in Fig. 1. The speech acquisition module 1 collects the input speech signal; the speech preprocessing module 2 applies pre-emphasis, framing, windowing, endpoint detection and similar processing, implemented by the pre-emphasis unit 21, framing unit 22, windowing unit 23 and endpoint detection unit 24. The speech feature extraction module 3 then extracts features from the speech, the grouping judgment module clusters the speech into groups, and the result is output.
Each module and unit involved is described below:
1. pre-service
Preprocessing mainly comprises pre-emphasis, framing, windowing and endpoint detection.
1.1 pre-emphasis
During pre-emphasis, the input signal is passed through a filter that shifts it into a suitable frequency range.
The transfer function is H(z) = 1 - 0.9375z^(-1).
The filtered signal is therefore s'(n) = s(n) - 0.9375·s(n-1).
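The filter above is a simple first-order difference; a minimal sketch of applying it to a sample buffer (the coefficient 0.9375 comes from the transfer function H(z) given in the text):

```python
import numpy as np

def pre_emphasis(x, a=0.9375):
    """Apply H(z) = 1 - a*z^(-1): y[n] = x[n] - a*x[n-1]."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]                 # no previous sample for the first point
    y[1:] = x[1:] - a * x[:-1]  # difference equation of the filter
    return y

y = pre_emphasis([1.0, 1.0, 1.0, 1.0])
```

On a constant signal the output settles at 1 - 0.9375 = 0.0625, showing how the filter suppresses low frequencies.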
1.2 Framing
A speech signal changes rapidly, but it is quasi-stationary over 10~20 ms, so the signal within such a relatively stable interval can be treated as a basic unit: the frame.
1.3 Windowing
To avoid the truncation error that a rectangular window introduces into the LPC coefficients, a Hamming window function is applied. That is:
w(n) = 0.54 - 0.46·cos(2πn/(N - 1)), 0 ≤ n ≤ N - 1,
where N is the frame length; each frame is multiplied point by point by w(n).
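A minimal sketch combining the framing and Hamming-windowing steps; the frame length of 256 samples and shift of 128 samples are illustrative assumptions, since the patent gives only the 10~20 ms guideline:

```python
import numpy as np

def frame_and_window(x, frame_len=256, frame_shift=128):
    """Split x into overlapping frames, each multiplied by a Hamming window."""
    x = np.asarray(x, dtype=float)
    n = np.arange(frame_len)
    # Hamming window: w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))
    w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (frame_len - 1))
    starts = range(0, len(x) - frame_len + 1, frame_shift)
    return np.array([x[s:s + frame_len] * w for s in starts])

frames = frame_and_window(np.ones(1024))
```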
1.4 Endpoint detection
The purpose of endpoint detection is to detect whether a speech signal is present, that is, to determine the starting point and end point of the speech within a segment of signal that contains it. Effective endpoint detection not only minimizes the processing time but also excludes the noise of silent segments, giving the recognition system good performance. A common method detects the endpoints through two coefficients, the short-time energy and the short-time zero-crossing rate of the signal:
Short-time energy: En = Σm xn(m)²
Short-time zero-crossing rate: Zn = (1/2)·Σm |sgn(xn(m)) - sgn(xn(m-1))|
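The two coefficients can be sketched per frame as below; the decision thresholds are illustrative assumptions, since the patent does not specify them:

```python
import numpy as np

def short_time_energy(frame):
    """E = sum of squared samples in the frame."""
    frame = np.asarray(frame, dtype=float)
    return float(np.sum(frame ** 2))

def zero_crossing_rate(frame):
    """Z = (1/2) * sum |sgn(x[m]) - sgn(x[m-1])|."""
    s = np.sign(np.asarray(frame, dtype=float))
    return 0.5 * float(np.sum(np.abs(np.diff(s))))

def is_speech(frame, e_thresh=0.1, z_thresh=10.0):
    # Mark the frame as speech if either measure exceeds its threshold.
    return short_time_energy(frame) > e_thresh or \
           zero_crossing_rate(frame) > z_thresh

silence = np.zeros(160)
tone = 0.5 * np.sin(2 * np.pi * np.arange(160) / 16.0)
```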
2. characteristic parameter extraction
The MFCC characteristic parameter is adopted. Its calculation proceeds roughly as follows:
1. Apply a fast Fourier transform (FFT) to the signal to obtain the energy spectrum.
2. Multiply the energy spectrum by a bank of n triangular band-pass filters and take the logarithm of each filter's output, giving n log energies (Log Energy) in total. The n triangular band-pass filters are evenly distributed on the Mel frequency scale; the relation between Mel frequency and ordinary frequency f is mel(f) = 2595*log10(1+f/700).
3. Discrete cosine transform (DCT). Feed the n log energies Ek into a discrete cosine transform to obtain the Mel-scale cepstrum parameters of order L, where L is usually 12. The discrete cosine transform formula is:
Cm = Σk=1..N cos[m*(k-0.5)*π/N]*Ek, m = 1, 2, ..., L
where Ek is the inner product of the k-th triangular filter and the spectral energy, and N is the number of triangular filters.
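The DCT step above can be sketched directly from the formula; the toy input of 24 flat log energies is an illustrative assumption:

```python
import numpy as np

def mfcc_dct(log_energies, L=12):
    """C_m = sum_{k=1..N} cos(m*(k-0.5)*pi/N) * E_k, m = 1..L."""
    E = np.asarray(log_energies, dtype=float)
    N = len(E)
    k = np.arange(1, N + 1)
    return np.array([np.sum(np.cos(m * (k - 0.5) * np.pi / N) * E)
                     for m in range(1, L + 1)])

C = mfcc_dct(np.ones(24))  # perfectly flat spectrum as a toy input
```

A flat spectrum carries no shape information, so all L cepstral coefficients come out (numerically) zero, which is a handy check of the basis functions.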
4. Log energy (Log energy). The energy of a frame is also an important speech feature, so the log energy of the frame (defined as 10 times the base-10 logarithm of the sum of squares of the signal in the frame) is usually added, giving each frame a 13-dimensional basic feature: 1 log energy plus 12 cepstrum parameters.
5. Delta cepstrum (Delta cepstrum). Although 13 characteristic parameters have been obtained, delta cepstrum parameters are usually added in speech recognition to capture how the cepstrum changes over time. They are the slope of the cepstrum parameters with respect to time, that is, their dynamic change:
ΔCm(t) = [Στ=-M..M Cm(t+τ)*τ] / [Στ=-M..M τ²]
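A minimal sketch of this delta formula over a sequence of cepstral frames; the window half-width M = 2 and edge padding are illustrative assumptions:

```python
import numpy as np

def delta(cep, M=2):
    """cep: array of shape (T, L); returns deltas of the same shape.
    dC(t) = sum_{tau=-M..M} tau*C(t+tau) / sum_{tau=-M..M} tau^2."""
    cep = np.asarray(cep, dtype=float)
    T = len(cep)
    padded = np.pad(cep, ((M, M), (0, 0)), mode="edge")  # repeat edges
    denom = 2 * sum(tau ** 2 for tau in range(1, M + 1))
    out = np.zeros_like(cep)
    for tau in range(1, M + 1):
        out += tau * (padded[M + tau:M + tau + T] - padded[M - tau:M - tau + T])
    return out / denom

ramp = np.arange(10, dtype=float)[:, None]  # coefficient rising with slope 1
d = delta(ramp, M=2)
```

On a linearly rising coefficient, the interior deltas come out exactly 1, the slope, which is the intended meaning of the formula.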
3. Generation of speech groups and grouping feature parameters
(1) Clustering the speech into groups
Let the MFCC characteristic parameter of speech Wn be an Nn × Mm matrix. If all rows are spliced after the first row, Wn can be characterized by a row vector of Nn × Mm dimensions, which we call the Co vector, as shown in Fig. 3.
Then the n Co vectors corresponding to the n speech samples are clustered with the K-means algorithm, yielding the category of each speech and the cluster center of each category. However, because a K-means result is strongly tied to the initial cluster centers, the cluster centers from a single run cannot serve as the grouping feature parameter. As many different initial cluster centers as possible should be used, and the large set of differing results analyzed to derive the final clustering. Number the categories: speech belonging to the same category will then have close average category numbers, and their category numbers will vary consistently across runs. Exploiting this, the present invention analyzes the results as follows:
After m clustering runs, record each speech's category number in every run as a row vector Vn, which characterizes the speech's category membership. Represent the average category number by the mean En of Vn and the variation of the category by the standard deviation σn of Vn; the product Pn of En and σn then characterizes the speech's grouping. Each speech is thus characterized by a single value Pn: Wn goes from being represented by a row vector of Nn × Mm dimensions to being represented by a product Pn, and the problem of clustering n speech samples reduces to clustering one n-dimensional row vector.
For data this small, many clustering methods, even applied to a simple row vector, would be suitable; for convenience the implementation still uses the K-means algorithm to cluster this row vector.
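The repeated-clustering procedure above can be sketched end to end: run K-means m times from random initial centres, form each sample's label vector Vn, reduce it to Pn = En·σn, then cluster the Pn values themselves. The tiny hand-rolled K-means, the 20 toy Co vectors and the choice m = 10 are illustrative assumptions:

```python
import numpy as np

def kmeans(X, k, iters=20, rng=None):
    """Very small K-means returning only the labels."""
    if rng is None:
        rng = np.random.default_rng()
    centres = X[rng.choice(len(X), k, replace=False)]  # random initial centres
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return labels

rng = np.random.default_rng(0)
# 20 toy Co vectors drawn around two well-separated centres.
co = np.vstack([rng.normal(0, 0.1, (10, 8)), rng.normal(5, 0.1, (10, 8))])

runs = 10                                               # m clustering runs
V = np.stack([kmeans(co, 2, rng=rng) for _ in range(runs)], axis=1)  # (n, m)
P = V.mean(axis=1) * V.std(axis=1)                      # Pn = En * sigma_n
groups = kmeans(P[:, None].astype(float), 2, rng=rng)   # cluster the Pn values
```

Samples from the same underlying cluster always receive the same label within a run, so they end up with identical Vn rows and hence identical Pn, which is exactly the property the patent exploits.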
(2) generate the grouping feature parameter
According to the clustering result generated in step (1), the Co vectors of all speech in each category are averaged; this mean is the grouping feature parameter Fe.
(3) Compute the Euclidean distance between the input's Co vector and the grouping feature parameter Fe of each group, and judge the input's group from these distances: the group at the smallest distance is the judged category.
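Steps (2) and (3) can be sketched together: build each group's Fe as the mean Co vector of its members, then assign an input to the group whose Fe is nearest in Euclidean distance. The 2-dimensional toy vectors are illustrative assumptions:

```python
import numpy as np

def group_features(co_vectors, labels):
    """Fe for each group: the mean of its members' Co vectors."""
    co_vectors = np.asarray(co_vectors, dtype=float)
    labels = np.asarray(labels)
    return {g: co_vectors[labels == g].mean(axis=0) for g in np.unique(labels)}

def judge_group(co, fe):
    """Return the group whose Fe is at the smallest Euclidean distance."""
    return min(fe, key=lambda g: np.linalg.norm(co - fe[g]))

co_vectors = np.array([[0.0, 0.0], [0.2, 0.1], [4.0, 4.1], [3.9, 4.0]])
labels = [0, 0, 1, 1]
fe = group_features(co_vectors, labels)
g = judge_group(np.array([0.1, 0.1]), fe)
```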
Because the grouping accuracy directly affects the recognition accuracy, it must be sufficiently high. The following method adds redundant speech to the relevant categories: each collected sample is input into the system and its category judged through the grouping feature parameters. If the judgment is correct, the next sample is input; otherwise the speech corresponding to the current sample is added to that category. See Fig. 4.
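The redundancy-adding loop of Fig. 4 can be sketched as below; the `judge` callback and the sample tuples are hypothetical stand-ins for the real group-judgment step and collected data:

```python
def add_redundancy(samples, judge, groups):
    """samples: iterable of (co_vector, true_group) pairs.
    groups: dict mapping group id -> list of redundant entries.
    Misjudged samples are added to their true group as redundancy."""
    for co, true_g in samples:
        if judge(co) != true_g:          # wrong judgment: add redundancy
            groups[true_g].append(co)
    return groups

# Toy judge that always answers group 0, so group-1 samples are misjudged.
groups = {0: [], 1: []}
out = add_redundancy([("a", 0), ("b", 1), ("c", 1)], lambda co: 0, groups)
```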
After the redundancy is added, the number of speech samples in each category increases. Let Kp be the number of speech samples a category contains, n the total number of speech samples and m the number of groups; the smaller the resulting ratio α, the smaller the overhead of the recognition process. The prior art instead defines every coefficient as an object, often with inheritance relations to other objects, so the files generated when the template sequence is saved are very large, about five times the size of the template files saved by the present invention.
Claims (4)
1. A speech recognition system, characterized by comprising a speech acquisition module, a speech preprocessing module, a speech feature extraction module, a grouping judgment module and a speech recognition module, the grouping judgment module being used to group the speech by clustering; the speech acquisition module is connected to the speech preprocessing module, the speech preprocessing module to the speech feature extraction module, the speech feature extraction module to the grouping judgment module, and the grouping judgment module to the speech recognition module; the grouping judgment module comprises a grouping judgment unit and no fewer than two grouping models; the speech feature extraction module is connected to the grouping judgment unit, the grouping judgment unit is connected to each of the no fewer than two grouping models, and the grouping models are connected to the speech recognition module.
2. The speech recognition system according to claim 1, characterized in that said speech preprocessing module comprises, connected in sequence, a pre-emphasis unit, a framing unit, a windowing unit and an endpoint detection unit; the pre-emphasis unit is connected to the speech acquisition module, and the endpoint detection unit is connected to the speech feature extraction module.
3. A recognition method of the speech recognition system according to claim 2, characterized by comprising the following steps:
(1) apply pre-emphasis, framing, windowing and endpoint detection preprocessing to the input speech;
(2) extract MFCC speech features as recognition features and generate the speech characteristic parameters;
(3) compute the Co vector of the input speech and judge its category from the Euclidean distance between the Co vector and each group's characteristic parameter;
(4) match the input speech against all speech in the judged category with a traditional DHMM model.
4. The recognition method according to claim 3, characterized in that said step (3) comprises:
A. let the MFCC characteristic parameter of speech Wn be an Nn × Mm matrix and splice every row after the first, so that the MFCC characteristic parameter of Wn is characterized by a row vector Co of Nn × Mm dimensions;
B. cluster the Co vectors of all speech repeatedly with the K-means algorithm and, for each speech, record the category numbers obtained in the clustering runs as a row vector Vn;
C. compute the mean En and standard deviation σn of each speech's row vector Vn, and characterize each speech by the product Pn of En and σn;
D. cluster the vector formed by the Pn values with the K-means algorithm to obtain the speech each category contains;
E. average the Co vectors of all speech in each category; this mean Fe is the grouping feature parameter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210242311XA CN102789779A (en) | 2012-07-12 | 2012-07-12 | Speech recognition system and recognition method thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210242311XA CN102789779A (en) | 2012-07-12 | 2012-07-12 | Speech recognition system and recognition method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102789779A true CN102789779A (en) | 2012-11-21 |
Family
ID=47155166
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210242311XA Pending CN102789779A (en) | 2012-07-12 | 2012-07-12 | Speech recognition system and recognition method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102789779A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104732972A (en) * | 2015-03-12 | 2015-06-24 | 广东外语外贸大学 | HMM voiceprint recognition signing-in method and system based on grouping statistics |
CN105913840A (en) * | 2016-06-20 | 2016-08-31 | 西可通信技术设备(河源)有限公司 | Speech recognition device and mobile terminal |
CN105931637A (en) * | 2016-04-01 | 2016-09-07 | 金陵科技学院 | User-defined instruction recognition speech photographing system |
CN106448657A (en) * | 2016-10-26 | 2017-02-22 | 安徽省云逸智能科技有限公司 | Continuous speech recognition system for restaurant robot servant |
CN106531158A (en) * | 2016-11-30 | 2017-03-22 | 北京理工大学 | Method and device for recognizing answer voice |
CN106782550A (en) * | 2016-11-28 | 2017-05-31 | 黑龙江八农垦大学 | A kind of automatic speech recognition system based on dsp chip |
CN107773982A (en) * | 2017-10-20 | 2018-03-09 | 科大讯飞股份有限公司 | Game voice interactive method and device |
CN107910020A (en) * | 2017-10-24 | 2018-04-13 | 深圳和而泰智能控制股份有限公司 | Sound of snoring detection method, device, equipment and storage medium |
CN108536304A (en) * | 2018-06-25 | 2018-09-14 | 广州市锐尚展柜制作有限公司 | A kind of multi-modal interactive device of smart home |
CN108922543A (en) * | 2018-06-11 | 2018-11-30 | 平安科技(深圳)有限公司 | Model library method for building up, audio recognition method, device, equipment and medium |
CN109255106A (en) * | 2017-07-13 | 2019-01-22 | Tcl集团股份有限公司 | A kind of text handling method and terminal |
CN116189671A (en) * | 2023-04-27 | 2023-05-30 | 凌语国际文化艺术传播股份有限公司 | Data mining method and system for language teaching |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007138875A1 (en) * | 2006-05-31 | 2007-12-06 | Nec Corporation | Speech recognition word dictionary/language model making system, method, and program, and speech recognition system |
CN102237083A (en) * | 2010-04-23 | 2011-11-09 | 广东外语外贸大学 | Portable interpretation system based on WinCE platform and language recognition method thereof |
CN102324232A (en) * | 2011-09-12 | 2012-01-18 | 辽宁工业大学 | Method for recognizing sound-groove and system based on gauss hybrid models |
CN102436809A (en) * | 2011-10-21 | 2012-05-02 | 东南大学 | Network speech recognition method in English oral language machine examination system |
- 2012-07-12: application CN201210242311XA filed in China (CN), published as CN102789779A, status Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007138875A1 (en) * | 2006-05-31 | 2007-12-06 | Nec Corporation | Speech recognition word dictionary/language model making system, method, and program, and speech recognition system |
CN102237083A (en) * | 2010-04-23 | 2011-11-09 | 广东外语外贸大学 | Portable interpretation system based on WinCE platform and language recognition method thereof |
CN102324232A (en) * | 2011-09-12 | 2012-01-18 | 辽宁工业大学 | Method for recognizing sound-groove and system based on gauss hybrid models |
CN102436809A (en) * | 2011-10-21 | 2012-05-02 | 东南大学 | Network speech recognition method in English oral language machine examination system |
Non-Patent Citations (2)
Title |
---|
陈晓霖: "Research on Speech Recognition Methods Based on Hidden Markov Models", China Master's Theses Full-text Database * 
高清伦 et al.: "Speech Recognition Technology Based on Discrete Hidden Markov Models", Journal of the Hebei Academy of Sciences * 
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104732972B (en) * | 2015-03-12 | 2018-02-27 | 广东外语外贸大学 | A kind of HMM Application on Voiceprint Recognition based on classified statistics is registered method and system |
CN104732972A (en) * | 2015-03-12 | 2015-06-24 | 广东外语外贸大学 | HMM voiceprint recognition signing-in method and system based on grouping statistics |
CN105931637A (en) * | 2016-04-01 | 2016-09-07 | 金陵科技学院 | User-defined instruction recognition speech photographing system |
CN105913840A (en) * | 2016-06-20 | 2016-08-31 | 西可通信技术设备(河源)有限公司 | Speech recognition device and mobile terminal |
CN106448657A (en) * | 2016-10-26 | 2017-02-22 | 安徽省云逸智能科技有限公司 | Continuous speech recognition system for restaurant robot servant |
CN106782550A (en) * | 2016-11-28 | 2017-05-31 | 黑龙江八农垦大学 | A kind of automatic speech recognition system based on dsp chip |
CN106531158A (en) * | 2016-11-30 | 2017-03-22 | 北京理工大学 | Method and device for recognizing answer voice |
CN109255106A (en) * | 2017-07-13 | 2019-01-22 | Tcl集团股份有限公司 | A kind of text handling method and terminal |
CN107773982A (en) * | 2017-10-20 | 2018-03-09 | 科大讯飞股份有限公司 | Game voice interactive method and device |
CN107910020A (en) * | 2017-10-24 | 2018-04-13 | 深圳和而泰智能控制股份有限公司 | Sound of snoring detection method, device, equipment and storage medium |
CN108922543A (en) * | 2018-06-11 | 2018-11-30 | 平安科技(深圳)有限公司 | Model library method for building up, audio recognition method, device, equipment and medium |
CN108922543B (en) * | 2018-06-11 | 2022-08-16 | 平安科技(深圳)有限公司 | Model base establishing method, voice recognition method, device, equipment and medium |
CN108536304A (en) * | 2018-06-25 | 2018-09-14 | 广州市锐尚展柜制作有限公司 | A kind of multi-modal interactive device of smart home |
CN116189671A (en) * | 2023-04-27 | 2023-05-30 | 凌语国际文化艺术传播股份有限公司 | Data mining method and system for language teaching |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102789779A (en) | Speech recognition system and recognition method thereof | |
CN107680582B (en) | Acoustic model training method, voice recognition method, device, equipment and medium | |
CN102800316B (en) | Optimal codebook design method for voiceprint recognition system based on nerve network | |
CN101980336B (en) | Hidden Markov model-based vehicle sound identification method | |
CN101136199B (en) | Voice data processing method and equipment | |
CN102968990B (en) | Speaker identifying method and system | |
CN103310789B (en) | A kind of sound event recognition method of the parallel model combination based on improving | |
Chavan et al. | An overview of speech recognition using HMM | |
CN101226743A (en) | Method for recognizing speaker based on conversion of neutral and affection sound-groove model | |
CN105206270A (en) | Isolated digit speech recognition classification system and method combining principal component analysis (PCA) with restricted Boltzmann machine (RBM) | |
CN103794207A (en) | Dual-mode voice identity recognition method | |
CN102024455A (en) | Speaker recognition system and method | |
CN104078039A (en) | Voice recognition system of domestic service robot on basis of hidden Markov model | |
CN103456302B (en) | A kind of emotional speaker recognition method based on the synthesis of emotion GMM Model Weight | |
Das et al. | Bangladeshi dialect recognition using Mel frequency cepstral coefficient, delta, delta-delta and Gaussian mixture model | |
Deshmukh et al. | Speech based emotion recognition using machine learning | |
CN109192200A (en) | A kind of audio recognition method | |
CN104240706A (en) | Speaker recognition method based on GMM Token matching similarity correction scores | |
CN104732972A (en) | HMM voiceprint recognition signing-in method and system based on grouping statistics | |
CN110827844A (en) | Noise classification method based on BP network | |
CN112562725A (en) | Mixed voice emotion classification method based on spectrogram and capsule network | |
Shivakumar et al. | Simplified and supervised i-vector modeling for speaker age regression | |
Ye et al. | Phoneme classification using naive bayes classifier in reconstructed phase space | |
Prazak et al. | Speaker diarization using PLDA-based speaker clustering | |
Nyodu et al. | Automatic identification of Arunachal language using K-nearest neighbor algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20121121 |