CN1295676C - State structure regulating method in sound identification


Info

Publication number
CN1295676C
CN1295676C, CNB2004100667929A, CN200410066792A
Authority
CN
China
Prior art keywords
state
voice
model
self
adaptation
Prior art date
Legal status
Expired - Fee Related
Application number
CNB2004100667929A
Other languages
Chinese (zh)
Other versions
CN1588536A (en)
Inventor
朱杰 (Zhu Jie)
徐向华 (Xu Xianghua)
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CNB2004100667929A priority Critical patent/CN1295676C/en
Publication of CN1588536A publication Critical patent/CN1588536A/en
Application granted granted Critical
Publication of CN1295676C publication Critical patent/CN1295676C/en

Abstract

The present invention relates to a state-structure adjustment method for voice recognition, in the field of voice recognition. In building a large-vocabulary continuous speech recognition system, the speech features use 12 cepstral coefficients plus short-time energy as the 13-dimensional basic features, and first-order and second-order differences are appended to give 39 feature dimensions. In the state-structure adjustment step, adaptation speech and training speech are used to adjust the state structure of the model, under the assumption that errors the baseline system makes when recognizing the training speech will also occur when it recognizes the test speech, so that the training corpus can be used to adjust the structure of the remaining states. In the speaker-adaptation step, the adaptation corpus is used to adapt the adjusted model with the maximum likelihood linear regression algorithm. The present invention raises the posterior probability of the model on the samples and improves the utilization of the adaptation corpus, thereby alleviating the loss of recognition rate caused by the structural mismatch between the training-corpus decision tree and the test-corpus decision tree.

Description

A state-structure adjustment method in speech recognition
Technical field
The present invention relates to a state-structure adjustment algorithm in the field of speech recognition, and specifically to a state-structure adjustment method in speech recognition.
Background technology
Since the 1990s, speaker-independent (SI) large-vocabulary continuous speech recognition (LVCSR) based on continuous-density HMMs has made great progress. To build more accurate models, LVCSR systems generally adopt context-dependent triphone models and further improve model performance with state-sharing strategies based on acoustic decision trees. At the same time, in an SI system the differences between speakers degrade system performance, which makes speaker-adaptation techniques the key to making SI systems practical. Commonly used adaptation methods include the Bayesian maximum a posteriori (MAP) method and maximum likelihood linear regression (MLLR); both transform the model parameters based on the adaptation corpus and do not consider adapting the structure of the decision tree. The merging and splitting of states in the decision tree are driven by the change in likelihood and the amount of data in the training corpus, so the resulting tree structure cannot effectively reflect the characteristics of the test data; especially when the training corpus and the adaptation corpus differ substantially, this structural deviation directly degrades system performance.
To reduce the loss of recognition rate caused by the structural mismatch between the training-corpus decision tree and the test-corpus decision tree, the structure of the training-corpus decision tree must be adjusted; however, directly adjusting that structure would in turn make the decision tree inconsistent with the training corpus and degrade model accuracy.
A literature search shows that A. Nakamura, in "Restructuring Gaussian mixture density functions in speaker-independent acoustic models" (Proc. ICASSP, vol. 1, pp. 649-652, 1998), proposed a method of adjusting the Gaussian mixture density functions. In that scheme, for a given utterance X and observation vector o_t at time t, the actual Gaussian function is f_t^a(\mu, \delta^2), belonging to state s_a, and the Gaussian function obtained by the Viterbi recognition algorithm is f_t^b(\mu, \delta^2), belonging to state s_b; s_a and s_b then share the Gaussian function f_t^b(\mu, \delta^2), thereby adjusting the Gaussian mixture distribution of s_a. An adjusted state contains a varying number of Gaussian functions, and a given Gaussian function may be shared by several states. However, the training process of that method is somewhat arbitrary, and because it relies only on the training corpus it cannot, to some extent, reflect information about the test speech.
Summary of the invention
In view of the above shortcomings and defects of the prior art, the present invention provides a state-structure adjustment method for speech recognition that raises the posterior probability of the model on the samples, improves the utilization of the adaptation corpus, increases the number of parameters within each state, and enlarges the descriptive power of the model while keeping the growth of the total number of system parameters limited, thereby reducing the loss of recognition rate caused by the structural mismatch between the training-corpus and test-corpus decision trees.
The present invention is achieved through the following technical solution: according to the degree of confusion between states, the state structure is adjusted by weighted sharing of Gaussians between confusable states. The concrete steps are as follows:
(1) Build a large-vocabulary continuous speech recognition system: the speech features use 12th-order Mel cepstral coefficients plus short-time energy, 13 dimensions in total, as the basic features; their first-order and second-order differences are appended, giving a final feature dimension of 39 (see the feature-construction sketch after these steps). The procedure follows general speech recognition practice. The features of every training utterance are extracted, and the HTK (HMM ToolKit) tools are used to first select initials and tonal finals as the basic modeling units according to the sentence content and build tonal monophone models; the models are then expanded from monophones to context-dependent triphone models, which take into account the different left and right initials/finals across syllables, with different contexts corresponding to different triphone models; finally, an acoustic decision tree is used to cluster the states of all triphone models derived from the same monophone, and the clustered states are gradually expanded from single Gaussian distributions to multiple-mixture Gaussian distributions.
(2) State-structure adjustment: this comprises adjusting the model state structure with the adaptation speech and adjusting it with the training speech. The adaptation speech and the test speech come from the same speakers, so errors that the baseline system makes when recognizing the adaptation speech will likewise occur when it recognizes the test speech. Therefore, appropriately adjusting the state structure according to the errors made when the baseline system recognizes the adaptation speech not only improves the utilization of the adaptation corpus but also raises the posterior probability of the model. On the other hand, using only the adaptation corpus limits the range of states that can be adjusted, whereas the training corpus comes from many speakers and its pronunciations are representative to some degree. It is therefore assumed that errors the baseline system makes when recognizing the training speech will also occur when it recognizes the test speech, so the training corpus can be used to adjust the structure of states that do not appear in the adaptation speech.
(3) Speaker adaptation: the maximum likelihood linear regression (MLLR) algorithm is applied, using the adaptation corpus to adapt the adjusted model, in order to further compensate for the remaining mismatch between the adjusted model and the test speech.
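As an illustration of the 39-dimensional feature construction in step (1), the following is a minimal sketch in Python/NumPy: given a (T, 13) matrix of basic features (12 cepstral coefficients plus short-time energy per frame), it appends regression-style first- and second-order differences. The function name, the window width and the regression formula are illustrative assumptions and are not taken from the patent.

```python
import numpy as np

def add_deltas(base_feats, width=2):
    """Append first- and second-order time differences to the basic features.

    base_feats: (T, 13) array of 12 cepstral coefficients + short-time energy
                per frame. Returns a (T, 39) array: base, delta, delta-delta.
    """
    def delta(x):
        # Regression-style difference over a +/- width frame window.
        T = x.shape[0]
        padded = np.pad(x, ((width, width), (0, 0)), mode="edge")
        num = sum(k * (padded[width + k:width + k + T] - padded[width - k:width - k + T])
                  for k in range(1, width + 1))
        den = 2 * sum(k * k for k in range(1, width + 1))
        return num / den

    d1 = delta(base_feats)        # first-order differences
    d2 = delta(d1)                # second-order differences
    return np.concatenate([base_feats, d1, d2], axis=1)

# Example: 200 frames of 13-dim basic features -> (200, 39)
print(add_deltas(np.random.randn(200, 13)).shape)
```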
The present invention is further illustrated below; the particulars are as follows:
1. Adjusting the model state structure with the adaptation speech; the concrete steps are:
Let the state set of the HMMs be Ω, and let the adaptation samples be X = {X_1, ..., X_i, ...} with corresponding state set Φ. Each sample X_i has the feature vector sequence O_i = (o_1, ..., o_t, ..., o_T) and state set Φ_i (Φ_i ⊆ Φ). Using the acoustic model of sample X_i, the frame-synchronous Viterbi algorithm yields the state sequence of O_i over Φ_i, Ξ = (s_1, ..., s_t, ..., s_T), called the actual state sequence; similarly, the Viterbi recognizer yields the state sequence of O_i over the state set Ω, Ψ = (r_1, ..., r_t, ..., r_T), called the recognized state sequence. Comparing these two sequences gives, for the same vector o_t, the pair of states s_t and r_t; if s_t ≠ r_t, r_t is called a confusable state of s_t, and the degree of confusion between them is defined as

C_{s_t|r_t} = \frac{P(o_t \mid r_t)}{P(o_t \mid s_t)}    (1)

Because state s_t has been misrecognized as r_t, when s_t ≠ r_t and the language model and state transition probabilities are ignored, P(o_t \mid r_t) > P(o_t \mid s_t), i.e. C_{s_t|r_t} > 1. From definition (1) it can be seen that the larger C_{s_t|r_t} is, the more likely the actual state s_t is to be recognized as r_t. Therefore, if the Gaussian mixture of state r_t is shared with state s_t in weighted form, changing the structure of s_t, the probability P(o_t \mid s_t) increases, which reduces the misclassification rate of the system and raises the posterior probability of the model on the observation vector o_t.
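A minimal sketch of how the degree of confusion of formula (1) could be collected by comparing the actual (forced-alignment) and recognized Viterbi state sequences, assuming per-frame state log-likelihoods are available; the function and variable names are hypothetical.

```python
import numpy as np

def confusion_degrees(actual_seq, recog_seq, frame_loglik):
    """Collect C_{s|r} = P(o_t|r) / P(o_t|s) for every frame where s_t != r_t.

    actual_seq:   actual state id per frame (sequence Xi)
    recog_seq:    recognized state id per frame (sequence Psi)
    frame_loglik: dict (frame index, state id) -> log P(o_t | state)
    Returns a dict mapping (s, r) pairs to the list of observed C values.
    """
    conf = {}
    for t, (s, r) in enumerate(zip(actual_seq, recog_seq)):
        if s == r:
            continue
        c = np.exp(frame_loglik[(t, r)] - frame_loglik[(t, s)])  # formula (1), log domain
        conf.setdefault((s, r), []).append(c)
    return conf
```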
Let state s ∈ Φ correspond to the observation feature vectors O_s of the adaptation samples, and let R_s be the state set obtained by recognizing O_s (R_s ⊆ Ω); R_s is called the confusable state set of s. The states r (r ∈ R_s) are used to adjust the structure of s, and the adjusted Gaussian mixture function is

b(\cdot \mid s) = \sum_{r \in R_s} w_{s|r} P(\cdot \mid r) + w_0 P(\cdot \mid s)    (2)

In formula (2), w_0 = 1 - D, where D is a constant; the weight w_{s|r} and the probability function P(\cdot \mid r) are computed respectively as

w_{s|r} = \frac{D \cdot C_{s|r}}{\sum_{r \in R_s} C_{s|r}}    (3)

P(\cdot \mid r) = \sum_{l=1}^{L} m_{r,l} N(\cdot \mid \mu_{r,l}, \Sigma_{r,l})    (4)

In formula (4), L is the number of Gaussian mixtures of the state before adjustment, and \mu_{r,l}, \Sigma_{r,l} and m_{r,l} are, respectively, the mean vector, diagonal covariance matrix and weight of the multivariate Gaussian N(\cdot \mid \mu_{r,l}, \Sigma_{r,l}). After the structural adjustment a state therefore carries two layers of weights, the intra-state weights m_{r,k} and the inter-state weights w_{s|r}, which satisfy

Intra-state weights: \sum_{k=1}^{K} m_{r,k} = 1, with 0 ≤ m_{r,k} ≤ 1.

Inter-state weights: \sum_{r \in R_{s'}} w_{s|r} = 1, with 0 ≤ w_{s|r} ≤ 1, where R_{s'} = R_s ∪ {s}.
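The following sketch illustrates formulas (2)-(4): inter-state weights derived from the confusion degrees with a constant D, and the adjusted output density b(·|s) as a weighted combination of the mixture of s and the mixtures of its confusable states. The value D = 0.3, the data layout and the helper names are assumptions made for illustration; SciPy's multivariate normal provides the Gaussian densities.

```python
import numpy as np
from scipy.stats import multivariate_normal

def inter_state_weights(conf_degrees, D=0.3):
    """Formula (3): w_{s|r} = D * C_{s|r} / sum_r C_{s|r}, with w_0 = 1 - D.

    conf_degrees: dict r -> C_{s|r} over the confusable set R_s of one state s.
    D is a constant in (0, 1); 0.3 is an illustrative choice, not from the patent.
    """
    total = sum(conf_degrees.values())
    w = {r: D * c / total for r, c in conf_degrees.items()}
    return w, 1.0 - D

def adjusted_state_likelihood(o, state_s, shared_states, w, w0):
    """Formula (2): b(o|s) = sum_r w_{s|r} P(o|r) + w_0 P(o|s).

    state_s / shared_states[r]: lists of (weight m, mean vector, diagonal variances),
    i.e. the mixture components of s and of each confusable state r in R_s.
    """
    def mixture(comps):
        return sum(m * multivariate_normal.pdf(o, mean=mu, cov=np.diag(var))
                   for m, mu, var in comps)

    b = w0 * mixture(state_s)
    for r, comps in shared_states.items():
        b += w[r] * mixture(comps)
    return b
```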
2. Adjusting the model state structure with the training speech; the concrete steps are:

Let \bar{s} denote the state before adjustment; its log-likelihood on O_s is L(O_s)' = \sum_{o \in O_s} \log P(o \mid \bar{s}). The likelihood gain after adjustment is \Delta L(O_s) = L(O_s) - L(O_s)', and the average gain over the state set Φ is

\Delta L = \frac{1}{|\Phi|} \sum_{s \in \Phi} \Delta L(O_s)

\Delta L is used as the threshold in the state-structure adjustment based on the training speech.
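A small sketch of the threshold computation just described, assuming the summed log-likelihoods of O_s before and after adjustment are available per state; the dictionary layout is an assumption.

```python
def likelihood_gain_threshold(loglik_before, loglik_after):
    """Per-state gain Delta L(O_s) = L(O_s) - L(O_s)' and the average gain
    Delta L over Phi, which serves later as the acceptance threshold.

    loglik_before / loglik_after: dicts s -> summed log-likelihood of O_s under
    the state before / after adjustment, for every s in Phi.
    """
    gains = {s: loglik_after[s] - loglik_before[s] for s in loglik_after}
    return gains, sum(gains.values()) / len(gains)
```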
A state set Ψ is defined as Ψ = Ω − Φ, and the training corpus is used to further adjust the model state structure; the concrete steps are:
1) For each training sample Y_i (Y_i ∈ Y) with feature vector sequence O_i, the Viterbi decoding algorithm is applied to obtain the recognized state sequence {η}_i; using the acoustic model corresponding to Y_i, the frame-synchronous Viterbi alignment segments the observation sequence and yields the actual state sequence {γ}_i corresponding to O_i.
2) Step 1) is repeated until all training samples Y have been processed, giving the two classes of state sequences {η} ({η}_i ∈ {η}) and {γ} ({γ}_i ∈ {γ}).
3) {η} and {γ} are compared to determine the confusable state set R_s (R_s ⊆ {η}) of each state s (s ∈ {γ}), and the degree of confusion C_{s|r} between each state r ∈ R_s and state s is computed; the elements of R_s are sorted in descending order of confusion degree, and the size of R_s is denoted I_s.
4) Adjustment of state s: the first i (0 < i ≤ I_s) states are used to adjust s and the likelihood gain \Delta L_s is computed; if \Delta L_s < \Delta L, set i = i + 1 and repeat until \Delta L_s > \Delta L; if \Delta L_s < \Delta L still holds when i = I_s, state s is not adjusted (see the sketch after this list).
5) Steps 3) and 4) are repeated until the structural adjustment of every state in Ψ is complete.
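A sketch of the greedy loop in steps 3)-4): confusable states are added in descending order of confusion degree until the likelihood gain exceeds the threshold ΔL. The callable build_adjusted_loglik is an assumed helper (for example built on formula (2) above) that returns the summed log-likelihood of O_s once the given states are tied to s.

```python
def adjust_state_with_training_data(s, R_s_sorted, loglik_before, avg_delta_L,
                                    build_adjusted_loglik):
    """Greedy adjustment of one state s in Psi (steps 3-4).

    R_s_sorted:            confusable states of s, sorted by C_{s|r} descending.
    loglik_before:         L(O_s)' of the unadjusted state.
    avg_delta_L:           the threshold Delta L from the adaptation-data step.
    build_adjusted_loglik: callable (s, shared) -> summed log-likelihood of O_s
                           with the states in `shared` tied to s (assumed helper).
    Returns the shared subset actually used, or None if s is left unadjusted.
    """
    for i in range(1, len(R_s_sorted) + 1):
        shared = R_s_sorted[:i]                              # top-i confusable states
        delta_L_s = build_adjusted_loglik(s, shared) - loglik_before
        if delta_L_s > avg_delta_L:                          # gain beats the threshold
            return shared
    return None                                              # no adjustment for s
```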
The newly added inter-state weights w_{s|r} are re-estimated with the objective function

L(O_s) = \sum_{o \in O_s} \log P(o \mid s) = \sum_{o \in O_s} \log \sum_{r \in R_{s'}} w_{s|r} P(o \mid r)    (5)

To find the weights w_{s|r} that maximize the objective function, the expectation-maximization (EM) algorithm is used, with the auxiliary function

Q(\bar{w}_{s|r}, w_{s|r}) = E[\log P(O_s, s \mid \bar{w}_{s|r}) \mid O_s, w_{s|r}]    (6)

Under the constraint \sum_{r \in R_{s'}} w_{s|r} = 1, differentiating the above with respect to w_{s|r} gives

\bar{w}_{s|r} = \frac{\sum_{o \in O_s} \sum_{k=1}^{K} \gamma(s, r, k)}{\sum_{o \in O_s} \sum_{r \in R_{s'}} \sum_{k=1}^{K} \gamma(s, r, k)}    (7)

where

\gamma(s, r, k) = \frac{w_{s|r} m_{r,k} N(o \mid \mu_{r,k}, \Sigma_{r,k})}{\sum_{r \in R_{s'}} \sum_{k=1}^{K} w_{s|r} m_{r,k} N(o \mid \mu_{r,k}, \Sigma_{r,k})}

is the probability that an observation o (o ∈ O_s) belongs to the k-th mixture Gaussian of state r, and \bar{w}_{s|r} is the updated value of w_{s|r}.
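A sketch of one EM update of the inter-state weights following formula (7), with γ(s, r, k) computed per frame as defined above; the data layout (a dictionary of mixture components per tied state) is an assumption.

```python
import numpy as np
from scipy.stats import multivariate_normal

def reestimate_inter_state_weights(obs, tied, w):
    """One EM update of w_{s|r}, formula (7).

    obs:  iterable of observation vectors O_s aligned to state s.
    tied: dict r -> list of (m_{r,k}, mean, diagonal variances) for every r in R_s'
          (the confusable set plus s itself).
    w:    dict r -> current w_{s|r}, summing to 1 over R_s'.
    Returns the updated weights w_bar.
    """
    num = {r: 0.0 for r in tied}
    for o in obs:
        # gamma(s, r, k): posterior of mixture k of tied state r given o
        gam = {r: [w[r] * m * multivariate_normal.pdf(o, mean=mu, cov=np.diag(v))
                   for m, mu, v in comps]
               for r, comps in tied.items()}
        denom = sum(sum(g) for g in gam.values())
        for r in tied:
            num[r] += sum(gam[r]) / denom       # accumulate sum_k gamma(s, r, k)
    total = sum(num.values())
    return {r: num[r] / total for r in tied}    # normalize as in formula (7)
```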
When the MLLR algorithm is used to adapt the state-adjusted model, in view of the limited adaptation corpus only the means of the model are adapted and the remaining parameters are kept unchanged. The transformation matrix in the MLLR algorithm is a diagonal transformation matrix, shared among different target means; the diagonal transformation matrix is estimated from all adaptation data corresponding to the target distributions that share it, and the degree and range of sharing are adjusted according to the amount of adaptation data and the phonetic classification.
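For the diagonal MLLR mean adaptation described above, the following sketch estimates a shared per-dimension transform μ̂ = a·μ + b by solving the 2×2 normal equations accumulated from occupation-weighted statistics, and then applies it to every tied mean. This is a generic diagonal-MLLR sketch under the usual diagonal-covariance assumption, not the patent's exact estimation procedure; the names and data layout are illustrative.

```python
import numpy as np

def estimate_diagonal_mllr(stats):
    """Estimate a shared diagonal MLLR mean transform mu_hat = a * mu + b.

    stats: list of (gamma, o, mu, var) tuples, one per (frame, Gaussian) pair that
           shares this transform: occupation probability gamma (scalar), observation
           o, Gaussian mean mu and diagonal variances var (1-D arrays of dim D).
    Returns (a, b), each of shape (D,).
    """
    D = stats[0][1].shape[0]
    a, b = np.zeros(D), np.zeros(D)
    for d in range(D):
        G = np.zeros((2, 2))
        k = np.zeros(2)
        for gamma, o, mu, var in stats:
            wgt = gamma / var[d]
            G += wgt * np.array([[mu[d] * mu[d], mu[d]],
                                 [mu[d],         1.0]])
            k += wgt * np.array([o[d] * mu[d], o[d]])
        a[d], b[d] = np.linalg.solve(G, k)      # per-dimension normal equations
    return a, b

def adapt_means(means, a, b):
    """Apply the shared transform to every Gaussian mean tied to it."""
    return [a * mu + b for mu in means]
```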
The present invention shares Gaussian mixture functions between easily confused states. The recognition errors caused by the structural mismatch between the training-speech and test-speech decision trees show up as state confusions when the adaptation speech is recognized. For example, when male speech is recognized with a female-voice model and state A is recognized as state B (B ≠ A), in most cases A and B belong to the same decision tree, and in some cases A and B even belong to the same leaf node of the male-voice decision tree. Therefore, the present invention first adjusts the state structure with the adaptation speech and then, on this basis, uses the training speech to extend the range of states that are adjusted.
The present invention raises the posterior probability of the model on the samples, improves the utilization of the adaptation corpus, increases the number of parameters within each state, and enlarges the descriptive power of the model, while keeping the growth of the total number of system parameters limited, thereby reducing the loss of recognition rate caused by the structural mismatch between the training-corpus and test-corpus decision trees. It should be noted that the scope of protection of the present invention is not limited by the size or number of the modeling units, nor by the type of model; the method is applicable to any other continuous speech recognition system.
Description of drawings
Fig. 1: State-structure adjustment and speaker adaptation
Fig. 2: State-structure adjustment based on the training corpus
Fig. 3: Performance comparison of the systems with state-structure adjustment
Fig. 4: Speaker-adaptation performance comparison of the systems with state-structure adjustment
Embodiment
The following embodiment is provided in conjunction with the content of the method of the present invention to aid further understanding.
Embodiment:
To better understand the technical scheme of the present invention, experiments on a continuous speech database are used for further illustration. The training set F_Tr of the baseline system F_863 contains recordings of 68 female speakers, about 530 sentences per speaker and 36210 sentences in total. The speech is sampled at 16 kHz with 16-bit quantization; the frame length is 25 ms and the frame shift is 10 ms. 39-dimensional speech feature vectors are extracted, comprising 12 MFCCs, 1 normalized energy, and their first- and second-order differences. The acoustic model takes initials and tonal finals as the basic modeling units, each represented by a continuous-density HMM. In the present invention the basic modeling units are listed in Table 1 (the digits after a tonal final denote the tone, with 5 denoting the neutral tone): 27 initials, among which ga, ge, ger and go are hypothesized initials for the isolated syllables a, e, er and o; and 157 tonal finals, where ib denotes the final in the syllables chi, ri, shi and zhi, and if denotes the final used in the syllables ci, si and zi. Together with a silence HMM, 185 monophone models are trained in total, following the general speech recognition training procedure. After training, the models are expanded from monophones to triphones and the triphone models are state-clustered with an acoustic decision tree; the clustered distributions are gradually expanded from single Gaussians to 8-mixture Gaussians. No language model is applied during recognition, so the experimental results are at the acoustic level only.
Table 1. Initials and tonal finals in the acoustic model

Initials (27): b, c, ch, d, f, g, ga, ge, ger, go, h, j, k, l, m, n, p, q, r, s, sh, t, w, x, y, z, zh

Tonal finals (157): a(1-5), ai(1-4), an(1-4), ang(1-5), ao(1-4), e(1-5), ei(1-4), en(1-5), eng(1-4), er(2-4), i(1-5), ia(1-4), ib(1-4), ian(1-5), iang(1-4), iao(1-4), ie(1-4), if(1-4), in(1-4), ing(1-4), iong(1-3), iu(1-5), o(1-5), ong(1-4), ou(1-5), u(1-5), ua(1-4), uai(1-4), uan(1-4), uang(1-4), ui(1-4), un(1-4), uo(1-5), v(1-4), van(1-4), ve(1-4), vn(1-4)
The male-voice test corpus M_Te comes from 14 speakers, 40 sentences each; the male-voice adaptation corpus M_Ad comes from the same 14 speakers, also 40 sentences each, with the test speech and the adaptation speech kept independent. The model obtained by adjusting the state structure of F_863 with M_Ad is denoted R1_F, and the model further adjusted with F_Tr on the basis of R1_F is denoted R2_F. Figure 3 compares system performance as the number of adaptation sentences varies. As can be seen from Figure 3, R1_F and R2_F achieve consistently higher recognition rates than F_863. When the adaptation corpus is small, for example only 1 or 3 sentences, the number of states whose structure is adjusted in R1_F is limited and so is its performance gain, whereas R2_F, which uses the training corpus to adjust states that do not appear in the adaptation speech, shows a marked improvement; this confirms the assumption made when using the training corpus to adjust the state structure. As the number of adaptation sentences increases, the performance of R1_F and R2_F converges, and with sufficient adaptation data the two become identical.
MLLR speaker adaptation with the male-voice adaptation speech is then applied to the three systems F_863, R1_F and R2_F; Figure 4 shows the recognition rates of F_863/MLLR, F_R1/MLLR and F_R2/MLLR as the number of adaptation sentences varies. A system with more parameters benefits markedly from MLLR adaptation. Compared with the F_863 system, the state-adjusted F_R1 and F_R2 systems not only greatly increase the number of parameters within each state but also, by adjusting the state structure, indirectly adjust the decision tree structure, reducing the effect of the mismatch between the decision tree structure and the test speech on speaker adaptation. The recognition performance of F_R1/MLLR and F_R2/MLLR is therefore clearly higher than that of F_863/MLLR, which proves that the state adjustment algorithm helps to improve system performance.

Claims (2)

1. A state-structure adjustment method in speech recognition, characterized in that, according to the degree of confusion between states, the state structure is adjusted by weighted sharing of Gaussians between confusable states, with the following concrete steps:
(1) Build a large-vocabulary continuous speech recognition system: the speech features use 12th-order Mel cepstral coefficients plus short-time energy, 13 dimensions in total, as the basic features; their first-order and second-order differences are appended, giving a final feature dimension of 39; the procedure follows general speech recognition practice; the features of every training utterance are extracted, and the HTK tools are used to first select initials and tonal finals as the basic modeling units according to the sentence content and build tonal monophone models; the models are then expanded from monophones to context-dependent triphone models, which take into account the left and right initials/finals across syllables, with different contexts corresponding to different triphone models; finally, an acoustic decision tree is used to cluster the states of all triphone models derived from the same monophone, and the clustered states are gradually expanded from single Gaussian distributions to multiple-mixture Gaussian distributions;
(2) State-structure adjustment: this comprises adjusting the model state structure with the adaptation speech and adjusting it with the training speech; the adaptation speech and the test speech come from the same speakers, so errors that the baseline system makes when recognizing the adaptation speech will likewise occur when it recognizes the test speech; it is further assumed that errors the baseline system makes when recognizing the training speech will also occur when it recognizes the test speech, so the training corpus is used to adjust the structure of states that do not appear in the adaptation speech;
(3) Speaker adaptation: the maximum likelihood linear regression algorithm is applied, using the adaptation corpus to adapt the adjusted model.
2. The state-structure adjustment method in speech recognition according to claim 1, characterized in that, when the maximum likelihood linear regression algorithm adapts the state-adjusted model, in view of the limited adaptation corpus only the means of the model are adapted; the transformation matrix in the maximum likelihood linear regression algorithm is a diagonal transformation matrix, shared among two or more target means; the diagonal transformation matrix is estimated from all adaptation data corresponding to the target distributions that share it, and the degree and range of sharing are adjusted according to the amount of adaptation data and the phonetic classification.
CNB2004100667929A 2004-09-29 2004-09-29 State structure regulating method in sound identification Expired - Fee Related CN1295676C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100667929A CN1295676C (en) 2004-09-29 2004-09-29 State structure regulating method in sound identification


Publications (2)

Publication Number Publication Date
CN1588536A CN1588536A (en) 2005-03-02
CN1295676C true CN1295676C (en) 2007-01-17

Family

ID=34604094

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100667929A Expired - Fee Related CN1295676C (en) 2004-09-29 2004-09-29 State structure regulating method in sound identification

Country Status (1)

Country Link
CN (1) CN1295676C (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101315733B (en) * 2008-07-17 2010-06-02 安徽科大讯飞信息科技股份有限公司 Self-adapting method aiming at computer language learning system pronunciation evaluation
CN101604522B (en) * 2009-07-16 2011-09-28 北京森博克智能科技有限公司 Embedded Chinese-English mixed voice recognition method and system for non-specific people
CN102237082B (en) * 2010-05-05 2015-04-01 三星电子株式会社 Self-adaption method of speech recognition system
CN103117060B (en) * 2013-01-18 2015-10-28 中国科学院声学研究所 For modeling method, the modeling of the acoustic model of speech recognition
CN104157294B (en) * 2014-08-27 2017-08-11 中国农业科学院农业信息研究所 A kind of Robust speech recognition method of market for farm products element information collection
CN106898355B (en) * 2017-01-17 2020-04-14 北京华控智加科技有限公司 Speaker identification method based on secondary modeling
CN110148403B (en) * 2019-05-21 2021-04-13 腾讯科技(深圳)有限公司 Decoding network generation method, voice recognition method, device, equipment and medium
CN112927716A (en) * 2021-01-22 2021-06-08 华东交通大学 Construction site special vehicle identification method based on improved MFCC

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5596680A (en) * 1992-12-31 1997-01-21 Apple Computer, Inc. Method and apparatus for detecting speech activity using cepstrum vectors
US6266636B1 (en) * 1997-03-13 2001-07-24 Canon Kabushiki Kaisha Single distribution and mixed distribution model conversion in speech recognition method, apparatus, and computer readable medium
CN1346126A (en) * 2000-09-27 2002-04-24 中国科学院自动化研究所 Three-tone model with tune and training method
CN1499481A (en) * 2002-10-24 2004-05-26 杜和平 'Ewenke' musical instrument


Also Published As

Publication number Publication date
CN1588536A (en) 2005-03-02


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20070117

Termination date: 20091029