CN1221939C - Speaker self-adaptive method in speech recognition system - Google Patents

Speaker self-adaptive method in speech recognition system

Info

Publication number
CN1221939C
CN1221939C CNB031022065A CN03102206A
Authority
CN
China
Prior art keywords
self
adaptation
class
decision tree
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB031022065A
Other languages
Chinese (zh)
Other versions
CN1521728A (en)
Inventor
吴及
王作英
吕萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TIANLANG SPEECH SOUND SCI-TECH Co Ltd BEIJING
Tsinghua University
Original Assignee
TIANLANG SPEECH SOUND SCI-TECH Co Ltd BEIJING
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TIANLANG SPEECH SOUND SCI-TECH Co Ltd BEIJING, Tsinghua University filed Critical TIANLANG SPEECH SOUND SCI-TECH Co Ltd BEIJING
Priority to CNB031022065A priority Critical patent/CN1221939C/en
Publication of CN1521728A publication Critical patent/CN1521728A/en
Application granted granted Critical
Publication of CN1221939C publication Critical patent/CN1221939C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention provides a speaker adaptation method for a speech recognition system, referred to as maximum-Gaussian-similarity linear interpolation of covariance matrices, which overcomes the weakness of the Gaussian-similarity binary-decision-tree method when adaptation data are scarce. The main steps are as follows. Before adaptation: first, as in the binary-decision-tree adaptation method based on Gaussian similarity analysis, a binary decision tree is built for the covariance matrices of the speaker-independent model; then, the class center matrices corresponding to the intermediate nodes are computed for each speaker-dependent model according to this decision tree. During adaptation: first, the intermediate nodes at which interpolation adaptation will be carried out are determined according to the amount of data provided by the tester; then, the interpolation coefficients are computed from the adaptation data corresponding to each interpolated intermediate node; finally, the adapted class center matrices are computed and the covariance matrices are updated to obtain the adapted model.

Description

Speaker adaptation method in a speech recognition system
Technical field
The present invention relates to a speaker adaptation method in the field of speech recognition technology, and in particular to a speaker adaptation method for covariance matrices.
Background art
After more than half a century of development, speech recognition technology has made significant progress and is gradually moving out of the laboratory into practical applications. In particular, the shift from speaker-dependent (SD) to speaker-independent (SI) recognition has greatly expanded the application space of the technology. Yet for the same speaker, the performance of an SI system is usually well below that of a well-trained SD system. The reason is that the acoustic model of an SD system is trained on a single speaker's data and therefore reflects that speaker's characteristics well, whereas the training set of an SI system contains speech data from as many different speakers as possible, so the corresponding acoustic model is a smoothed model over many speakers, and some loss of recognition performance is unavoidable. To remedy this defect of SI systems, speaker adaptation techniques have been studied. The goal of speaker adaptation is to use a new speaker's speech data to adjust the speech features or the acoustic model parameters so that they match the new speaker as closely as possible, bringing the post-adaptation system performance as close as possible to that of an SD system.
Model adaptation is the most commonly used form of speaker adaptation. As shown in Figure 1, it adjusts the acoustic model parameters (means or covariances) of the SI system according to some transformation, using the speech data provided by the new speaker; the resulting model is called the speaker-adapted (SA) model. Once the SA model is obtained, the system uses it to recognize the speaker's other speech signals; such a system is called a speaker adaptation system (shown in Figure 2).
Since the 1980s, a variety of speaker model adaptation methods have been proposed. They fall roughly into two broad classes: adaptation based on Bayesian estimation and adaptation based on transformation. The typical corresponding algorithms are maximum a posteriori (MAP) estimation and maximum likelihood linear regression (MLLR). With the growth of speech recognition applications, fast adaptation has received increasing attention. Its basic idea is to combine the MAP and MLLR algorithms and make full use of the correlation between speech recognition units so as to reduce the number of parameters to be estimated.
The binary-decision-tree adaptation method based on Gaussian similarity analysis belongs to the transformation-based class and adapts the covariance matrices. Its basic idea is that for a group of similar covariance matrices, the similarity relation among them is unchanged before and after adaptation, so they can share the same transformation during adaptation; the grouping of covariance matrices is determined dynamically by a binary decision tree. This method is an effective way to adapt covariance matrices with a small amount of data, so that the recognition performance of the adapted model can ultimately approach that of a speaker-dependent model. It has a shortcoming, however: during adaptation at least one class center matrix (the root node matrix) must be estimated, and when adaptation data are scarce it is difficult to estimate a matrix stably, which leads to negative adaptation, i.e. the post-adaptation system performance may actually fall below the baseline.
Summary of the invention
The object of the present invention is to propose a new fast covariance adaptation method that overcomes the weakness of the Gaussian-similarity binary-decision-tree method when adaptation data are scarce.
To achieve the above object, the present invention comprises: a step, before adaptation, of training a speaker-independent hidden Markov model;
a step, before adaptation, of building a binary decision tree over the state covariance matrices of this speaker-independent hidden Markov model;
a step, before adaptation, of computing the class center covariance matrix of each intermediate node of the binary decision tree and the transformation relation between it and each corresponding leaf-node covariance matrix;
a step, before adaptation, of training a number of speaker-dependent hidden Markov models;
a step, before adaptation, of computing, according to this binary decision tree, the class center matrices corresponding to the intermediate nodes under each speaker-dependent model;
a step, during adaptation, of determining the adaptation classes according to the adaptation data provided by the tester;
a step of estimating, for each adaptation class, the class center matrix from the adaptation data by the maximum likelihood method;
a step of computing the optimal interpolation coefficients for each adaptation class;
a step of computing, for each adaptation class, the adapted class center covariance matrix from the maximum likelihood estimates of the speaker-dependent model class center matrices and their corresponding interpolation coefficients;
and a step of updating the covariance matrices of each adaptation class to obtain the speaker-adapted model.
In the step of computing the optimal interpolation coefficients for each adaptation class, the criterion is maximum Gaussian similarity: the class center matrix of the intermediate node obtained by linear interpolation should be maximally similar to the class center matrix obtained in the preceding step.
In the present invention, the covariance matrices used for interpolation are not only the covariance matrices of the HMM states but also the class-center covariance matrices of the intermediate nodes of the binary tree. An intermediate node represents all of its leaf nodes, so once it is interpolated, all of the leaf covariance matrices under it are adapted. A further advantage of interpolating at the class centers of intermediate nodes is that the number of intermediate nodes used for interpolation can be decided dynamically according to the amount of adaptation data, which preserves fast adaptation while improving the gradual (asymptotic) behaviour of the algorithm.
Description of drawings
Fig. 1 is a flow chart of the model adaptation method;
Fig. 2 is a flow chart of a speech recognition system after model adaptation;
Fig. 3 is a flow chart of the pre-adaptation stage of an embodiment of the invention;
Fig. 4 is a flow chart of building the binary decision tree in an embodiment of the invention;
Fig. 5 is a flow chart of the K-means node splitting shown in Fig. 4;
Fig. 6 is a flow chart of the adaptation stage of an embodiment of the invention.
Embodiment
The present invention is further described below with reference to the drawings and a specific embodiment.
Figs. 3 to 6 illustrate a preferred embodiment of the present invention.
Before adaptation, as shown in Figure 3, a speaker-independent hidden Markov model (hereinafter the SI model) is trained. Then, using formula (3) as the distance measure between covariance matrices (i.e. the Gaussian similarity), a binary decision tree over the covariance matrices of the hidden Markov model (HMM) states is built with a top-down K-means procedure, and the transformation relation $A_{i,\Phi}$ between each state and the class center covariance matrix is computed. As shown in Figure 4, the covariance matrices of all states of the model to be adapted are first placed at the root node, and the center matrix $C_\Phi$ of this node is computed according to formula (1). The root node is then split into two child nodes with the K-means algorithm, and the splitting is repeated: if the number of states in the current node cannot be split further or falls below a preset threshold, the node is taken as a leaf node; otherwise splitting continues until all leaf nodes are obtained, each leaf node corresponding to one covariance matrix. Finally, for each node the class center matrix $C_\Phi$ is computed according to formula (1) and the transformation matrices $A_{i,\Phi}$ between it and the corresponding leaf-node covariance matrices are computed according to formula (2).
$$C_\Phi = \frac{1}{N_\Phi}\sum_{i\in\Phi}\Sigma_i \qquad (1)$$
where $N_\Phi$ is the number of leaf nodes in the set $\Phi$.
$$A_{i,\Phi} = \Sigma_i^{-1/2}\left[\Sigma_i^{1/2}\, C_\Phi\, \Sigma_i^{1/2}\right]^{1/2}\Sigma_i^{-1/2}, \qquad i\in\Phi \qquad (2)$$
$$d(x,y) = \mathrm{tr}\!\left(\Sigma_x + \Sigma_y - 2\left[\Sigma_x^{1/2}\,\Sigma_y\,\Sigma_x^{1/2}\right]^{1/2}\right) \qquad (3)$$
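Purely as an illustration (the code is not part of the patent and the function names are ours), formulas (2) and (3) can be written with NumPy and SciPy as the following minimal sketch:

```python
import numpy as np
from scipy.linalg import sqrtm

def gauss_similarity_distance(sigma_x, sigma_y):
    """Formula (3): d(x, y) = tr(Sx + Sy - 2 [Sx^{1/2} Sy Sx^{1/2}]^{1/2})."""
    sx_half = sqrtm(sigma_x)
    inner = sqrtm(sx_half @ sigma_y @ sx_half)
    return float(np.trace(sigma_x + sigma_y - 2.0 * inner).real)

def class_transform(sigma_i, c_phi):
    """Formula (2): A_{i,Phi} = Si^{-1/2} [Si^{1/2} C_Phi Si^{1/2}]^{1/2} Si^{-1/2}.

    For symmetric positive-definite inputs, A_{i,Phi} maps the leaf covariance
    onto the class center: A @ sigma_i @ A.T == c_phi (up to numerical error).
    """
    si_half = sqrtm(sigma_i)
    si_inv_half = np.linalg.inv(si_half)
    return (si_inv_half @ sqrtm(si_half @ c_phi @ si_half) @ si_inv_half).real
```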
The K-means method is described as follows: given n points $X_1, X_2, \ldots, X_n$ in the space and a number of classes K (K = 2 in the present invention), denote the classes $C_1, C_2, \ldots, C_K$. The n points are assigned to the K classes so that the similarity between objects within a class is maximized and the similarity between classes is minimized. As shown in Fig. 5, the steps are as follows (a code sketch follows these steps):
1. Choose K initial class centers, denoted $c_1, c_2, \ldots, c_K$.
2. For each point, compute its distance $d(X_i, c_j)$ to each class center according to formula (3) and find the nearest center $c_l$, i.e. $d(X_i, c_l) \le d(X_i, c_j)$ for $j \in \{1, 2, \ldots, K\}$, $j \ne l$, $1 \le l \le K$; then $X_i \in C_l$, i.e. $X_i$ belongs to class $l$. In this way the assignment of all points is determined.
3. Compute the total distance measure: $D = \sum_{i=1}^{n} \min_{1\le j\le K} d(X_i, c_j)$.
4. According to the assignment, recompute each class center from the points of that class.
5. Using the new class centers, recompute the assignment of every point in the space and compute the updated total distance measure $D_{\text{new}}$.
6. Compare the two total distance measures; if the difference is small enough, stop the iteration and take the final classification and class centers, otherwise continue iterating, repeating steps 2-5.
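The sketch below, offered only as an illustration and not taken from the patent, implements one K = 2 split of a node. `distance` is assumed to be the formula-(3) measure from the previous sketch, and the class center is taken here as the arithmetic mean of the member covariance matrices, which is an assumption of this sketch.

```python
import numpy as np

def kmeans_split(cov_matrices, distance, max_iter=50, tol=1e-6):
    """Split a list of covariance matrices into K = 2 classes (Fig. 5)."""
    # step 1: choose two initial class centers
    centers = [cov_matrices[0].copy(), cov_matrices[-1].copy()]
    prev_total = np.inf
    labels = [0] * len(cov_matrices)
    for _ in range(max_iter):
        # step 2: assign each matrix to the nearest class center under d(.,.)
        labels = [int(np.argmin([distance(m, c) for c in centers]))
                  for m in cov_matrices]
        # step 3: total distance measure D = sum_i min_j d(X_i, c_j)
        total = sum(min(distance(m, c) for c in centers) for m in cov_matrices)
        # steps 4-5: recompute class centers from the new assignment
        for k in range(2):
            members = [m for m, lab in zip(cov_matrices, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
        # step 6: stop when the total distance measure no longer improves
        if abs(prev_total - total) < tol:
            break
        prev_total = total
    return labels, centers
```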
In this way, from the parameters of the HMM model to be adapted, we build a binary decision tree that describes, under the Gaussian similarity measure, the relations among the shapes of the HMM state observation probability distributions in feature space. The states grouped under each node of this decision tree are states whose observation distributions are close under the Gaussian similarity measure, i.e. distributions with similar shapes in feature space. This binary tree is in fact a structural description of the state observation probability distributions in feature space.
Next, also before adaptation, a number of speaker-dependent hidden Markov models (hereinafter the SD models) are trained. Then, according to the above decision tree, the class centers $C_{\Phi_j}^{(s)}$ ($s = 1, \ldots, S$; $j = 1, \ldots, J$) corresponding to the intermediate nodes are computed for each SD model, where S is the number of SD models and J is the total number of intermediate nodes, as shown in Figure 3.
During adaptation, the number of adaptation classes is first determined from the adaptation data provided by the tester, as follows: the number of speech samples of each leaf node is counted from the adaptation data; if the count is below a predetermined threshold, the counting is passed up to the node's parent and all speech samples of the parent node are accumulated; if the total exceeds the threshold the process stops, otherwise it continues, until every leaf node has been traced back. At this point we have the state classes suited to this batch of adaptation data. This method of dynamically selecting state classes from the adaptation data is referred to as data-driven. The choice of the threshold is crucial to obtaining the best adaptation effect from limited adaptation data. Because the adaptation data are limited, if the threshold is too small a state class may not have enough data to estimate its class center, the estimated covariance matrix becomes unstable, and the adaptation effect suffers. If the threshold is too large, too few state classes are formed, the structural description of the state observation probability distributions in feature space becomes too coarse, and a good adaptation effect is again hard to obtain. Experiments show that, with very limited adaptation data, a speech-sample threshold between 350 and 450 is appropriate. Of course, more adaptation data benefit the adaptation effect; in the extreme case the number of state classes equals the number of states, i.e. each state class contains a single state and every state has enough data for parameter estimation. This case is equivalent to training the covariance matrices of a speaker-dependent model, and this much speech data would of course only arise in unsupervised progressive adaptation. This also shows that the limiting performance of the present invention approaches that of a speaker-dependent model.
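A minimal sketch of the data-driven selection just described, under the assumption of a simple tree-node structure with `parent` and `leaves()` members (the class and function names are ours, not from the patent):

```python
class Node:
    """Minimal binary-tree node (illustrative only)."""
    def __init__(self, parent=None, children=()):
        self.parent, self.children = parent, list(children)
    def leaves(self):
        if not self.children:
            return [self]
        return [l for c in self.children for l in c.leaves()]

def select_adaptation_nodes(leaves, sample_count, threshold=400):
    """Data-driven choice of adaptation classes.

    leaves       : leaf nodes of the binary decision tree
    sample_count : dict mapping leaf node -> number of adaptation samples
    threshold    : per-class speech-sample threshold (350-450 suggested above)
    """
    selected = set()
    for leaf in leaves:
        node = leaf
        # climb toward the root until the subtree accumulates enough samples
        while node.parent is not None:
            total = sum(sample_count.get(l, 0) for l in node.leaves())
            if total >= threshold:
                break
            node = node.parent
        selected.add(node)
    # a full implementation would merge overlapping selections (an ancestor
    # chosen together with one of its descendants) so that every state ends
    # up in exactly one adaptation class
    return selected
```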
Then, for each adaptation class, the class center matrix $\tilde{C}_\Phi$ is estimated from the adaptation data by the maximum likelihood method. The specific procedure is as follows: suppose the leaf nodes (i.e. HMM states) contained in a given node of the binary decision tree are $s_1, s_2, \ldots, s_n$; for each state, the second-order statistic $C(s_i)$ is accumulated from the corresponding adaptation data according to formula (4):
$$C(s_i) = \sum_{t=1}^{T(s_i)} (o_t - \mu_{s_i})(o_t - \mu_{s_i})^T \qquad (4)$$
where $T(s_i)$ is the total number of adaptation frames corresponding to state $s_i$.
The second-order statistics of the states are then transformed into the space of the corresponding intermediate node according to formula (5), giving the maximum likelihood estimate:
$$\tilde{C}_\Phi = \frac{1}{\sum_{i\in\Phi} T(s_i)} \sum_{i\in\Phi} \left(A_{i,\Phi}\right)^{-1} C(s_i) \left(A_{i,\Phi}\right)^{-1} \qquad (5)$$
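For illustration only (function names are ours), formulas (4) and (5) translate into the following sketch:

```python
import numpy as np

def state_second_order_stat(frames, mu_s):
    """Formula (4): C(s_i) = sum_t (o_t - mu_{s_i})(o_t - mu_{s_i})^T.

    frames : T x D array of adaptation observation vectors for state s_i
    mu_s   : D-dimensional mean vector of state s_i
    """
    diffs = frames - mu_s
    return diffs.T @ diffs

def ml_class_center(stats, frame_counts, transforms):
    """Formula (5): pooled maximum likelihood estimate of the class center.

    stats        : list of C(s_i) for the leaves i in Phi
    frame_counts : list of T(s_i)
    transforms   : list of A_{i,Phi} (symmetric, from formula (2))
    """
    total_frames = float(sum(frame_counts))
    acc = sum(np.linalg.inv(a) @ c @ np.linalg.inv(a)
              for a, c in zip(transforms, stats))
    return acc / total_frames
```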
Next, the optimal interpolation coefficients are computed for each adaptation class. The criterion is maximum Gaussian similarity: the class center matrix of the intermediate node obtained by linear interpolation should be maximally similar to the class center matrix $\tilde{C}_\Phi$ obtained in the previous step. Assuming there is only one adaptation class, the objective function of the algorithm is formula (6):
$$J(\alpha) = \mathrm{tr}\!\left(\tilde{C}_\Phi + \sum_{s=1}^S \alpha_s C_\Phi^{(s)} - 2\left[(\tilde{C}_\Phi)^{1/2}\left(\sum_{s=1}^S \alpha_s C_\Phi^{(s)}\right)(\tilde{C}_\Phi)^{1/2}\right]^{1/2}\right) \qquad (6)$$
The interpolation coefficients are found with the gradient projection method, which mainly requires two derivatives, given in formulas (7) and (8). Using formulas (7) and (8), the optimal combination coefficients are obtained as
$$\alpha^* = \arg\min_{\alpha\in\Omega} J(\alpha).$$
$$\nabla_\alpha J(\alpha) = \begin{pmatrix} \mathrm{tr}\!\left(C_\Phi^{(1)}\right) \\ \mathrm{tr}\!\left(C_\Phi^{(2)}\right) \\ \vdots \\ \mathrm{tr}\!\left(C_\Phi^{(S)}\right) \end{pmatrix} - \begin{pmatrix} \mathrm{tr}\!\left(\tilde{C}_\Phi^{1/2}\left(\sum_{s=1}^S \alpha_s C_\Phi^{(s)}\right)^{-1/2} C_\Phi^{(1)}\right) \\ \mathrm{tr}\!\left(\tilde{C}_\Phi^{1/2}\left(\sum_{s=1}^S \alpha_s C_\Phi^{(s)}\right)^{-1/2} C_\Phi^{(2)}\right) \\ \vdots \\ \mathrm{tr}\!\left(\tilde{C}_\Phi^{1/2}\left(\sum_{s=1}^S \alpha_s C_\Phi^{(s)}\right)^{-1/2} C_\Phi^{(S)}\right) \end{pmatrix} \qquad (7)$$
$$\nabla_\tau J(\alpha + \tau d) = \sum_{s=1}^S \mathrm{tr}\!\left(d_s C_\Phi^{(s)}\right) - \sum_{s=1}^S \mathrm{tr}\!\left(d_s\,(\tilde{C}_\Phi)^{1/2}\left(\sum_{s'=1}^S (\alpha_{s'} + \tau d_{s'})\, C_\Phi^{(s')}\right)^{-1/2} C_\Phi^{(s)}\right) \qquad (8)$$
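For illustration only, and assuming formulas (6) and (7) as reconstructed above (function names are ours), the objective and its gradient might be evaluated as follows; the gradient projection search itself, including the constraint set $\Omega$, is not reproduced here:

```python
import numpy as np
from scipy.linalg import sqrtm

def interpolation_objective(alpha, c_tilde, sd_centers):
    """Formula (6): J(alpha) for a single adaptation class."""
    mix = sum(a * c for a, c in zip(alpha, sd_centers))
    c_half = sqrtm(c_tilde)
    cross = sqrtm(c_half @ mix @ c_half)
    return float(np.trace(c_tilde + mix - 2.0 * cross).real)

def interpolation_gradient(alpha, c_tilde, sd_centers):
    """Formula (7): gradient components with respect to each alpha_s."""
    mix = sum(a * c for a, c in zip(alpha, sd_centers))
    c_half = sqrtm(c_tilde)
    mix_inv_half = np.linalg.inv(sqrtm(mix))
    return np.array([float(np.trace(c).real
                           - np.trace(c_half @ mix_inv_half @ c).real)
                     for c in sd_centers])
```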
Using the obtained interpolation coefficients and the class centers $C_\Phi^{(s)}$ ($s = 1, \ldots, S$) of each SD model, the adapted class center $C_\Phi^{(SA)}$ is computed according to formula (9):
$$C_{\Phi_j}^{(SA)} = \sum_{s=1}^S \alpha_{s,j}\, C_{\Phi_j}^{(s)} \qquad (j = 1, \ldots, N_J) \qquad (9)$$
where:
$j$ denotes an intermediate node;
$N_J$ is the total number of interpolated intermediate nodes, i.e. the total number of adaptation classes, determined dynamically by the adaptation data;
$\Phi_j$ denotes the set of leaf nodes (i.e. states) corresponding to node $j$;
$C_{\Phi_j}^{(s)}$ denotes the class center of the $j$-th intermediate node of the $s$-th SD model, $s = 1, 2, \ldots, S$, with $S$ the total number of SD models;
$\alpha_j = \{\alpha_{s,j} \mid s = 1, 2, \ldots, S\}$ denotes the linear interpolation coefficients corresponding to the $j$-th intermediate node.
Using the adapted class centers $C_{\Phi_j}^{(SA)}$ obtained in the previous step, the covariance matrices are updated according to formula (10) to obtain the speaker-adapted model (SA model). The present invention adapts only the covariance matrices; the mean vectors of the model remain unchanged.
$$\Sigma_i^{(SA)} = A_{i,\Phi_j}\, C_{\Phi_j}^{(SA)}\, A_{i,\Phi_j}, \qquad i \in \Phi_j \qquad (10)$$
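As an illustrative sketch only (the function name is ours), formulas (9) and (10) amount to the following update for one adaptation class:

```python
def adapt_class(alpha_j, sd_centers_j, transforms_j):
    """Adapted covariances for one adaptation class (formulas (9) and (10)).

    alpha_j      : interpolation coefficients alpha_{s,j}, s = 1..S
    sd_centers_j : class centers C_{Phi_j}^{(s)} of the S SD models
    transforms_j : dict state index i -> A_{i,Phi_j}, from formula (2)
    """
    # formula (9): C_{Phi_j}^{(SA)} = sum_s alpha_{s,j} * C_{Phi_j}^{(s)}
    c_sa = sum(a * c for a, c in zip(alpha_j, sd_centers_j))
    # formula (10): Sigma_i^{(SA)} = A_{i,Phi_j} C_{Phi_j}^{(SA)} A_{i,Phi_j}
    return c_sa, {i: a @ c_sa @ a for i, a in transforms_j.items()}
```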
The present invention applies a maximum-likelihood model interpolation algorithm to fast adaptation of covariance matrices, and may therefore also be called maximum-Gaussian-similarity linear interpolation of covariance matrices. It overcomes the defect of the Gaussian-similarity binary-decision-tree method that, when adaptation data are scarce, a matrix is difficult to estimate stably and negative adaptation results, and it therefore has considerable value for promotion and application.

Claims (1)

1. A speaker adaptation method in a speech recognition system, comprising:
a step, before adaptation, of training a speaker-independent hidden Markov model;
a step, before adaptation, of building a binary decision tree over the state covariance matrices of this speaker-independent hidden Markov model;
a step, before adaptation, of computing the class center covariance matrix of each intermediate node of the binary decision tree and the transformation relation between it and each corresponding leaf-node covariance matrix;
a step, during adaptation, of determining the adaptation classes according to the adaptation data provided by the tester;
a step of estimating, for each adaptation class, the class center matrix from the adaptation data by the maximum likelihood method;
a step of computing, for each adaptation class, the adapted class center covariance matrix;
and a step of updating the covariance matrices of each adaptation class to obtain the speaker-adapted model;
characterized in that the speaker adaptation method in the speech recognition system further comprises:
a step, before adaptation, of training a number of speaker-dependent hidden Markov models;
a step, before adaptation, of computing, according to said binary decision tree, the class center matrices corresponding to the intermediate nodes under each speaker-dependent model;
a step, during adaptation, of computing the optimal interpolation coefficients for each adaptation class, the criterion being maximum Gaussian similarity, i.e. the class center matrix of the intermediate node obtained by linear interpolation is maximally similar to the class center matrix obtained in the preceding step;
and in that the step of computing the adapted class center covariance matrix for each adaptation class is carried out using the maximum likelihood estimates of the speaker-dependent model class center matrices and their corresponding interpolation coefficients.
CNB031022065A 2003-01-27 2003-01-27 Speaker self-adaptive method in speech recognition system Expired - Fee Related CN1221939C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB031022065A CN1221939C (en) 2003-01-27 2003-01-27 Speaker self-adaptive method in speech recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB031022065A CN1221939C (en) 2003-01-27 2003-01-27 Speaker self-adaptive method in speech recognition system

Publications (2)

Publication Number Publication Date
CN1521728A CN1521728A (en) 2004-08-18
CN1221939C true CN1221939C (en) 2005-10-05

Family

ID=34281633

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB031022065A Expired - Fee Related CN1221939C (en) 2003-01-27 2003-01-27 Speaker self-adaptive method in speech recognition system

Country Status (1)

Country Link
CN (1) CN1221939C (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101123648B (en) * 2006-08-11 2010-05-12 中国科学院声学研究所 Self-adapted method in phone voice recognition

Also Published As

Publication number Publication date
CN1521728A (en) 2004-08-18

Similar Documents

Publication Publication Date Title
US6879954B2 (en) Pattern matching for large vocabulary speech recognition systems
KR101113006B1 (en) Apparatus and method for clustering using mutual information between clusters
CN1302456C (en) Sound veins identifying method
CN108922543B (en) Model base establishing method, voice recognition method, device, equipment and medium
CN113488060B (en) Voiceprint recognition method and system based on variation information bottleneck
CN1234110C (en) Noise adaptation system of speech model, noise adaptation method, and noise adaptation program for speech recognition
CN109616105A (en) A kind of noisy speech recognition methods based on transfer learning
US9984678B2 (en) Factored transforms for separable adaptation of acoustic models
CN112530407B (en) Language identification method and system
Maas et al. Recurrent neural network feature enhancement: The 2nd CHiME challenge
CN112270405A (en) Filter pruning method and system of convolution neural network model based on norm
CN105895104B (en) Speaker adaptation recognition methods and system
CN1534596A (en) Method and device for resonance peak tracing using residuum model
CN1514432A (en) Dynamic time curving system and method based on Gauss model in speech processing
CN113591733B (en) Underwater acoustic communication modulation mode classification identification method based on integrated neural network model
CN1221939C (en) Speaker self-adaptive method in speech recognition system
CN114943335A (en) Layer-by-layer optimization method of ternary neural network
CN1787077A (en) Method for fast identifying speeking person based on comparing ordinal number of archor model space projection
CN1198261C (en) Voice identification based on decision tree
CN1870136A (en) Variation Bayesian voice strengthening method based on voice generating model
CN115348215B (en) Encryption network traffic classification method based on space-time attention mechanism
CN1221938C (en) Speaker self-adaptive method based on Gauss similarity analysis
CN106373576B (en) Speaker confirmation method and system based on VQ and SVM algorithms
CN1201287C (en) Hiaden Markov model edge decipher data reconstitution method f speech sound identification
Hashimoto et al. Bayesian context clustering using cross valid prior distribution for HMM-based speech recognition.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee