CN109545201A - Method for constructing an acoustic model based on a deep mixture of factor analyzers - Google Patents

Method for constructing an acoustic model based on a deep mixture of factor analyzers

Info

Publication number
CN109545201A
Authority
CN
China
Prior art keywords
model
layer
mfa
state
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811537321.XA
Other languages
Chinese (zh)
Other versions
CN109545201B (en
Inventor
屈丹
闫红刚
张文林
杨绪魁
牛铜
张连海
陈琦
李�真
魏雪娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN201811537321.XA priority Critical patent/CN109545201B/en
Publication of CN109545201A publication Critical patent/CN109545201A/en
Application granted granted Critical
Publication of CN109545201B publication Critical patent/CN109545201B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/148 Duration modelling in HMMs, e.g. semi HMM, segmental models or transition probabilities

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Complex Calculations (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The present invention relates to the technical field of speech recognition and discloses a method for constructing an acoustic model based on a deep mixture of factor analyzers (DMFA), comprising: generating a baseline system from training data using an HMM-GMM model; initializing a DMFA model from the HMM-GMM model parameters, where the DMFA model consists of two layers of MFA models and its parameters are initialized using GMM clustering and probabilistic principal component analysis; using the training data and the HMM-GMM baseline system, estimating the global model parameters of the DMFA model over the acoustic feature space with a greedy EM algorithm; estimating the state model parameters of the first-layer MFA model of the DMFA model, the state model parameters comprising state-dependent parameters and state-independent parameters; and estimating the state model parameters of the second-layer MFA model of the DMFA model. By introducing the deep mixture of factor analyzers into the modeling of the state models, the invention provides an acoustic model with better resistance to overfitting.

Description

Method for constructing an acoustic model based on a deep mixture of factor analyzers
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a method for constructing an acoustic model based on a deep mixture of factor analyzers.
Background
The acoustic model is an important component of a continuous speech recognition system. The hidden Markov model-Gaussian mixture model (Hidden Markov Model-Gaussian Mixture Model, HMM-GMM) is currently the mainstream model, and combining the HMM-GMM model with bottleneck features (Bottle Neck Feature, BNF) derived from a deep neural network (Deep Neural Network, DNN) has greatly improved recognition rates. However, natural speech signals are highly uncertain, which makes it difficult to describe them with an accurate acoustic model. The sources of uncertainty include coarticulation, the speaker, the transmission channel and the noise environment, all of which call for accurate modeling of the acoustic units of speech. To model accurately, and in particular to overcome the "allophone" problem caused by coarticulation, HMM-GMM models usually adopt context-dependent phoneme modeling, that is, monophone models are extended to triphones. The extended models require more training data, so to obtain stable parameter estimates, parameters are shared through state tying, which improves parameter estimation while reducing the amount of training data needed.
Because continuous speech recognition systems often face practical problems such as low resources, many researchers have worked on improving the acoustic model, that is, raising modeling accuracy while reducing the demand for data as far as possible. The subspace Gaussian mixture model (Subspace Gaussian Mixture Model, SGMM) restricts the means and weights of the Gaussian mixture components to a parameter subspace and shares the same subspace parameters and covariance matrices across states. Each state can therefore be represented by a few vectors in the low-dimensional parameter subspace. This achieves effective parameter sharing and greatly reduces the number of model parameters, which improves the robustness of parameter estimation with limited data. However, the SGMM lacks a prior assumption on the variables of the low-dimensional subspace.
The acoustic model based on a mixture of factor analyzers has fewer model parameters and has therefore become an effective approach to accurate acoustic modeling under limited resources. However, this approach assumes that the local coordinates of the latent factors follow a standard normal distribution in every local region, whereas studies on real data show that the true distribution of these local coordinates is far from normal.
Summary of the invention
In view of the above problems, the invention proposes a method for constructing an acoustic model based on a deep mixture of factor analyzers. The invention introduces the deep mixture of factor analyzers into the modeling of the state models, yielding an acoustic model with better resistance to overfitting.
To achieve the above object, the invention adopts the following technical solution:
A method for constructing an acoustic model based on a deep mixture of factor analyzers, comprising the following steps:
Step 1: using training data, generate a baseline system with an HMM-GMM model;
Step 2: according to the HMM-GMM model parameters, initialize the acoustic model based on the deep mixture of factor analyzers (the DMFA model). The DMFA model is a two-layer mixture of factor analyzers, that is, it consists of a first-layer MFA model and a second-layer MFA model; the DMFA model parameters are initialized using GMM clustering and probabilistic principal component analysis;
Step 3: using the training data and the HMM-GMM baseline system, estimate the global model parameters of the DMFA model over the acoustic feature space with a greedy EM algorithm;
Step 4: estimate the state model parameters of the first-layer MFA model of the DMFA model, the state model parameters comprising state-dependent parameters and state-independent parameters;
Step 5: estimate the state model parameters of the second-layer MFA model of the DMFA model.
Further, initializing the DMFA model parameters using GMM clustering and probabilistic principal component analysis comprises:
Determining the number of local regions I of the first-layer MFA model and the latent factor dimension of each region;
Determining, for the second-layer MFA model, the number of sub-regions I_i of each region, the total number of second-layer sub-regions, and the latent factor dimension of each sub-region.
Further, step 3 comprises:
Step 3.1: given data X = {x_1, x_2, ..., x_T}, train the first-layer MFA model:
Train the first-layer MFA model on X with the greedy EM algorithm, where there are I mixture components, each with its own factor subspace dimension, which fixes the size of the loading matrix M_i.
Step 3.2: for the data set Y_i of each region i, i ∈ [1, I], compute the Gaussian posterior distribution of the local coordinate given that feature vector x_t belongs to the i-th local region, and the posterior probability p(i | x_t) that x_t belongs to the i-th local region, t ∈ [1, T]; then obtain the estimate of the mixture component i corresponding to x_t according to the following formula:
Sample the local coordinate corresponding to that estimate according to the following formula:
Add the sampled coordinate to the data set Y_i to obtain the augmented data set.
Step 3.3: train the second-layer MFA model:
On the augmented data set, given the second-layer factor coordinate dimension and the number of mixture components K_i, train an independent second-layer MFA model with the greedy EM algorithm.
Further, step 4 comprises:
Step 4.1: initialize the state-dependent parameters of the first-layer MFA model, namely the initial weight factors and the initial local coordinates of the first-layer MFA model for each of the J context-dependent states, and set the number of iterations K;
Step 4.2: according to the state alignment of the training data, re-estimate the state-dependent parameters of the j-th state, namely the weight factors and the local coordinates of the first-layer MFA model at the k-th iteration, k ∈ [1, K];
Step 4.3: according to the state alignment of the training data, re-estimate the state-independent parameters of the i-th region, namely the local region mean, the local region basis matrix and the state-independent covariance matrix of the first-layer MFA model at the k-th iteration.
Further, step 5 comprises:
Step 5.1: initialize the state-dependent parameters of the second-layer MFA model, namely the initial weight factors and the initial local coordinates of the second-layer MFA model, and set the number of iterations K′;
Step 5.2: according to the state alignment of the training data, re-estimate the state-dependent parameters of the j-th state, namely the weight factors and the local coordinates of the second-layer MFA model at the k-th iteration, k ∈ [1, K′];
Step 5.3: according to the state alignment of the training data, re-estimate the state-independent parameters of the s-th region, namely the local region mean, the local region basis matrix and the state-independent covariance matrix of the second-layer MFA model at the k-th iteration.
Compared with the prior art, the invention has the following beneficial effects:
The invention treats each state's acoustic model as an adaptation of a mixture-of-factor-analyzers model of the global feature space, and describes each state jointly with state-dependent and state-independent parameters. The DMFA model of the invention deepens and extends the acoustic model based on a mixture of factor analyzers. The original model assumes that the region coordinate parameters follow a Gaussian prior, which does not hold in practice, so the invention models the region coordinate parameters with a further, deeper mixture of factor analyzers. Because the DMFA model has a built-in parameter sharing strategy, it resists overfitting better and is better suited to acoustic modeling under low-resource conditions.
Brief description of the drawings
Fig. 1 is a basic flowchart of the method for constructing an acoustic model based on a deep mixture of factor analyzers according to an embodiment of the invention.
Fig. 2 is a basic flowchart of the method for constructing an acoustic model based on a deep mixture of factor analyzers according to another embodiment of the invention.
Fig. 3 is a schematic diagram of the deep mixture of factor analyzers used in the method according to an embodiment of the invention: part (a) illustrates the regions of the deep mixture of factor analyzers, and part (b) shows its graphical model.
Detailed description of the embodiments
The invention is further explained below with reference to the drawings and specific embodiments:
Embodiment one:
As shown in Fig. 1, a method for constructing an acoustic model based on a deep mixture of factor analyzers according to the invention comprises the following steps:
Step S101: using training data, generate a baseline system with an HMM-GMM model.
Step S102: according to the HMM-GMM model parameters, initialize the acoustic model based on the deep mixture of factor analyzers (the DMFA model). The DMFA model is a two-layer mixture of factor analyzers, that is, it consists of a first-layer MFA model and a second-layer MFA model; the DMFA model parameters are initialized using GMM clustering and probabilistic principal component analysis.
Step S103: using the training data and the HMM-GMM baseline system, estimate the global model parameters of the DMFA model over the acoustic feature space with a greedy EM algorithm.
Step S104: estimate the state model parameters of the first-layer MFA model of the DMFA model, the state model parameters comprising state-dependent parameters and state-independent parameters.
Step S105: estimate the state model parameters of the second-layer MFA model of the DMFA model.
Embodiment two:
Fig. 2 shows another method for constructing an acoustic model based on a deep mixture of factor analyzers according to the invention.
The invention first uses the Kaldi toolkit to generate an HMM-GMM acoustic model. It then uses the DMFA to generate a canonical state model over the entire acoustic feature space. This is essentially a global background model, and the model of each state can be regarded as derived from it by an adaptation algorithm, namely re-estimating the parameters with the data of each state. The invention uses a two-layer DMFA model, that is, a DMFA model comprising two layers of MFA models; the detailed procedure is shown in Algorithm 1.
In Algorithm 1, the first-layer MFA model has initial weight factors and initial local coordinates, and J is the total number of context-dependent states. At the k-th iteration the first-layer MFA model has weight factors and local coordinates (state-dependent parameters) and local region means, local region basis matrices and state-independent covariance matrices (state-independent parameters). The second-layer MFA model has the corresponding initial weight factors and initial local coordinates and, at the k-th iteration, the corresponding weight factors, local coordinates, local region means, local region basis matrices and state-independent covariance matrices.
Algorithm 1 is divided into four main parts. The first part (row 1) generates the baseline system based on the HMM-GMM model with context-dependent triphones as units; let the total number of context-dependent states be J. The second part (rows 2 to 3) generates the global DMFA model of the entire acoustic space: row 2 initializes the DMFA model parameters using GMM clustering and probabilistic principal component analysis, and row 3 is the layer-by-layer parameter re-estimation algorithm for the DMFA model of the global acoustic space. The third part (rows 4 to 12) estimates the first-layer parameters of the state models of the DMFA model: row 4 initializes the first-layer MFA model, rows 6 to 8 estimate the first-layer state-dependent parameters of the state models, and rows 9 to 11 estimate the first-layer state-independent parameters. The fourth part (rows 13 to 21) estimates the second-layer parameters of the state models: row 13 initializes the second-layer MFA model, rows 15 to 17 estimate the second-layer state-dependent parameters, and rows 18 to 20 estimate the second-layer state-independent parameters. It should be noted that, as formulas (27) and (28) show, the final state model is adjusted through the bottom-layer DMFA parameters, so the upper-layer parameters can be regarded as relatively fixed.
The detailed procedure is as follows:
Step S201: using training data, generate a baseline system with a hidden Markov model-Gaussian mixture model (HMM-GMM) (see Povey D, Ghoshal A, Boulianne G, et al. The Kaldi speech recognition toolkit. In: Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition and Understanding. IEEE, 2011);
Specifically, the training data is the 1987-1989 WSJ text data, about 215M.
Step S202: according to the HMM-GMM model parameters, initialize the acoustic model based on the deep mixture of factor analyzers (the DMFA model). The DMFA model is a two-layer mixture of factor analyzers, that is, it consists of a first-layer MFA model and a second-layer MFA model; the DMFA model parameters are initialized using GMM clustering and probabilistic principal component analysis, comprising:
Factor analysis (Factor Analyzer) is a method for modeling correlations in high-dimensional data: it models the correlation information with a low-dimensional linear subspace. Let x be a sample point in the high-dimensional space R^D. Under the assumptions of the factor analysis model, x can be obtained from a point y in a low-dimensional subspace R^{D'} (D' < D) by an affine transformation:
x = μ + M y + n
where μ is the mean of the data distribution in the high-dimensional space; M is a linear transformation matrix, called the factor loading matrix, of dimension D × D'; and n is a random error. In the factor analysis model, y follows a D'-dimensional standard normal distribution, and the random error n follows a Gaussian distribution with zero mean and a diagonal covariance matrix Σ. Most of the observed data therefore lies near the mean μ, and what is obtained is a local linear subspace model. The factor analysis model is thus well suited to modeling a local part of a high-dimensional space, and combining several local factor analysis models yields the mixture of factor analyzers (Mixture of Factor Analysers, MFA).
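The generative assumption above can be checked numerically. The following is a minimal sketch, not the patent's implementation: the function names and the toy values of μ, M and the diagonal of Σ are illustrative assumptions. It draws samples from x = μ + M y + n and compares the empirical covariance with the covariance M M^T + Σ implied by the model.

```python
import numpy as np

def sample_factor_analyzer(mu, M, psi_diag, n_samples, rng):
    """Draw x = mu + M y + n with y ~ N(0, I) and n ~ N(0, diag(psi_diag))."""
    D, D_lat = M.shape
    y = rng.standard_normal((n_samples, D_lat))                  # latent factors
    n = rng.standard_normal((n_samples, D)) * np.sqrt(psi_diag)  # diagonal noise
    return mu + y @ M.T + n

def marginal_covariance(M, psi_diag):
    """Covariance of x implied by the model: M M^T + diag(psi_diag)."""
    return M @ M.T + np.diag(psi_diag)

# Toy values (assumptions for illustration only): D = 3, D' = 1.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
M = np.array([[1.0], [0.5], [-1.0]])
psi_diag = np.array([0.1, 0.2, 0.3])

X = sample_factor_analyzer(mu, M, psi_diag, 200_000, rng)
empirical_cov = np.cov(X, rowvar=False)
model_cov = marginal_covariance(M, psi_diag)
```

With enough samples, `empirical_cov` approaches `model_cov`, which shows how a single loading column captures the correlations between all three observed dimensions.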
Suppose the data space is divided into I local regions and the probabilities that an observation x falls into these regions are w_1, w_2, …, w_I. Approximating each local region with its own factor analysis model gives the mixture of factor analyzers, that is, with probability w_i the observation is generated in region i as:
x = μ_i + M_i y_i + n_i, with n_i ~ N(0, Σ_i)
where μ_i, M_i and Σ_i are the mean, the factor loading matrix and the reconstruction error matrix of the i-th factor analysis model, and y_i is the coordinate vector of the observation x in that model. The dimension of the linear subspace may differ between the local factor analysis models. Let the linear subspace dimension of the i-th factor analysis model be D_i; then M_i is a D × D_i matrix and the local coordinate y_i is a D_i-dimensional vector. From formulas (2) and (3), the posterior of y_i follows a Gaussian distribution, that is:
p(y_i | x, i) = N(y_i; μ_{y_i|x}, Σ_{y_i|x})
where the mean is μ_{y_i|x} = Σ_{y_i|x} M_i^T Σ_i^{-1} (x − μ_i) and the covariance is Σ_{y_i|x} = (I + M_i^T Σ_i^{-1} M_i)^{-1}.
The parameters of the mixture of factor analyzers therefore include the number of local regions I, the latent dimension D_i of each local region, and the local factor analysis model parameters {w_i, μ_i, M_i, Σ_i}. These parameters can be estimated with the expectation-maximization (Expectation Maximization, EM) algorithm.
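The Gaussian posterior of the local coordinate can be sketched as follows. The covariance expression (I + M_i^T Σ_i^{-1} M_i)^{-1} is the one stated in the text; the posterior mean used here is the standard factor analysis result, included as an assumption because the original formula is not reproduced. The function name and toy values are illustrative.

```python
import numpy as np

def latent_posterior(x, mu, M, psi_diag):
    """Posterior N(y; mean, cov) of the local coordinate y given x for one region.

    cov  = (I + M^T Psi^{-1} M)^{-1}   (as stated in the text)
    mean = cov M^T Psi^{-1} (x - mu)   (standard factor-analysis result, assumed here)
    """
    inv_psi = 1.0 / psi_diag               # Psi is diagonal, so invert elementwise
    Mt_inv_psi = M.T * inv_psi             # equals M^T Psi^{-1}, shape (D_i, D)
    cov = np.linalg.inv(np.eye(M.shape[1]) + Mt_inv_psi @ M)
    mean = cov @ Mt_inv_psi @ (x - mu)
    return mean, cov

# Toy region (assumed values): D = 3, D_i = 2.
mu = np.array([0.5, -1.0, 2.0])
M = np.array([[1.0, 0.0], [0.5, 1.0], [-1.0, 0.3]])
psi_diag = np.array([0.2, 0.4, 0.1])
x = np.array([1.0, 2.0, 3.0])

post_mean, post_cov = latent_posterior(x, mu, M, psi_diag)
```

Working with the D_i × D_i matrix instead of the full D × D covariance keeps the inversion cheap when the latent dimension is much smaller than the feature dimension.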
In the E-step, the first-order statistic E(y_i | x_t) and the second-order statistic of the latent variable are first obtained:
Assume that after the k-th iteration the model parameters are Λ^(k). Then in the (k+1)-th iteration the auxiliary function of the greedy EM algorithm is:
where γ_i(x_t) is the posterior probability that feature vector x_t belongs to the i-th local region given the parameters Λ^(k), computed as:
For convenience of derivation, an augmented notation is introduced. Then in the M-step, taking the partial derivatives of the auxiliary function Q(Λ^(k), Λ) with respect to the model parameters, including w_i and Σ_i, and setting them to zero gives the update formulas for the MFA model parameters:
where the two auxiliary quantities in the update formulas are respectively:
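The region posterior γ_i(x_t) used in the E-step can be sketched as follows. This assumes the standard mixture form, in which region i contributes a Gaussian with mean μ_i and covariance M_i M_i^T + Σ_i weighted by w_i; the original formula is not reproduced in the text, so this form and the function names are assumptions.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Row-wise log N(x; mean, cov) for X of shape (T, D)."""
    D = mean.shape[0]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("td,td->t", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (D * np.log(2.0 * np.pi) + logdet + maha)

def region_posteriors(X, weights, means, loadings, psi_diags):
    """gamma[t, i]: posterior probability that frame x_t belongs to region i."""
    log_p = np.stack(
        [
            np.log(w) + log_gaussian(X, mu, M @ M.T + np.diag(psi))
            for w, mu, M, psi in zip(weights, means, loadings, psi_diags)
        ],
        axis=1,
    )
    log_p -= log_p.max(axis=1, keepdims=True)   # stabilise before exponentiating
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

# Two well-separated toy regions (assumed values).
weights = [0.5, 0.5]
means = [np.array([0.0, 0.0]), np.array([10.0, 10.0])]
loadings = [np.array([[1.0], [0.5]]), np.array([[0.5], [1.0]])]
psi_diags = [np.ones(2), np.ones(2)]
X = np.array([[0.0, 0.0], [10.0, 10.0]])

gamma = region_posteriors(X, weights, means, loadings, psi_diags)
```

Each row of `gamma` sums to one, and a frame lying at a region's mean is assigned almost entirely to that region.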
Once the MFA model has converged, its accuracy can be improved by increasing the number of spatial regions I and the latent factor space dimension D', which is equivalent to adjusting the conditional probability of x given the latent factor y in all regions. However, this approach quickly leads to overfitting when modeling high-dimensional data or when the amount of data is insufficient. To solve this problem, the prior distribution of each latent factor y is replaced: the simple normal prior is replaced by a more complex MFA prior, giving:
where the new parameters are those of the second-layer MFA for spatial region i, that is, a mixture of factor analyzers is applied again within region i.
Fig. 3 is a schematic diagram of the deep mixture of factor analyzers. Part (a) of Fig. 3 illustrates the regions of the deep mixture of factor analyzers. On the left of part (b) of Fig. 3, the first layer has I = 2 regions; the thick lines mark the two regions of the mixture of factor analyzers, i = 1, 2. Within each region a mixture of factor analyzers is applied again. Suppose region i is divided into K_i sub-regions, indexed k_i = 1, 2, …, K_i. In this example, for region i = 1 a further mixture of factor analyzers with K_1 = 4 sub-regions is learned, with sub-regions k_1 = 1, 2, …, K_1; for region i = 2 the sub-regions are k_2 = 1, 2, …, K_2 with K_2 = 2.
Thus a DMFA replaces the original MFA prior p_MFA(y, i) = p(i) p(y | i) with a better prior, that is:
p_DMFA(y, i) = p(i) p(k_i | i) p(y | k_i) (15)
To sample from a DMFA, first select i using w_i; then, in the second layer, sample k_i using the second-layer weights, where k_i = 1, 2, ..., K_i and K_i is the number of factor analyzers in the second layer for the i-th first-layer factor; finally, y is generated from factor component k_i. A simpler, equivalent form of the DMFA enumerates all possible second-layer factors k_i. A new component indicator s = 1, 2, ..., S denotes a specific second-layer component, where S = Σ_i K_i is the total number of second-layer components, as shown on the right of part (a) of Fig. 3. The mixing weight of component s can be defined as the product p(i(s)) p(k_i(s) | i(s)), where i(s) and k_i(s) denote the first-layer factor i and the second-layer factor k_i corresponding to s; for example, i(2) = 1 and i(5) = 2. Note that S grows exponentially with the number of layers. This generative formulation is intuitive, and this notation is used in the remainder of the description.
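The flat indexing of second-layer components can be illustrated with a small sketch. It reuses the example from the text (K_1 = 4, K_2 = 2) and treats the flat mixing weight as the product of the first-layer and second-layer weights, following formula (15); the function names and the weight values are illustrative assumptions.

```python
def enumerate_second_layer(K):
    """Map the flat index s = 1..S to (i, k_i), where K[i-1] is the number of
    second-layer sub-regions of first-layer region i (all indices 1-based)."""
    mapping = {}
    s = 0
    for i, K_i in enumerate(K, start=1):
        for k in range(1, K_i + 1):
            s += 1
            mapping[s] = (i, k)
    return mapping

def flat_weights(first_layer_w, second_layer_w, mapping):
    """Flat mixing weight of s: p(i(s)) * p(k_i(s) | i(s))."""
    return {
        s: first_layer_w[i - 1] * second_layer_w[i - 1][k - 1]
        for s, (i, k) in mapping.items()
    }

# Example from the text: region 1 has 4 sub-regions, region 2 has 2.
mapping = enumerate_second_layer([4, 2])
S = len(mapping)

# Assumed toy weights, for illustration only.
first_layer_w = [0.6, 0.4]
second_layer_w = [[0.25, 0.25, 0.25, 0.25], [0.5, 0.5]]
w_flat = flat_weights(first_layer_w, second_layer_w, mapping)
```

Here `mapping[2]` belongs to first-layer region 1 and `mapping[5]` to region 2, matching i(2) = 1 and i(5) = 2 in the text, and the flat weights again sum to one.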
Part (b) of Fig. 3 shows the graphical model of a two-layer DMFA. Specifically:
i ← i(s) (19)
Formula (19) is deterministic, because each s belongs to one and only one i. The two matrices appearing in these formulas are diagonal matrices of dimension D^(1) × D^(1) and D^(2) × D^(2), respectively. The variables here are shorthand: y^(2) refers to y_s^(2) for each s, of which the second layer has S in total, and y^(1) refers to the first-layer coordinate of the region i that corresponds to s; once s is determined, i is also determined.
For a deep mixture of factor analyzers with an arbitrary number of layers L, the parameters of the DMFA model comprise, for each layer l = 1, 2, ..., L, the number of local regions I^(l), the latent dimension of each local region, and the local factor analysis model parameters. The sub-regions of each layer are indexed globally, but each sub-region belongs to exactly one region of the layer above. The invention uses a two-layer DMFA model; for clarity, first-layer regions are indexed by i and second-layer sub-regions by s, the number of first-layer regions is I, and the total number of second-layer sub-regions is S.
Let w_ji be the weight factor with which the observation vectors of state j fall into the i-th local region; these weights satisfy the probability constraints. Assume that the observation vectors of state j follow a Gaussian distribution within the i-th local region. Considering only the effect of the mean, let the mean be μ'_ji and the variance Σ'_i (the variance is the same for every state within a region). Given local region i and mean μ'_ji, the observation probability of state j is:
Suppose a deep factor analysis model is used to model the mean μ'_ji. Taking the two-layer model as an example, let μ'_ji have a local coordinate in the i-th local region of the first layer. According to formula (2), the prior distribution of μ'_ji is:
Suppose this first-layer local coordinate in turn has a local coordinate in the s-th local region of the second layer. Given the second-layer local region s and that coordinate, according to formula (2) the prior distribution of the first-layer local coordinate is:
According to formulas (22) and (23), the local latent variables of the second layer can be used to express the prior distribution of the mean, that is:
Substituting formulas (22) and (23) into formula (24) gives the prior distribution of μ'_ji:
Then, by Bayes' formula, the observation probability of state j is:
To reduce the number of parameters, a substitution is made. Finally, given the second-layer region coordinates, the observation probability model of state j is:
where
and:
In an HMM acoustic model, each state describes the observation probability distribution of a stationary segment of the corresponding acoustic modeling unit. The observed feature vectors of each stationary segment therefore necessarily fall into only a few regions of the MFA model of the whole space. Because the observations of state j are distributed over only a few regions of the whole space, the state weights are necessarily sparse. In formula (27) the weight is in effect the joint effect of the first-layer and second-layer weights, so the two can be considered together and re-estimated directly to reduce the number of parameters.
Formulas (27) and (28) thus define the state model based on deep factor analysis, and accurate modeling of each state is achieved by re-estimating its parameters. In each layer of the DMFA model, the state model parameters comprise two parts: first, the state-dependent parameters, namely the weight factors and the local coordinates, where i = 1, 2, ..., I and s = 1, 2, ..., S; second, the state-independent parameters.
As Algorithm 1 shows, the canonical state model is generated with the greedy EM algorithm, so the choice of initial values for the DMFA is critical. Following the method used to generate the canonical state model for the acoustic model based on a mixture of factor analyzers, generating the canonical state model of the invention involves two parts: first, generating a GMM of the global space from the HMM-GMM baseline system; second, expressing the GMM of the global space as a deep mixture of factor analyzers.
The GMM of the global space is obtained in a similar way to the MFA-based acoustic model method, but requires two layers of clustering. Consider the first-layer clustering first. Let M be the total number of Gaussian mixture components in the HMM-GMM baseline system, numbered from 1 to M in some order, and let the m-th component have its own mean and covariance matrix. The training data is force-aligned, and the zeroth-, first- and second-order statistics of each Gaussian component are computed: γ_m = Σ_t γ_m(t), s_m = Σ_t γ_m(t) o_t and S_m = Σ_t γ_m(t) o_t o_t^T, where γ_m(t) is the posterior probability that the t-th frame feature vector o_t belongs to the m-th Gaussian component, which can be computed with the Baum-Welch forward-backward algorithm. The likelihood of the training data corresponding to each component is then: LLK_m = Σ_t γ_m(t) log p(o_t | m).
If two Gaussian components m′ and m″ are merged into a new component m‴, its zeroth-, first- and second-order statistics are γ_m‴ = γ_m′ + γ_m″, s_m‴ = s_m′ + s_m″ and S_m‴ = S_m′ + S_m″. The weight, mean vector and covariance matrix after merging are then computed with the HMM-GMM parameter re-estimation formulas, and the loss in log-likelihood of the training data after merging is:
ΔLLK_{m′m″→m‴} = LLK_m‴ − LLK_m′ − LLK_m″ (29)
After M − I clustering steps, a GMM with I Gaussian components is obtained. In each clustering step, every pair of current Gaussian components is tentatively merged and the log-likelihood loss before and after merging is computed with formula (29). The two components with the smallest loss are merged into a new Gaussian component and the two original components are deleted; the weight, mean vector and covariance matrix of the new component are those computed for the merged component. When the clustering process is complete, the parameters of a GMM with I Gaussian components are obtained.
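The clustering loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: each component is represented only by its statistics (γ, s, S); its mean and covariance are taken as the maximum-likelihood estimates from those statistics, which gives the closed form LLK = −½ γ (D log 2π + log|Σ| + D); and "smallest loss" is read as the merge whose ΔLLK in formula (29) is closest to zero. The function names and toy data are hypothetical.

```python
import itertools
import numpy as np

def gaussian_from_stats(gamma, s, S):
    """Mean and covariance from zeroth-, first- and second-order statistics."""
    mean = s / gamma
    cov = S / gamma - np.outer(mean, mean)
    return mean, cov

def loglik_from_stats(gamma, s, S):
    """Log-likelihood of a component's data under its own ML Gaussian (assumption)."""
    _, cov = gaussian_from_stats(gamma, s, S)
    D = s.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * gamma * (D * np.log(2.0 * np.pi) + logdet + D)

def greedy_merge(stats, target):
    """Repeatedly merge the pair with the smallest log-likelihood loss, per (29)."""
    stats = list(stats)
    while len(stats) > target:
        best = None
        for a, b in itertools.combinations(range(len(stats)), 2):
            merged = tuple(u + v for u, v in zip(stats[a], stats[b]))  # stats add up
            delta = (loglik_from_stats(*merged)
                     - loglik_from_stats(*stats[a])
                     - loglik_from_stats(*stats[b]))
            if best is None or delta > best[0]:   # delta <= 0; largest = least loss
                best = (delta, a, b, merged)
        _, a, b, merged = best
        stats = [st for idx, st in enumerate(stats) if idx not in (a, b)] + [merged]
    return stats

def stats_of(X):
    """Statistics of hard-assigned frames (posterior 1 for every frame)."""
    return float(len(X)), X.sum(axis=0), X.T @ X

# Toy data (assumed): two nearby clusters and one distant cluster.
rng = np.random.default_rng(0)
components = [stats_of(rng.normal(c, 1.0, size=(500, 2))) for c in (0.0, 0.5, 10.0)]
clusters = greedy_merge(components, target=2)
```

Because the statistics are additive, no pass over the training frames is needed during clustering; reducing M components to I takes exactly M − I merges.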
It is worth noting that, obtaining the GMM model of entire acoustic space above, some intermediate result informations is needed to be used for The generation of multilayer GMM model.Multilayer gauss hybrid models are equivalent to indicates every layer of the mixed member of Gauss with GMM model again, from And constitute multilayer GMM model.Such as two layers of GMM model is had a look first, it is assumed that the Gauss of HMM-GMM baseline system mixes metaset Close Ω={ m(0), m=1,2 ..., M }, the acquisition of first layer GMM model uses above-mentioned clustering algorithm, but for every in cluster Secondary merging all records the mixed first composition of raw Gaussian of each mixed member after merging, such as mixed for the new Gauss after kth time merging MemberIts original mixed member composition is gatheredThe process that just represent in Ω is repeatedly merged intoThose of it is original mixed Member, after the completion of cluster, GMM parameter isWherein upper right footmark 1 indicates first layer cluster;And each mix member i It is original it is mixed member set be respectively Ci, i=1,2 ..., I,For second layer GMM, the original shape for mixing member for i-th The mixed first set C of stateiIn mixed member merged using clustering algorithm identical with first layer, be merged into specified mixed first number, it is complete At the acquisition of the second layer GMM parameter of deep layer GMM model, ifMixed member after the cluster of the second layer shared the One layer is mixed first information.
Suppose that, after the clustering process above, the first-layer GMM with I Gaussian components and the second-layer components have been obtained. An eigenvalue analysis is carried out separately on each covariance matrix of each layer. For region h, the eigenvalues sorted in descending order are λ_{h1}, λ_{h2}, ..., λ_{hD}, with their corresponding eigenvectors. The cumulative contribution rate (CCR) of the d-th eigenvalue is defined as:
$\eta_{hd} = \dfrac{\sum_{k=1}^{d} \lambda_{hk}}{\sum_{k=1}^{D} \lambda_{hk}}$
η_{hd} is the ratio of the sum of the first d eigenvalues to the sum of all eigenvalues. For the local region of the h-th mixture factor analysis component, the latent dimension D_h is chosen as $D_h = \min\{d : \eta_{hd} > 0.9\}$, that is, the smallest eigenvalue index whose cumulative contribution rate exceeds 90% is taken as the latent dimension of that local region. For region h of layer l, the remaining parameters of the corresponding factor analysis model are then initialized by probabilistic principal component analysis.
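The dimension selection and the probabilistic PCA initialization can be illustrated with a short sketch. The function names are hypothetical, and because the patent's own initialization formulas are not reproduced in this text, `ppca_init` assumes the standard closed-form probabilistic PCA solution (noise variance equal to the mean of the discarded eigenvalues).

```python
import numpy as np

def latent_dim_by_ccr(cov, threshold=0.9):
    """Smallest d whose cumulative contribution rate exceeds `threshold`."""
    eigvals = np.linalg.eigvalsh(cov)[::-1]          # descending order
    ccr = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.argmax(ccr > threshold)) + 1, ccr

def ppca_init(cov, q):
    """Assumed closed-form PPCA: basis matrix W (D x q) and noise variance."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Noise variance: average of the eigenvalues that are not retained.
    sigma2 = eigvals[q:].mean() if q < len(eigvals) else 0.0
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return W, sigma2

cov = np.diag([7.0, 2.5, 0.3, 0.2])
q, ccr = latent_dim_by_ccr(cov)      # CCR is 0.70 after one eigenvalue, 0.95 after two
W, sigma2 = ppca_init(cov, q)
```

With this covariance the first two eigenvalues already account for 95% of the total, so two latent dimensions are retained and the remaining variance is absorbed by the isotropic noise term.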
Step S203: Using the training data and the baseline system of the HMM-GMM model, estimate the overall model parameters of the DMFA model of the acoustic feature space with a greedy EM algorithm, including:
The DMFA can be trained with a greedy EM algorithm. The first-layer MFA is trained in the standard way. Formula (8) is then used to obtain, for each training sample, the mixture factor component and the factor corresponding to that component. The first-layer parameters are then frozen, and the first-layer factor values extracted for each component are used as the training data of the second-layer MFA. Algorithm 2 gives the detailed layer-by-layer training procedure. After greedy learning, the DMFA is collapsed and updated with conventional EM steps.
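The step that turns first-layer factors into second-layer training data can be sketched for a single first-layer component as follows. Formula (8) is not reproduced in this text, so the sketch assumes the standard factor analysis posterior with a diagonal noise covariance; `factor_posterior` and `sample_factors` are hypothetical names, and assigning frames to components is left out.

```python
import numpy as np

def factor_posterior(X, mu, W, psi):
    """Posterior of the latent factors for frames X (T x D).

    Assumes x = W z + mu + noise, z ~ N(0, I), noise ~ N(0, diag(psi)).
    Returns the posterior means (T x q) and the shared covariance (q x q).
    """
    Wt_psi_inv = W.T / psi                                   # W^T Psi^{-1}
    cov = np.linalg.inv(np.eye(W.shape[1]) + Wt_psi_inv @ W)
    mean = (X - mu) @ Wt_psi_inv.T @ cov                     # cov is symmetric
    return mean, cov

def sample_factors(mean, cov, rng):
    """Draw one factor sample per frame from N(mean_t, cov)."""
    L = np.linalg.cholesky(cov)
    return mean + rng.standard_normal(mean.shape) @ L.T

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 2))
psi = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
mu = rng.standard_normal(5)
X = rng.standard_normal((7, 5))

mean, cov = factor_posterior(X, mu, W, psi)
Z = sample_factors(mean, cov, rng)   # candidate second-layer training data
```

Sampling, rather than using the posterior mean directly, keeps the spread of the posterior in the second-layer training data; which of the two the patent's Algorithm 2 uses at each point should be checked against the original formulas.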
Step S204: Estimate the state model parameters of the first-layer MFA model of the DMFA model, where the state model parameters comprise state-dependent parameters and state-independent parameters, including:
Following the state modeling method given in step 2, the first layer uses MFA-based state model parameters. For the weight factors, the local coordinates, the local-region basis matrices, the local-region mean vectors, and the state-independent covariance matrices, an auxiliary function is constructed for each parameter under the maximum likelihood criterion, which gives the following re-estimation formulas:
It should be noted that each state model essentially describes the observation probability distribution of one stationary segment of the corresponding acoustic modeling unit, so the observation feature vectors of that segment can cover only a small part of all regions. The weights of a state model over the observation space are therefore sparse. This sparsity constraint can be enforced by a "weight shrinkage" algorithm consisting of two parts, weight shrinking and weight normalization: shrinking enforces the sparsity constraint, and normalization makes the weights satisfy the statistical constraints. The weights obtained above therefore become the final weight parameters only after passing through this algorithm. The parameters obtained after all updates constitute the first-layer MFA acoustic model of the state.
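The two parts of the weight shrinkage step can be illustrated with a small sketch. The text does not specify the shrinking rule, so the hard threshold used here (zeroing weights below a hypothetical `floor` value) is only an assumed example of a rule that produces sparsity; the normalization simply restores the sum-to-one constraint.

```python
import numpy as np

def shrink_and_normalize(weights, floor=1e-3):
    """Assumed rule: zero out small weights, then renormalize to sum to one."""
    w = np.asarray(weights, dtype=float)
    shrunk = np.where(w < floor, 0.0, w)          # weight shrinking (sparsity)
    if shrunk.sum() == 0.0:
        # Degenerate case: keep only the largest weight.
        shrunk[np.argmax(w)] = 1.0
    return shrunk / shrunk.sum()                  # weight normalization

state_weights = shrink_and_normalize([0.5, 0.4995, 0.0005])
```

After this step, regions that a state's stationary segment never visits carry exactly zero weight, so they drop out of that state's likelihood computation.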
Step S205: Estimate the state model parameters of the second-layer MFA model of the DMFA model, including:
From formulas (27) and (28), once the first-layer parameters of the DMFA state model have been obtained, the state can be modeled accurately by varying the second-layer parameters. The auxiliary function can therefore be set up as:
Here the parameter is in fact the same as the one above; the two are merged in order to reduce the number of parameters.
Therefore, on the basis of the first-layer parameters, formula (36) is used to construct the auxiliary functions of the second-layer parameters, including the local coordinates, the local-region basis matrices, the local-region mean vectors, and the state-independent covariance matrices, which gives the following re-estimation formulas:
where the parameters H_js and g_js are given in formula (42), and γ_js, s_js, and S_js are the zeroth-order, first-order, and second-order statistics of the s-th sub-region of state j.
Three points should be noted. First, the weights here are modeled from the perspective of joint estimation over all sub-regions and, like the weights above, satisfy the sparsity constraint, so they are estimated with the same algorithm. Second, each sub-region s belongs to exactly one region i, so once sub-region s is fixed, the corresponding region i is fixed as well. Third, the inverses that appear in formulas (39) to (41) are generalized inverses; it can be shown that one of the matrices involved has a left inverse and the other has a right inverse.
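The remark about generalized inverses can be checked numerically. The matrices in formulas (39) to (41) are not legible in this text, so the sketch below only illustrates the general fact, assuming a tall matrix `A` with full column rank (as a basis matrix with fewer latent dimensions than feature dimensions would be): its pseudo-inverse is a left inverse, and the pseudo-inverse of its transpose is a right inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))        # tall, full column rank almost surely

A_pinv = np.linalg.pinv(A)             # Moore-Penrose generalized inverse
left_ok = np.allclose(A_pinv @ A, np.eye(3))                  # left inverse of A
right_ok = np.allclose(A.T @ np.linalg.pinv(A.T), np.eye(3))  # right inverse of A^T
not_two_sided = not np.allclose(A @ A_pinv, np.eye(6))        # A A^+ is only a projector
```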
The invention proposes an acoustic modeling method based on deep mixture factor analysis. Following the approach of modeling with a mixture factor analysis (MFA) model, the new method treats a state acoustic model as an adaptation of the MFA model of the global feature space, and describes each state jointly with state-dependent and state-independent parameters. The model of the invention further deepens and extends the MFA-based acoustic model. The original model assumes that the region coordinate parameters follow a Gaussian prior distribution, which does not hold in practice; here the region coordinate parameters are therefore themselves modeled with a deeper layer of mixture factor analysis. Because the DMFA model has a built-in parameter sharing strategy, it resists over-fitting better and is better suited to acoustic modeling under low-resource conditions.
The above is only a preferred embodiment of the invention. It should be noted that a person of ordinary skill in the art may make various improvements and modifications without departing from the principle of the invention, and these improvements and modifications shall also be regarded as falling within the scope of protection of the invention.

Claims (5)

1. A method for constructing an acoustic model based on deep mixture factor analysis, characterized by comprising the following steps:
Step 1: using training data, generate a baseline system with an HMM-GMM model;
Step 2: according to the HMM-GMM model parameters, initialize the acoustic model based on deep mixture factor analysis, the DMFA model, where the DMFA model is a two-layer mixture factor analysis model, that is, the DMFA model consists of a first-layer MFA model and a second-layer MFA model, and the DMFA model parameters are initialized by GMM clustering and probabilistic principal component analysis;
Step 3: using the training data and the baseline system of the HMM-GMM model, estimate the overall model parameters of the DMFA model of the acoustic feature space with a greedy EM algorithm;
Step 4: estimate the state model parameters of the first-layer MFA model of the DMFA model, where the state model parameters comprise state-dependent parameters and state-independent parameters;
Step 5: estimate the state model parameters of the second-layer MFA model of the DMFA model.
2. The method for constructing an acoustic model based on deep mixture factor analysis according to claim 1, characterized in that initializing the DMFA model parameters by GMM clustering and probabilistic principal component analysis comprises:
Determining the number I of local regions of the first-layer MFA model and the latent factor dimension of each region;
Determining the number I_i of sub-regions in each region of the second-layer MFA model, the total number of second-layer sub-regions, and the latent factor dimension of each sub-region.
3. The method for constructing an acoustic model based on deep mixture factor analysis according to claim 2, characterized in that step 3 comprises:
Step 3.1: given data X = {x_1, x_2, ..., x_T}, train the first-layer MFA model:
Train the first-layer MFA model on X with the greedy EM algorithm, where there are I mixture factor components and the factor subspace dimension is specified for each component M_i;
Step 3.2: for the data set Y_i of each region i, i ∈ [1, I], compute the Gaussian posterior distribution of the feature vector x_t for the i-th local region and the posterior probability p(i | x_t) that x_t belongs to the i-th local region, t ∈ [1, T], and obtain the estimate of the mixture factor component i corresponding to x_t by the following formula;
Sample the corresponding local coordinate by the following formula:
Add the sampled local coordinate to the data Y_i to obtain the augmented data set;
Step 3.3: train the second-layer MFA model:
On the augmented data set, train an independent second-layer MFA model with the greedy EM algorithm, using the second-layer mixture factor coordinate vector dimension and the number of mixture factor components K_i.
4. The method for constructing an acoustic model based on deep mixture factor analysis according to claim 3, characterized in that step 4 comprises:
Step 4.1: initialize the state-dependent parameters of the first-layer MFA model and set the number of iterations K, where the state-dependent parameters are the initial weight factors and the initial local coordinates of the first-layer MFA model, and J is the total number of context-dependent states;
Step 4.2: according to the state alignment information of the training data, re-estimate the state-dependent parameters of the j-th state, namely the weight factors and the local coordinates of the first-layer MFA model at the k-th iteration;
Step 4.3: according to the state alignment information of the training data, re-estimate the state-independent parameters of the i-th region, namely the local-region mean, the local-region basis matrix, and the state-independent covariance matrix of the first-layer MFA model at the k-th iteration.
5. The method for constructing an acoustic model based on deep mixture factor analysis according to claim 4, characterized in that step 5 comprises:
Step 5.1: initialize the state-dependent parameters of the second-layer MFA model and set the number of iterations K′, where the state-dependent parameters are the initial weight factors and the initial local coordinates of the second-layer MFA model;
Step 5.2: according to the state alignment information of the training data, re-estimate the state-dependent parameters of the j-th state, namely the weight factors and the local coordinates of the second-layer MFA model at the k-th iteration;
Step 5.3: according to the state alignment information of the training data, re-estimate the state-independent parameters of the s-th region, namely the local-region mean, the local-region basis matrix, and the state-independent covariance matrix of the second-layer MFA model at the k-th iteration.
CN201811537321.XA 2018-12-15 2018-12-15 Construction method of acoustic model based on deep mixing factor analysis Active CN109545201B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811537321.XA CN109545201B (en) 2018-12-15 2018-12-15 Construction method of acoustic model based on deep mixing factor analysis

Publications (2)

Publication Number Publication Date
CN109545201A true CN109545201A (en) 2019-03-29
CN109545201B CN109545201B (en) 2023-06-06

Family

ID=65856473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811537321.XA Active CN109545201B (en) 2018-12-15 2018-12-15 Construction method of acoustic model based on deep mixing factor analysis

Country Status (1)

Country Link
CN (1) CN109545201B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113421555B (en) * 2021-08-05 2024-04-12 辽宁大学 BN-SGMM-HMM-based low-resource voice recognition method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103117060A (en) * 2013-01-18 2013-05-22 中国科学院声学研究所 Modeling approach and modeling system of acoustic model used in speech recognition
CN104795063A (en) * 2015-03-20 2015-07-22 中国人民解放军信息工程大学 Acoustic model building method based on nonlinear manifold structure of acoustic space
CN108630199A (en) * 2018-06-30 2018-10-09 中国人民解放军战略支援部队信息工程大学 A kind of data processing method of acoustic model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YICHUAN TANG ET AL.: "Deep Mixtures of Factor Analysers", 《PROCEEDINGS OF THE 29 TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING》 *
张文林: "基于子空间的声学模型及自适应技术研究", 《中国博士学位论文全文数据库 信息科技辑(月刊)》 *
张文林等: "基于声学特征空间非线性流形结构的语音识别声学模型", 《自动化学报》 *

Also Published As

Publication number Publication date
CN109545201B (en) 2023-06-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant