CN109147772A - A DNN-HMM acoustic model parameter migration structure - Google Patents
A DNN-HMM acoustic model parameter migration structure
- Publication number
- CN109147772A (application CN201811176930.7A)
- Authority
- CN
- China
- Prior art keywords
- model
- dnn
- data
- parameter
- migration
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
- G10L15/146—Training of HMMs with insufficient amount of training data, e.g. state sharing, tying, deleted interpolation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
Abstract
The present invention provides a DNN-HMM acoustic model parameter migration structure that improves the handling of acoustic features from a small training speech corpus, so as to raise the acoustic model's ability to model acoustic features and reduce the word error rate and sentence error rate of speech recognition under small-scale data. A DNN-HMM acoustic model is trained on the small speech corpus, and the defined heterogeneous-model parameter migration model and transfer training algorithm add heterogeneous-model parameter migration to acoustic model training: the parameters of the DNN model obtained by training on source data are migrated into the model trained on target data, realizing parameter migration between heterogeneous DNN-HMM models. This reduces the word error rate and sentence error rate of speech recognition and effectively solves the problems and deficiencies existing in the prior art.
Description
Technical field
The present invention relates to the technical field of acoustics, and in particular to a DNN-HMM acoustic model parameter migration structure.
Background technique
In the acoustics field, transfer learning mainly migrates the parameters of a model trained on source-domain data related to the target domain. Neural networks have strong modeling ability on large-scale data, so, to enhance a model's ability to learn target data, transfer learning combined with neural network models has been extensively developed and applied in natural language processing, image processing, machine translation, and speech recognition.
A deep neural network model contains a large number of parameters. When such a model is trained on a small dataset, all of its weights are updated, but some weights cannot be fully exploited by the small-scale data, so the trained network models small-scale data poorly. To improve the modeling of sparse features in small-scale data, in 2007 Dai Wenyuan of 4Paradigm used transfer learning to migrate the parameters of a model trained on abundant labeled source-domain data into a model trained on a small amount of labeled target-domain data, while reducing the weight of source-domain samples whose distribution does not match the target-domain data, thereby improving the model's classification of target-domain data. In 2008, a laboratory at the Hong Kong University of Science and Technology applied transfer learning to indoor WiFi positioning: a model built from data collected in one region of an environment was used to learn from the data of another region of the same environment, the common part of the two regions' data served as a bridge connecting them, and the rebuilt model could position data from both regions; repeating this process covers the entire environment. In 2010, IBM Research used abundant labeled source-domain data with different extracted acoustic features to train a model, decoded a small amount of unlabeled target data with it, generated labels for the target data by ROVER voting over the decoded outputs, and then combined the source data and target data for acoustic model training; the resulting model had good generalization ability and reduced the word error rate of automatic speech recognition by about 1.2%.
When a DNN-HMM acoustic model is trained on a small-scale corpus, the small amount of labeled data and its unbalanced distribution leave a large number of initial parameters un-updated; the model then cannot describe the speech features in the corpus well, and the recognition rate drops. Acoustic model training for speech recognition usually needs a large amount of labeled data, which helps the acoustic model learn the speech features fully and reduces the word error rate and sentence error rate of speech recognition. However, large-scale speech annotation is difficult for some low-resource languages, so monolingual acoustic model training for such languages faces the problem of training on small amounts of data.
To address this, a parameter transfer training method for DNN-HMM acoustic models under small-scale corpora is proposed. Under the heterogeneous DNN-HMM model, acoustic models are first trained separately on the source corpus and the target corpus; the hidden-layer parameters of the source-corpus model are then migrated into the target-corpus model to form an initial model; finally, the migrated model is retrained on the target corpus to obtain the final model.
Summary of the invention
The purpose of the present invention is to provide a DNN-HMM acoustic model parameter migration structure that solves the problem described in the background: when a DNN-HMM acoustic model is trained on a small-scale corpus, the small amount of labeled data and its unbalanced distribution leave a large number of initial parameters un-updated, the model cannot describe the speech features in the corpus well, and the recognition rate declines.
To achieve the above object, the present invention provides a DNN-HMM acoustic model parameter migration structure that uses the definitions of homogeneous and heterogeneous models and their parameter migration methods, and combines the DNN-HMM model training method with the heterogeneous-model parameter migration method to obtain the parameter transfer training algorithm of the heterogeneous DNN-HMM model. It comprises:
(1) homogeneous-model parameter migration;
(2) heterogeneous-model parameter migration;
(3) DNN-HMM acoustic model parameter migration;
The homogeneous-model parameter migration is given by the following definitions. Definition 1 (model structure): let the structure of a deep neural network model be M, M = (N, P, F, l), where N is the set of network node counts N = {N_1, N_2, …, N_i, …, N_l}, N_i being the number of nodes in layer i of the neural network; P = {W, B} collects the parameter matrices from each layer i to layer i+1; W = {W_1, W_2, …, W_i, …, W_{l-1}}, where W_i ∈ R^(N_i × N_{i+1}) is the weight matrix from layer i to layer i+1 of the neural network; B is the set of bias vectors B = {B_1, B_2, …, B_i, …, B_{l-1}}, B_i being the bias vector of layer i; F = {g(·), o(·)}, where g(·) is the activation function of the hidden layers and o(·) the function of the output layer; l is the network depth. Definition 2 (data source): D_S = {X_S, Y_S} and D_T = {X_T, Y_T}, where S denotes the source data, T the target data, X the input training data, and Y the label data. Definition 3 (homogeneous model): a source model M_S and a target model M_T whose N, l and F are identical, written M_S = M_T. Definition 4 (homogeneous-model parameter migration): the W_S and B_S of the source model M_S built from the source data D_S replace the W_T and B_T of the target model M_T built from the target data D_T, yielding the migration model tr-M; when M_S = M_T, the W_S, B_S of M_S and the W_T, B_T of M_T are matrices of the same shape, so during parameter migration the parameter matrices of M_S can be copied directly into the corresponding positions of the parameters of M_T;
The heterogeneous-model parameter migration is given by the following definitions. Definition 5 (heterogeneous model): a source model M_S and a target model M_T whose l and F are identical, whose N_1 to N_{l-1} are identical, but whose N_l differs, written M_S <> M_T. Definition 6 (heterogeneous-model parameter migration): part of the W_S and B_S of the source model M_S built from the source data D_S replaces the corresponding W_T and B_T of the target model M_T built from the target data D_T, yielding the migration model tr-M;
The DNN-HMM acoustic model parameter migration applies the defined heterogeneous-model parameter migration model and transfer training algorithm: heterogeneous-model parameter migration is added to acoustic model training, the parameters of the DNN model obtained by training on source data are migrated into the model trained on target data, and parameter migration between heterogeneous DNN-HMM models is realized.
Preferably, the specific algorithm of homogeneous-model parameter migration is as follows. When M_S = M_T, the W_S, B_S of M_S and the W_T, B_T of M_T are matrices of the same shape, so the parameter matrices of M_S can be copied directly into the corresponding positions of the parameters of M_T:
Input: X_S, Y_S, X_T, Y_T; // X_S is the source data and Y_S its labels; X_T is the target data and Y_T its labels
Output: tr-M; // the model after migration
A: initialize(M_S); // initialization
B: M_S ← train(X_S, Y_S, M_S);
C: M_T ← M_S;
D: tr-M ← train(X_T, Y_T, M_T).
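As an illustrative sketch (not the patent's implementation; the layer sizes and helper names are assumptions), Definition 4 and steps A–D amount to copying every weight matrix and bias vector of the trained source model into the same-shaped target model before retraining it on the target data:

```python
import numpy as np

def init_model(layer_sizes, seed=0):
    """Randomly initialize a feed-forward model M = (N, P, F, l) as plain arrays."""
    rng = np.random.default_rng(seed)
    return {"N": list(layer_sizes),
            "W": [rng.standard_normal((m, n)) * 0.1
                  for m, n in zip(layer_sizes, layer_sizes[1:])],
            "B": [np.zeros(n) for n in layer_sizes[1:]]}

def migrate_homogeneous(M_S, M_T):
    """Definition 4: when M_S = M_T (same N, l, F), copy all W_S, B_S into M_T."""
    assert M_S["N"] == M_T["N"], "homogeneous migration needs identical structure"
    M_T["W"] = [w.copy() for w in M_S["W"]]
    M_T["B"] = [b.copy() for b in M_S["B"]]
    return M_T  # tr-M, to be retrained on (X_T, Y_T)

M_S = init_model([39, 128, 128, 48], seed=1)   # e.g. 39-dim MFCC input (sizes invented)
M_T = init_model([39, 128, 128, 48], seed=2)
tr_M = migrate_homogeneous(M_S, M_T)
```

Step D (retraining on X_T, Y_T) is left to whatever trainer built the models; the copy itself is the migration.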
Preferably, under the heterogeneous model, because the output-layer size N_l differs, the parameters of the model trained on the source-domain data cannot all be migrated by direct correspondence into the model trained on the target-domain data, which increases the difficulty of parameter migration. The heterogeneous-model parameter migration process is shown in Fig. 1. In the heterogeneous neural network model, W_{l-1} of M_S and W_{l-1} of M_T differ in shape, i.e. W^S_{l-1} ≠ W^T_{l-1}, while W_1, …, W_{l-2} of M_S and of M_T are matrices of the same shape; so not all parameter matrices can be migrated directly. The specific parameter migration algorithm is as follows:
Input: X_S, Y_S, X_T, Y_T; // X_S is the source data and Y_S its labels; X_T is the target data and Y_T its labels
Output: tr-M; // the model after migration
A: initialize(M_S);
B: M_S ← train(X_S, Y_S, M_S);
C: M_T ← initialize(M_T);
D: M_T.W_i ← M_S.W_i, M_T.B_i ← M_S.B_i, i = 1, …, l−2; // migrate the shared layers only
E: tr-M ← train(X_T, Y_T, M_T).
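A minimal sketch of Definition 6 and step D under assumed layer sizes (the senone counts below are invented for illustration): only the layers whose shapes match, i = 1, …, l−2, are copied, while the differently sized output layer of the target model keeps its own initialization:

```python
import numpy as np

def init_model(layer_sizes, seed=0):
    rng = np.random.default_rng(seed)
    return {"N": list(layer_sizes),
            "W": [rng.standard_normal((m, n)) * 0.1
                  for m, n in zip(layer_sizes, layer_sizes[1:])],
            "B": [np.zeros(n) for n in layer_sizes[1:]]}

def migrate_heterogeneous(M_S, M_T):
    # Definition 6: N_1..N_{l-1} match but N_l differs, so W_{l-1} and B_{l-1}
    # cannot be copied; only the shared hidden-layer parameters migrate.
    assert M_S["N"][:-1] == M_T["N"][:-1] and M_S["N"][-1] != M_T["N"][-1]
    for i in range(len(M_S["W"]) - 1):        # skip the output layer
        M_T["W"][i] = M_S["W"][i].copy()
        M_T["B"][i] = M_S["B"][i].copy()
    return M_T  # tr-M, retrained afterwards on the target data

S = init_model([39, 128, 128, 1936], seed=1)  # source output size (illustrative)
T = init_model([39, 128, 128, 648], seed=2)   # target output size (illustrative)
tr_M = migrate_heterogeneous(S, T)
```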
Preferably, for the DNN-HMM acoustic model parameter migration, the DNN-HMM model is first trained on the source data and the resulting source model is named S_DNN; then the DNN-HMM model is trained on the target data and the resulting target model is named T_DNN, where the source data and target data are of different scales and different languages; finally, the parameters of S_DNN are migrated into T_DNN, and the migrated model is retrained to obtain the tr-DNN model. The concrete parameter migration process is shown in Fig. 2. S_DNN is trained from the source data and T_DNN from the target data; for m ∈ N_1, n ∈ N_{l-1}, k ∈ N_l of S_DNN and u ∈ N_l of T_DNN, S_DNN.m = T_DNN.m and S_DNN.n = T_DNN.n but S_DNN.k ≠ T_DNN.u, so the output-layer weight matrices W_{l-1} differ in shape while the remaining weight matrices and the shared bias vectors match. It can be concluded that S_DNN and T_DNN are heterogeneous models, that is, S_DNN <> T_DNN. The specific parameter migration algorithm is as follows:
Input: X_S, Y_S, X_T, Y_T; // X_S is the source data and Y_S its labels; X_T is the target data and Y_T its labels
Output: tr-DNN; // the DNN model after migration
A: initialize(S_DNN);
B: S_DNN ← train(X_S, Y_S, S_DNN);
C: T_DNN ← initialize(T_DNN);
D: T_DNN.W_i ← S_DNN.W_i, i = 1, …, l−2; // migrate the shared weight matrices
E: T_DNN.B ← S_DNN.B; // migrate the bias vectors of the shared layers
F: tr-DNN ← train(X_T, Y_T, T_DNN).
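Steps A–F can be sketched end to end as a toy stand-in (this is not the Kaldi-based DNN-HMM training: `train` here performs a few SGD steps on a softmax cross-entropy loss, updates only the output layer for brevity, and all data, layer sizes, and output sizes are fabricated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def init(sizes):
    return {"N": list(sizes),
            "W": [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes, sizes[1:])],
            "B": [np.zeros(n) for n in sizes[1:]]}

def forward(M, X):
    h = X
    for W, b in zip(M["W"][:-1], M["B"][:-1]):
        h = np.maximum(h @ W + b, 0.0)                 # ReLU hidden layers, g(.)
    z = h @ M["W"][-1] + M["B"][-1]
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True), h         # softmax output, o(.)

def train(X, Y, M, steps=50, lr=0.1):
    for _ in range(steps):                             # SGD on cross-entropy,
        P, h = forward(M, X)                           # output layer only (toy)
        G = (P - Y) / len(X)
        M["W"][-1] -= lr * (h.T @ G)
        M["B"][-1] -= lr * G.sum(axis=0)
    return M

def one_hot(y, k):
    Y = np.zeros((len(y), k)); Y[np.arange(len(y)), y] = 1.0; return Y

XS, yS = rng.standard_normal((200, 39)), rng.integers(0, 100, 200)  # fake source data
XT, yT = rng.standard_normal((40, 39)),  rng.integers(0, 30, 40)    # fake target data

S_DNN = init([39, 64, 64, 100]); train(XS, one_hot(yS, 100), S_DNN)  # steps A, B
T_DNN = init([39, 64, 64, 30])                                       # step C
for i in range(len(S_DNN["W"]) - 1):                                 # steps D, E
    T_DNN["W"][i] = S_DNN["W"][i].copy(); T_DNN["B"][i] = S_DNN["B"][i].copy()
tr_DNN = train(XT, one_hot(yT, 30), T_DNN)                           # step F
```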
Preferably, the TIMIT English corpus serves as the source data and a Tibetan corpus as the target data; because the Tibetan corpus is small in scale, it serves as the small-scale corpus.
Due to the application of the above technical scheme, compared with the prior art, the invention has the following advantages:
By applying the DNN-HMM acoustic model parameter migration structure to the acoustic features of a small training speech corpus, the present invention improves the acoustic model's ability to model acoustic features and reduces the word error rate and sentence error rate of speech recognition under small-scale data. A DNN-HMM acoustic model is trained on the small speech corpus, and the defined heterogeneous-model parameter migration model and transfer training algorithm add heterogeneous-model parameter migration to acoustic model training: the parameters of the DNN model obtained by training on source data are migrated into the model trained on target data, realizing parameter migration between heterogeneous DNN-HMM models. This reduces the word error rate and sentence error rate of speech recognition and thus effectively solves the problems and deficiencies existing in the prior art.
Detailed description of the invention
The attached drawings, which form part of this application, provide a further understanding of the present invention; the schematic embodiments of the invention and their description explain the invention and do not constitute improper limitations of it. In the drawings:
Fig. 1 is a structural schematic diagram of the heterogeneous-model parameter migration process of the invention.
Fig. 2 is a structural schematic diagram of the DNN-HMM acoustic model parameter migration process of the invention.
Fig. 3 is a schematic diagram of the word error rate migration effect of the invention.
Fig. 4 is a schematic diagram of the sentence error rate migration effect of the invention.
Fig. 5 is a table of acoustic model parameter migration results under source datasets of different scales.
Fig. 6 is a schematic diagram of acoustic model parameter migration results under different numbers of hidden layers.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in those embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention.
Referring to Fig. 1 to Fig. 3, the present invention provides the following technical scheme for a DNN-HMM acoustic model parameter migration structure:
A DNN-HMM acoustic model parameter migration structure uses the definitions of homogeneous and heterogeneous models and their parameter migration methods, and combines the DNN-HMM model training method with the heterogeneous-model parameter migration method to obtain the parameter transfer training algorithm of the heterogeneous DNN-HMM model. It comprises:
(1) homogeneous-model parameter migration;
(2) heterogeneous-model parameter migration;
(3) DNN-HMM acoustic model parameter migration;
The homogeneous-model parameter migration is given by the following definitions. Definition 1 (model structure): let the structure of a deep neural network model be M, M = (N, P, F, l), where N is the set of network node counts N = {N_1, N_2, …, N_i, …, N_l}, N_i being the number of nodes in layer i of the neural network; P = {W, B} collects the parameter matrices from each layer i to layer i+1; W = {W_1, W_2, …, W_i, …, W_{l-1}}, where W_i ∈ R^(N_i × N_{i+1}) is the weight matrix from layer i to layer i+1 of the neural network; B is the set of bias vectors B = {B_1, B_2, …, B_i, …, B_{l-1}}, B_i being the bias vector of layer i; F = {g(·), o(·)}, where g(·) is the activation function of the hidden layers and o(·) the function of the output layer; l is the network depth. Definition 2 (data source): D_S = {X_S, Y_S} and D_T = {X_T, Y_T}, where S denotes the source data, T the target data, X the input training data, and Y the label data. Definition 3 (homogeneous model): a source model M_S and a target model M_T whose N, l and F are identical, written M_S = M_T. Definition 4 (homogeneous-model parameter migration): the W_S and B_S of the source model M_S built from the source data D_S replace the W_T and B_T of the target model M_T built from the target data D_T, yielding the migration model tr-M; when M_S = M_T, the W_S, B_S of M_S and the W_T, B_T of M_T are matrices of the same shape, so during parameter migration the parameter matrices of M_S can be copied directly into the corresponding positions of the parameters of M_T;
The heterogeneous-model parameter migration is given by the following definitions. Definition 5 (heterogeneous model): a source model M_S and a target model M_T whose l and F are identical, whose N_1 to N_{l-1} are identical, but whose N_l differs, written M_S <> M_T. Definition 6 (heterogeneous-model parameter migration): part of the W_S and B_S of the source model M_S built from the source data D_S replaces the corresponding W_T and B_T of the target model M_T built from the target data D_T, yielding the migration model tr-M;
The DNN-HMM acoustic model parameter migration applies the defined heterogeneous-model parameter migration model and transfer training algorithm: heterogeneous-model parameter migration is added to acoustic model training, the parameters of the DNN model obtained by training on source data are migrated into the model trained on target data, and parameter migration between heterogeneous DNN-HMM models is realized.
Specifically, the algorithm of homogeneous-model parameter migration is as follows. When M_S = M_T, the W_S, B_S of M_S and the W_T, B_T of M_T are matrices of the same shape, so the parameter matrices of M_S can be copied directly into the corresponding positions of the parameters of M_T:
Input: X_S, Y_S, X_T, Y_T; // X_S is the source data and Y_S its labels; X_T is the target data and Y_T its labels
Output: tr-M; // the model after migration
A: initialize(M_S); // initialization
B: M_S ← train(X_S, Y_S, M_S);
C: M_T ← M_S;
D: tr-M ← train(X_T, Y_T, M_T).
Specifically, under the heterogeneous model, because the output-layer size N_l differs, the parameters of the model trained on the source-domain data cannot all be migrated by direct correspondence into the model trained on the target-domain data, which increases the difficulty of parameter migration. The heterogeneous-model parameter migration process is shown in Fig. 1. In the heterogeneous neural network model, W_{l-1} of M_S and W_{l-1} of M_T differ in shape, i.e. W^S_{l-1} ≠ W^T_{l-1}, while W_1, …, W_{l-2} of M_S and of M_T are matrices of the same shape; so not all parameter matrices can be migrated directly. The specific parameter migration algorithm is as follows:
Input: X_S, Y_S, X_T, Y_T; // X_S is the source data and Y_S its labels; X_T is the target data and Y_T its labels
Output: tr-M; // the model after migration
A: initialize(M_S);
B: M_S ← train(X_S, Y_S, M_S);
C: M_T ← initialize(M_T);
D: M_T.W_i ← M_S.W_i, M_T.B_i ← M_S.B_i, i = 1, …, l−2; // migrate the shared layers only
E: tr-M ← train(X_T, Y_T, M_T).
Specifically, for the DNN-HMM acoustic model parameter migration, the DNN-HMM model is first trained on the source data and the resulting source model is named S_DNN; then the DNN-HMM model is trained on the target data and the resulting target model is named T_DNN, where the source data and target data are of different scales and different languages; finally, the parameters of S_DNN are migrated into T_DNN, and the migrated model is retrained to obtain the tr-DNN model. The concrete parameter migration process is shown in Fig. 2. S_DNN is trained from the source data and T_DNN from the target data; for m ∈ N_1, n ∈ N_{l-1}, k ∈ N_l of S_DNN and u ∈ N_l of T_DNN, S_DNN.m = T_DNN.m and S_DNN.n = T_DNN.n but S_DNN.k ≠ T_DNN.u, so the output-layer weight matrices W_{l-1} differ in shape while the remaining weight matrices and the shared bias vectors match. It can be concluded that S_DNN and T_DNN are heterogeneous models, that is, S_DNN <> T_DNN. The specific parameter migration algorithm is as follows:
Input: X_S, Y_S, X_T, Y_T; // X_S is the source data and Y_S its labels; X_T is the target data and Y_T its labels
Output: tr-DNN; // the DNN model after migration
A: initialize(S_DNN);
B: S_DNN ← train(X_S, Y_S, S_DNN);
C: T_DNN ← initialize(T_DNN);
D: T_DNN.W_i ← S_DNN.W_i, i = 1, …, l−2; // migrate the shared weight matrices
E: T_DNN.B ← S_DNN.B; // migrate the bias vectors of the shared layers
F: tr-DNN ← train(X_T, Y_T, T_DNN).
Specifically, the TIMIT English corpus serves as the source data and a Tibetan corpus as the target data; because the Tibetan corpus is small in scale, it serves as the small-scale corpus.
Specific implementation steps of the invention:
For speech recognition acoustic model training, the S_DNN model is trained on the TIMIT data, while the T_DNN model is trained on a Tibetan corpus. The number of Tibetan speakers and the range in which the language is used are narrow, so the production and collection of a corpus face great difficulties, and Tibetan corpora are much smaller in scale than English or Chinese ones. Therefore, the TIMIT English corpus is used here as the source data and the Tibetan corpus as the target data.
The Tibetan corpus is divided into a training part and a test part, each consisting of a number of Tibetan sentences. All experiments are implemented on the Kaldi platform.
Throughout the experiments, the acoustic features used to train the S_DNN, T_DNN and tr-DNN models are 39-dimensional MFCC features.
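A 39-dimensional MFCC feature is conventionally 13 static coefficients plus their first- and second-order time derivatives. A sketch of the stacking in pure NumPy (the ±2-frame regression window is an assumption, matching common Kaldi-style front-end defaults; the MFCC values themselves are random placeholders):

```python
import numpy as np

def deltas(feat, N=2):
    """First-order regression coefficients over a +/-N frame window."""
    T = len(feat)
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")  # repeat edge frames
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return sum(n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
               for n in range(1, N + 1)) / denom

mfcc = np.random.default_rng(0).standard_normal((100, 13))  # 100 frames x 13 static MFCCs
d1 = deltas(mfcc)                     # delta coefficients
d2 = deltas(d1)                       # delta-delta coefficients
feat39 = np.hstack([mfcc, d1, d2])    # the 39-dim feature vector per frame
```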
To show the effectiveness of the parameter migration method for acoustic model training, two classes of comparative experiments were designed:
(1) Verifying the results of acoustic model parameter migration under source corpora of different scales: four experimental settings were designed. The TIMIT corpus, 3696 sentences in total, was used as the source data and divided into four scales of 1000, 2000, 3000 and 3696 sentences. With triphones as the acoustic modeling unit and 4 hidden layers, the influence of source-corpus scale on the modeling ability of the acoustic model after adding transfer learning was compared; the experimental results are shown in Fig. 5.
As Fig. 5 shows, as the TIMIT corpus grows, the word error rate and sentence error rate of Tibetan speech recognition with parameter migration keep decreasing on the training set; on the test set, however, they reach their minimum, 28.17% and 36.96% respectively, when the TIMIT dataset contains 3000 sentences.
The experiment shows that transfer learning works best for Tibetan speech recognition when the TIMIT dataset contains 3000 sentences, so the subsequent experiments use this scale for comparison.
(2) Verifying the influence of the number of hidden layers on the parameter-migrated acoustic model: using the optimal TIMIT corpus scale from Fig. 5, parameter transfer training was applied to the Tibetan acoustic model, and the recognition rate of Tibetan speech recognition before and after migration was compared. At the same time, while training the migrated Tibetan acoustic model, the number of hidden layers was increased step by step to compare its influence on Tibetan speech recognition after migration. The number of hidden layers was set to 4, 5, 6 and 7; the experimental results are shown in Fig. 6.
From Fig. 6 it can be concluded that Tibetan speech recognition after parameter transfer learning achieves a higher recognition rate than before migration. As the number of hidden layers increases from 4 to 6, the word error rate of transfer-learning-based Tibetan speech recognition falls steadily; but with 7 hidden layers it rises relative to the shallower networks. With 6 hidden layers, the word error rate and sentence error rate drop by 2.94% and 9.06% on the training set and by 2.72% and 15.22% on the test set; at this point Tibetan speech recognition after transfer learning achieves its best result.
To describe the influence of the number of hidden layers on migration, the parameter migration effect is defined as: parameter migration effect = error rate of the baseline model − error rate of the migrated model. It divides into a word error rate migration effect and a sentence error rate migration effect: word error rate migration effect = word error rate of the baseline model − word error rate of the migrated model; sentence error rate migration effect = sentence error rate of the baseline model − sentence error rate of the migrated model.
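The definition reduces to a subtraction per metric. A sketch of computing it per hidden-layer count (the error-rate pairs below are placeholders, not values from Fig. 6, which is not reproduced in the text; they are chosen only so the 6-layer setting shows the largest effect, mirroring the reported trend):

```python
def migration_effect(baseline_err, migrated_err):
    # parameter migration effect = baseline error rate - migrated error rate
    return baseline_err - migrated_err

# hypothetical (baseline, migrated) word error rates in percent per hidden-layer count
wer = {4: (33.0, 31.5), 5: (32.5, 30.4), 6: (32.0, 29.28), 7: (32.2, 30.0)}
effects = {layers: migration_effect(b, m) for layers, (b, m) in wer.items()}
best = max(effects, key=effects.get)  # layer count with the largest migration effect
```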
According to the definitions of word error rate and sentence error rate migration effect, a migration effect analysis of the experimental results in Fig. 6 yields the word error rate migration effect diagram shown in Fig. 3 and the sentence error rate migration effect diagram shown in Fig. 4.
Figs. 3 and 4 show that as the number of hidden layers increases from 4 to 6, the parameter migration effect rises, indicating that the migrated hidden layers learn the speech features of the target data best in that range. But with 7 hidden layers, the parameter migration effect drops: as the hidden layers deepen, they extract deeper representations of the speech features, and the parts of the source-language features learned by the source model's hidden layers that differ from the target-language features cause a domain-mismatch problem when the migrated model models the target data.
The parameter transfer training algorithm for transfer learning under small-scale data was verified on the DNN-HMM acoustic model. Several groups of experiments compared the influence of the scale of the source corpus and of the number of hidden layers on transfer learning performance. The results show:
(1) The scale of the source data affects the model's ability to model the target data. When training the source model, a larger source corpus is not always better; the effective source scale depends on the target scale, and parameter migration achieves a good effect only when the source and target scales reach a suitable ratio.
(2) Adding parameter migration to model training gives the migrated model a stronger ability to model the target data.
(3) The number of hidden layers affects the benefit of transfer learning for Tibetan speech recognition: as hidden layers are added, the learning ability of the migrated model first rises and then falls, indicating that the ability of parameter transfer learning under a fixed amount of data is limited.
These three points demonstrate the effectiveness of the parameter migration method.
The parameter transfer training process considers only the migration of model parameters and does not adjust for the domain difference between the source and target data. A model adaptation algorithm will be added to the parameter transfer training method in follow-up work, in the hope of improving the adaptability of the target data to the migrated model.
In summary: the present invention improves the modeling of acoustic features by the DNN-HMM acoustic model under a small training speech database through parameter migration, reducing the word error rate and sentence error rate of speech recognition under small-scale data. A DNN-HMM acoustic model is trained on the small training speech database, and the defined heterogeneous-model parameter migration model and transfer training algorithm add heterogeneous-model parameter migration to acoustic model training: the parameters of the DNN model obtained by training on source data are migrated into the model trained on target data, realizing DNN-HMM heterogeneous-model parameter migration. This reduces the word error rate and sentence error rate of speech recognition, effectively solving the problems and deficiencies existing in the prior art.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that a variety of changes, modifications, replacements and variants may be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the appended claims.
Claims (5)
1. A DNN-HMM acoustic model parameter migration structure, characterized in that: it uses the definitions of homogeneous and heterogeneous models and their parameter migration methods, and combines the DNN-HMM model training method with the heterogeneous-model parameter migration method to obtain the parameter transfer training algorithm for heterogeneous DNN-HMM models; comprising:
(1) homogeneous-model parameter migration;
(2) heterogeneous-model parameter migration;
(3) DNN-HMM acoustic model parameter migration;
Homogeneous-model parameter migration is defined as follows. Definition 1 (model structure): the model structure of a deep neural network is M = (N, P, F, l), where N = {N_1, N_2, ..., N_i, ..., N_l} is the set of network nodes, N_i being the number of nodes in layer i of the network; P = {W, B} is the parameter set from each layer i to layer i+1, where W = {W_1, W_2, ..., W_i, ..., W_{l-1}} and W_i is the weight matrix from layer i to layer i+1, and B = {B_1, B_2, ..., B_i, ..., B_{l-1}} where B_i is the bias vector of layer i of the network; F = {g(·), o(·)}, where g(·) is the activation function of the hidden layers of the neural network and o(·) is the function of the output layer; l is the network depth. Definition 2 (data sources): D_S = {X_S, Y_S} and D_T = {X_T, Y_T}, where S denotes source data, T denotes target data, X denotes input training data and Y denotes label data. Definition 3 (homogeneous models): the source model M_S and target model M_T have identical N, l and F, written M_S = M_T. Definition 4 (homogeneous-model parameter migration): the W_S and B_S of the source model M_S built from source data D_S replace the W_T and B_T of the target model M_T built from target data D_T, yielding the migrated model tr-M. When M_S = M_T, the W_S and B_S of M_S and the W_T and B_T of M_T are homotype (same-shape) matrices, so during parameter migration the parameter matrices of M_S can be moved directly to the corresponding positions of the parameters of M_T;
Heterogeneous-model parameter migration is defined as follows. Definition 5 (heterogeneous models): the source model M_S and target model M_T have identical l, identical F, and identical N_1 to N_{l-1}, but different N_l, written M_S <> M_T. Definition 6 (heterogeneous-model parameter migration): part of the W_S and B_S of the source model M_S built from source data D_S replaces the corresponding W_T and B_T of the target model M_T built from target data D_T, yielding the migrated model tr-M;
DNN-HMM acoustic model parameter migration adds heterogeneous-model parameter migration to acoustic model training through the defined heterogeneous-model parameter migration model and transfer training algorithm: the parameters of the DNN model obtained by training on source data are migrated into the model trained on target data, realizing parameter migration between heterogeneous DNN-HMM models.
2. The DNN-HMM acoustic model parameter migration structure according to claim 1, characterized in that the homogeneous-model parameter migration algorithm is as follows: when M_S = M_T, the W_S and B_S of M_S and the W_T and B_T of M_T are homotype matrices, so during parameter migration the parameter matrices of M_S can be moved directly to the corresponding positions of the parameters of M_T:
Input: X_S, Y_S, X_T, Y_T // X_S: source data; Y_S: labels of the source data; X_T: target data; Y_T: labels of the target data
Output: tr-M // tr-M: the model after migration
A: initialize(M_S); // initialization
B: M_S ← train(X_S, Y_S, M_S);
C: M_T ← M_S;
D: tr-M ← train(X_T, Y_T, M_T).
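Steps A to D above can be sketched in Python with NumPy. This is a minimal illustration, not the patented implementation: `init_model` stands in for the unspecified trainer, the layer sizes are hypothetical, and only the homotype parameter copy of step C is demonstrated.

```python
import numpy as np

def init_model(layer_sizes, seed=0):
    """Model M = (N, P, F, l) per Definition 1: N holds the layer sizes,
    W the weight matrices of shape (N_i, N_{i+1}), B the bias vectors."""
    rng = np.random.default_rng(seed)
    return {
        "N": list(layer_sizes),
        "W": [rng.standard_normal((a, b)) * 0.01
              for a, b in zip(layer_sizes, layer_sizes[1:])],
        "B": [np.zeros(b) for b in layer_sizes[1:]],
    }

def migrate_homogeneous(M_S, M_T):
    """Step C: since M_S = M_T, every W_i and B_i is homotype, so the
    source parameters replace the target parameters position by position."""
    assert M_S["N"] == M_T["N"], "homogeneous migration needs identical N"
    M_T["W"] = [w.copy() for w in M_S["W"]]
    M_T["B"] = [b.copy() for b in M_S["B"]]
    return M_T

# Hypothetical topology: 39-dim input features, two 256-unit hidden layers.
M_S = init_model([39, 256, 256, 48], seed=1)  # stands in for train(X_S, Y_S, M_S)
M_T = init_model([39, 256, 256, 48], seed=2)
tr_M = migrate_homogeneous(M_S, M_T)          # fine-tuning on (X_T, Y_T) would follow
print(all(np.array_equal(a, b) for a, b in zip(tr_M["W"], M_S["W"])))  # True
```

Step D of the claim, retraining tr-M on the target data, would then update the copied parameters rather than a random initialization.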
3. The DNN-HMM acoustic model parameter migration structure according to claim 1, characterized in that, under the heterogeneous model, since N_l is not identical, the model parameters obtained by training on source-domain data cannot be migrated directly by positional correspondence into the model trained on target-domain data, which increases the difficulty of parameter migration; the heterogeneous-model parameter migration process is shown in Fig. 1. In the heterogeneous neural network model, W^S_{l-1} in M_S and W^T_{l-1} in M_T are not identical, i.e. W^S_{l-1} ≠ W^T_{l-1}, while W^S_1, ..., W^S_{l-2} in M_S and W^T_1, ..., W^T_{l-2} in M_T are homotype matrices, i.e. W^S_i and W^T_i have the same shape for i = 1, ..., l-2. Therefore, when migrating model parameters, the output-layer parameter matrix cannot be migrated directly; the parameter migration algorithm is as follows:
Input: X_S, Y_S, X_T, Y_T // X_S: source data; Y_S: labels of the source data; X_T: target data; Y_T: labels of the target data
Output: tr-M // tr-M: the model after migration
A: initialize(M_S);
B: M_S ← train(X_S, Y_S, M_S);
C: M_T ← initialize(M_T);
D: W^T_i ← W^S_i, B^T_i ← B^S_i, for i = 1, ..., l-2;
E: tr-M ← train(X_T, Y_T, M_T).
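A minimal NumPy sketch of the heterogeneous case, under the same hypothetical model dictionary as before: step D copies only the homotype parameters W_1..W_{l-2} and B_1..B_{l-2}, while the freshly initialized output layer is kept because N_l differs. The layer sizes are illustrative assumptions.

```python
import numpy as np

def make_model(layer_sizes, seed):
    """Hypothetical model container: weight matrices W_i of shape
    (N_i, N_{i+1}) and bias vectors B_i, per Definition 1."""
    rng = np.random.default_rng(seed)
    return {"N": list(layer_sizes),
            "W": [rng.standard_normal((a, b)) * 0.01
                  for a, b in zip(layer_sizes, layer_sizes[1:])],
            "B": [np.zeros(b) for b in layer_sizes[1:]]}

def migrate_heterogeneous(M_S, M_T):
    """Step D: copy only the homotype matrices W_1..W_{l-2} and matching
    biases; W_{l-1} has shape (N_{l-1}, N_l) and N_l differs, so the
    target model's own initialization of the output layer is kept."""
    assert M_S["N"][:-1] == M_T["N"][:-1] and M_S["N"][-1] != M_T["N"][-1]
    for i in range(len(M_S["W"]) - 1):
        M_T["W"][i] = M_S["W"][i].copy()
        M_T["B"][i] = M_S["B"][i].copy()
    return M_T

M_S = make_model([39, 256, 256, 144], seed=1)  # source: N_l = 144 (illustrative)
M_T = make_model([39, 256, 256, 120], seed=2)  # target: N_l = 120 (illustrative)
tr_M = migrate_heterogeneous(M_S, M_T)         # step E (retraining) would follow
print(np.array_equal(tr_M["W"][0], M_S["W"][0]),   # True: hidden layer copied
      np.array_equal(tr_M["W"][-1], M_S["W"][-1])) # False: shapes differ
```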
4. The DNN-HMM acoustic model parameter migration structure according to claim 1, characterized in that, for DNN-HMM acoustic model parameter migration, a DNN-HMM model is first trained with source data, yielding the source model named S_DNN; then a DNN-HMM model is trained with target data, yielding the target model named T_DNN, where the source data and target data are of different scales and different languages; finally, the parameters of S_DNN are migrated into the T_DNN model, and the migrated model is retrained to obtain the tr-DNN model. The concrete parameter migration process is shown in Fig. 2. S_DNN is trained from the source data and T_DNN from the target data. Let m ∈ N_1, n ∈ N_{l-1}, k ∈ N_l of the source model and u ∈ N_l of the target model; then S_DNN.m = T_DNN.m, S_DNN.n = T_DNN.n and S_DNN.k ≠ T_DNN.u, so that W^S_{l-1} ≠ W^T_{l-1}, while S_DNN.B = T_DNN.B and W^S_i and W^T_i are homotype for i = 1, ..., l-2. It follows that S_DNN and T_DNN are heterogeneous models, i.e. S_DNN <> T_DNN; the parameter migration algorithm is as follows:
Input: X_S, Y_S, X_T, Y_T // X_S: source data; Y_S: labels of the source data; X_T: target data; Y_T: labels of the target data
Output: tr-DNN // tr-DNN: the DNN model after migration
A: initialize(S_DNN);
B: S_DNN ← train(X_S, Y_S, S_DNN);
C: T_DNN ← initialize(T_DNN);
D: W^T_i ← W^S_i, for i = 1, ..., l-2;
E: T_DNN.B ← S_DNN.B;
F: tr-DNN ← train(X_T, Y_T, T_DNN).
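The heterogeneity argument of this claim (S_DNN <> T_DNN because only the output dimension N_l differs) can be checked mechanically. The layer sizes below are illustrative assumptions, not the topologies used in the experiments:

```python
def is_heterogeneous(N_S, N_T):
    """Definition 5: same depth l, N_1..N_{l-1} identical, N_l different."""
    return (len(N_S) == len(N_T)
            and N_S[:-1] == N_T[:-1]
            and N_S[-1] != N_T[-1])

# Hypothetical node counts: S_DNN trained on the (English) source corpus,
# T_DNN on the Tibetan target corpus; N_l differs because the output
# (senone) sets of the two languages differ.
N_S = [39, 512, 512, 512, 144]   # illustrative
N_T = [39, 512, 512, 512, 120]   # illustrative
print(is_heterogeneous(N_S, N_T))   # True

# Homotype check per weight matrix W_i of shape (N_i, N_{i+1}):
homotype = [sw == tw
            for sw, tw in zip(zip(N_S, N_S[1:]), zip(N_T, N_T[1:]))]
print(homotype)   # [True, True, True, False]: only W_{l-1} differs
```

Only the matrices flagged `True` are eligible for step D's direct copy; the final matrix must come from the target model's own initialization.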
5. The DNN-HMM acoustic model parameter migration structure according to claim 1, characterized in that the TIMIT English corpus is used as the source data and a Tibetan corpus as the target data; since the Tibetan corpus is small in scale, it serves as the small-scale corpus.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811176930.7A CN109147772A (en) | 2018-10-10 | 2018-10-10 | A kind of DNN-HMM acoustic model parameters migration structure |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109147772A true CN109147772A (en) | 2019-01-04 |
Family
ID=64810867
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811176930.7A Pending CN109147772A (en) | 2018-10-10 | 2018-10-10 | A kind of DNN-HMM acoustic model parameters migration structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109147772A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009080309A (en) * | 2007-09-26 | 2009-04-16 | Toshiba Corp | Speech recognition device, speech recognition method, speech recognition program and recording medium in which speech recogntion program is recorded |
US20170221474A1 (en) * | 2016-02-02 | 2017-08-03 | Mitsubishi Electric Research Laboratories, Inc. | Method and System for Training Language Models to Reduce Recognition Errors |
CN107481717A (en) * | 2017-08-01 | 2017-12-15 | 百度在线网络技术(北京)有限公司 | A kind of acoustic training model method and system |
CN108109615A (en) * | 2017-12-21 | 2018-06-01 | 内蒙古工业大学 | A kind of construction and application method of the Mongol acoustic model based on DNN |
Non-Patent Citations (2)
Title |
---|
CHIEN-TING LIN ET AL.: "A preliminary study on cross-language knowledge transfer for low-resource Taianese Mandarin ASR", 《2016 CONFERENCE OF THE ORIENTAL CHAPTER OF INTERNATIONAL COMMITTEE FOR COORDINATION AND STANDARDIZATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES》 * |
MING SUN ET AL.: "An Empirical Study of Cross-Lingual Transfer Learning Techniques for Small-Footprint Keyword Spotting", 《2017 16TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA)》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112133290A (en) * | 2019-06-25 | 2020-12-25 | 南京航空航天大学 | Speech recognition method based on transfer learning and aiming at civil aviation air-land communication field |
CN110428818A (en) * | 2019-08-09 | 2019-11-08 | 中国科学院自动化研究所 | The multilingual speech recognition modeling of low-resource, audio recognition method |
CN110428818B (en) * | 2019-08-09 | 2021-09-28 | 中国科学院自动化研究所 | Low-resource multi-language voice recognition model and voice recognition method |
WO2022121185A1 (en) * | 2020-12-11 | 2022-06-16 | 平安科技(深圳)有限公司 | Model training method and apparatus, dialect recognition method and apparatus, and server and storage medium |
CN113239967A (en) * | 2021-04-14 | 2021-08-10 | 北京达佳互联信息技术有限公司 | Character recognition model training method, recognition method, related equipment and storage medium |
CN115662409A (en) * | 2022-10-27 | 2023-01-31 | 亿铸科技(杭州)有限责任公司 | Voice recognition method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109147772A (en) | A kind of DNN-HMM acoustic model parameters migration structure | |
CN109902171B (en) | Text relation extraction method and system based on hierarchical knowledge graph attention model | |
CN111090461B (en) | Code annotation generation method based on machine translation model | |
CN107239446B (en) | A kind of intelligence relationship extracting method based on neural network Yu attention mechanism | |
CN105512209B (en) | The biomedical event trigger word recognition methods that a kind of feature based learns automatically | |
CN103154936B (en) | For the method and system of robotization text correction | |
CN108733792A (en) | A kind of entity relation extraction method | |
US11580975B2 (en) | Systems and methods for response selection in multi-party conversations with dynamic topic tracking | |
CN108363816A (en) | Open entity relation extraction method based on sentence justice structural model | |
CN109933664A (en) | A kind of fine granularity mood analysis improved method based on emotion word insertion | |
WO2022001333A1 (en) | Hyperbolic space representation and label text interaction-based fine-grained entity recognition method | |
CN106228980A (en) | Data processing method and device | |
CN110059160A (en) | A kind of knowledge base answering method and device based on context end to end | |
CN108427665A (en) | A kind of text automatic generation method based on LSTM type RNN models | |
CN110516095A (en) | Weakly supervised depth Hash social activity image search method and system based on semanteme migration | |
CN110059191A (en) | A kind of text sentiment classification method and device | |
CN105261358A (en) | N-gram grammar model constructing method for voice identification and voice identification system | |
CN109065029A (en) | A kind of small-scale corpus DNN-HMM acoustic model | |
CN111310441A (en) | Text correction method, device, terminal and medium based on BERT (binary offset transcription) voice recognition | |
CN110222347A (en) | A kind of detection method that digresses from the subject of writing a composition | |
CN109635105A (en) | A kind of more intension recognizing methods of Chinese text and system | |
CN107145514A (en) | Chinese sentence pattern sorting technique based on decision tree and SVM mixed models | |
CN107679225A (en) | A kind of reply generation method based on keyword | |
CN110119443A (en) | A kind of sentiment analysis method towards recommendation service | |
CN109145287A (en) | Indonesian word error-detection error-correction method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190104 |