CN109147772A - A DNN-HMM acoustic model parameter migration structure - Google Patents

A DNN-HMM acoustic model parameter migration structure

Info

Publication number
CN109147772A
CN109147772A (application CN201811176930.7A)
Authority
CN
China
Prior art keywords
model
dnn
data
parameter
migration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811176930.7A
Other languages
Chinese (zh)
Inventor
马志强
李图雅
韩佳俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University of Technology filed Critical Inner Mongolia University of Technology
Priority to CN201811176930.7A priority Critical patent/CN109147772A/en
Publication of CN109147772A publication Critical patent/CN109147772A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/14Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142Hidden Markov Models [HMMs]
    • G10L15/144Training of HMMs
    • G10L15/146Training of HMMs with insufficient amount of training data, e.g. state sharing, tying, deleted interpolation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/16Speech classification or search using artificial neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)

Abstract

The present invention provides a DNN-HMM acoustic model parameter migration structure that improves the modeling of acoustic features on small training speech corpora. It raises the acoustic model's ability to model acoustic features and lowers the word error rate and sentence error rate of speech recognition on small-scale data, so that a DNN-HMM acoustic model can be trained on a small training speech corpus. A heterogeneous-model parameter migration model and a transfer-training algorithm are defined, and heterogeneous parameter migration is added to acoustic model training: the parameters of a DNN model trained on source data are migrated into a model trained on target data, realizing parameter migration between heterogeneous DNN-HMM models. This reduces the word error rate and sentence error rate of speech recognition and thereby effectively addresses the problems and shortcomings of the prior art.

Description

A DNN-HMM acoustic model parameter migration structure
Technical field
The present invention relates to the field of acoustics, and in particular to a DNN-HMM acoustic model parameter migration structure.
Background technique
In the field of acoustics, transfer learning mainly trains on source-domain data related to the target domain and migrates the resulting model parameters. Neural networks have strong modeling ability given large-scale data, so, to strengthen a model's ability to learn from target data, transfer learning combined with neural network models has been widely developed and applied in natural language processing, image processing, machine translation and speech recognition.
A deep neural network contains a large number of parameters, and training on a small amount of data updates all of the model's weights, so some weights cannot be fully exploited by the small-scale data; the resulting network then models small-scale data poorly. To improve the modeling of sparse features in small-scale data, in 2007 Dai Wenyuan of 4Paradigm used transfer learning to migrate model parameters trained on a large amount of labeled source-domain data into a model trained on a small amount of labeled target-domain data, while reducing the weights assigned to source-domain data not shared with the target domain, improving the model's classification of target-domain data. In 2008, a Hong Kong University of Science and Technology laboratory applied transfer learning to indoor WiFi localization: data collected in one region of an environment were used to build a model, the model was applied to data from another region of the same environment, the common portion of the two regions' data was extracted and used as a bridge to connect them, and the rebuilt model could localize data from both regions; repeating this process covers the data of the entire environment. In 2010, an IBM research department trained models on a large amount of labeled source-domain data by extracting different acoustic features, decoded a small amount of unlabeled target data with these models, and generated labels for the target data by ROVER voting over the decoded outputs; combining source and target data for acoustic model training then yielded a model with good generalization, reducing the word error rate of automatic speech recognition by about 1.2%.
When training a DNN-HMM acoustic model on a small-scale corpus, the small amount of labeled data and its unbalanced distribution leave a large number of initial parameters un-updated; the model then cannot describe the speech features in the corpus well, and the recognition rate drops. Acoustic model training for speech recognition usually needs a large amount of labeled data, which lets the acoustic model learn the speech features thoroughly and lowers the word error rate and sentence error rate of recognition. However, large-scale speech annotation is difficult for some low-resource languages, so acoustic model training for such languages faces the problem of training monolingual acoustic models on small amounts of data.
To this end, a DNN-HMM acoustic model parameter transfer-training method for small-scale corpora is proposed. Under the heterogeneous DNN-HMM model, acoustic models are first trained separately on the source corpus and the target corpus; the hidden-layer parameters of the source-corpus model are then migrated into the target-corpus model to form an initial model; finally, the migrated model is retrained on the target corpus to obtain the final model.
Summary of the invention
The purpose of the present invention is to provide a DNN-HMM acoustic model parameter migration structure, to solve the problems and shortcomings raised in the background: when training a DNN-HMM acoustic model on a small-scale corpus, the small amount of labeled data and its unbalanced distribution leave many initial parameters un-updated, the model cannot describe the speech features in the corpus well, and the recognition rate drops.
To achieve the above object, the present invention provides a DNN-HMM acoustic model parameter migration structure. It uses the definitions of homogeneous and heterogeneous models and their parameter migration methods, and combines the DNN-HMM model training method with the heterogeneous-model parameter migration method to obtain a parameter transfer-training algorithm for heterogeneous DNN-HMM models. It comprises:
(1) homogeneous-model parameter migration;
(2) heterogeneous-model parameter migration;
(3) DNN-HMM acoustic model parameter migration;
Definitions are provided for homogeneous-model parameter migration. Definition 1 (model structure): a deep neural network model is M = (N, P, F, l), where N = {N_1, N_2, ..., N_i, ..., N_l} is the set of network nodes and N_i is the number of nodes in layer i; P = (W, B) collects the parameters from each layer i to layer i+1; W = {W_1, ..., W_i, ..., W_{l-1}}, where W_i is the weight matrix from layer i to layer i+1; B = {B_1, B_2, ..., B_i, ..., B_{l-1}} is the set of bias vectors, where B_i is the bias vector of layer i; F = {g(·), o(·)}, where g(·) is the activation function of the hidden layers and o(·) the function of the output layer; l is the network depth. Definition 2 (data sources): D_S = {X_S, Y_S} and D_T = {X_T, Y_T}, where S denotes source data, T denotes target data, X denotes input training data and Y denotes label data. Definition 3 (homogeneous models): the source model M_S and the target model M_T have identical N, l and F, written M_S = M_T. Definition 4 (homogeneous-model parameter migration): the W_S and B_S of the source model M_S built on source data D_S replace the W_T and B_T of the target model M_T built on target data D_T, yielding the migrated model tr-M. When M_S = M_T, the W_S, B_S of M_S and the W_T, B_T of M_T are same-shape matrices, so during parameter migration the parameter matrices of M_S can be copied directly onto the corresponding positions of M_T;
Definitions are provided for heterogeneous-model parameter migration. Definition 5 (heterogeneous models): the source model M_S and the target model M_T have identical l and F, layers N_1 through N_{l-1} are identical, but N_l differs, written M_S <> M_T. Definition 6 (heterogeneous-model parameter migration): part of the W_S and B_S of the source model M_S built on source data D_S replaces the corresponding W_T and B_T of the target model M_T built on target data D_T, yielding the migrated model tr-M;
In DNN-HMM acoustic model parameter migration, the defined heterogeneous-model parameter migration model and transfer-training algorithm add heterogeneous parameter migration to acoustic model training: the parameters of the DNN model obtained by training on the source data are migrated into the model trained on the target data, realizing parameter migration between heterogeneous DNN-HMM models.
Preferably, the specific homogeneous-model parameter migration algorithm: when M_S = M_T, the W_S, B_S of M_S and the W_T, B_T of M_T are same-shape matrices, so during migration the parameter matrices of M_S can be copied directly onto the corresponding positions of M_T. The algorithm is as follows:
Input: X_S, Y_S, X_T, Y_T // X_S is the source data and Y_S its labels; X_T is the target data and Y_T its labels
Output: tr-M // the model after migration
A: initialize(M_S); // initialization
B: M_S ← train(X_S, Y_S, M_S);
C: M_T ← M_S;
D: tr-M ← train(X_T, Y_T, M_T).
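Steps A–D above can be sketched in Python. This is a minimal illustration, not the patent's implementation: `init_model`, `migrate_homogeneous` and the layer sizes are assumed names and values, a dict of NumPy arrays stands in for the DNN, and the training calls of steps B and D are omitted.

```python
import numpy as np

def init_model(layer_sizes, rng):
    """Build M = (N, P, F, l) as a dict: random weight matrices W_i (layer i
    to layer i+1) and zero bias vectors B_i, as in Definition 1."""
    return {
        "N": list(layer_sizes),
        "W": [rng.standard_normal((m, n)) * 0.1
              for m, n in zip(layer_sizes[:-1], layer_sizes[1:])],
        "B": [np.zeros(n) for n in layer_sizes[1:]],
    }

def migrate_homogeneous(source, target):
    """Step C: since M_S = M_T, every W_i and B_i of the source is a
    same-shape matrix and can replace the target's directly."""
    assert source["N"] == target["N"], "models must be homogeneous (Def. 3)"
    target["W"] = [w.copy() for w in source["W"]]
    target["B"] = [b.copy() for b in source["B"]]
    return target  # step D would fine-tune this tr-M on (X_T, Y_T)

rng = np.random.default_rng(0)
M_S = init_model([39, 512, 512, 120], rng)  # hypothetical source model
M_T = init_model([39, 512, 512, 120], rng)  # same N, l, F -> homogeneous
tr_M = migrate_homogeneous(M_S, M_T)
```

After the copy, tr-M starts from the source model's parameters and is retrained on the target data.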
Preferably, under heterogeneous models, because N_l differs, the model parameters obtained by training on the source-domain data cannot be migrated directly by position into the model trained on the target-domain data, which increases the difficulty of parameter migration. The heterogeneous-model parameter migration process is shown in Fig. 1. In heterogeneous neural network models, W_{l-1} in M_S and W_{l-1} in M_T are not the same shape, i.e. W_{l-1}^S ≠ W_{l-1}^T, while W_i in M_S and W_i in M_T for i = 1, ..., l-2 are same-shape matrices; so not all parameter matrices can be migrated directly. The specific parameter migration algorithm is as follows:
Input: X_S, Y_S, X_T, Y_T // X_S is the source data and Y_S its labels; X_T is the target data and Y_T its labels
Output: tr-M // the model after migration
A: initialize(M_S);
B: M_S ← train(X_S, Y_S, M_S);
C: M_T ← initialize(M_T);
D: W_i^T ← W_i^S, B_i^T ← B_i^S for i = 1, ..., l-2;
E: tr-M ← train(X_T, Y_T, M_T).
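Step D — copying only the layers whose shapes match — might look like the sketch below. The function names, dict layout and layer sizes are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def init_model(layer_sizes, rng):
    """Illustrative container: W_i maps layer i to layer i+1."""
    return {
        "N": list(layer_sizes),
        "W": [rng.standard_normal((m, n)) * 0.1
              for m, n in zip(layer_sizes[:-1], layer_sizes[1:])],
        "B": [np.zeros(n) for n in layer_sizes[1:]],
    }

def migrate_heterogeneous(source, target):
    """Step D: N_1..N_{l-1} match but N_l does not, so only W_1..W_{l-2} and
    B_1..B_{l-2} are copied; the output-layer parameters W_{l-1}, B_{l-1}
    keep the target's own initialization."""
    assert source["N"][:-1] == target["N"][:-1], "hidden topology must match"
    for i in range(len(source["W"]) - 1):  # every layer except the output one
        target["W"][i] = source["W"][i].copy()
        target["B"][i] = source["B"][i].copy()
    return target

rng = np.random.default_rng(1)
M_S = init_model([39, 512, 512, 144], rng)  # source output size (illustrative)
M_T = init_model([39, 512, 512, 96], rng)   # different output size: M_S <> M_T
tr_M = migrate_heterogeneous(M_S, M_T)
```

The migrated model then goes through step E, retraining on the target data.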
Preferably, for DNN-HMM acoustic model parameter migration, the DNN-HMM model is first trained on the source data to obtain the source model, named S_DNN; then the DNN-HMM model is trained on the target data to obtain the target model, named T_DNN, where the source and target data are of different scales and different languages; finally, the parameters of S_DNN are migrated into T_DNN, and the migrated model is retrained to obtain the tr-DNN model. The concrete parameter migration process is shown in Fig. 2. S_DNN is trained from the source data and T_DNN from the target data; for m ∈ N_1, n ∈ N_{l-1}, k ∈ N_l and u ∈ N_l, S_DNN.m = T_DNN.m and S_DNN.n = T_DNN.n but S_DNN.k ≠ T_DNN.u, so the hidden-layer weight matrices and bias vectors of the two models have the same shapes while their output-layer parameters do not. It follows that S_DNN and T_DNN are heterogeneous models, i.e. S_DNN <> T_DNN. The specific parameter migration algorithm is as follows:
Input: X_S, Y_S, X_T, Y_T // X_S is the source data and Y_S its labels; X_T is the target data and Y_T its labels
Output: tr-DNN // the DNN model after migration
A: initialize(S_DNN);
B: S_DNN ← train(X_S, Y_S, S_DNN);
C: T_DNN ← initialize(T_DNN);
D: T_DNN.W_i ← S_DNN.W_i for i = 1, ..., l-2;
E: T_DNN.B_i ← S_DNN.B_i for i = 1, ..., l-2;
F: tr-DNN ← train(X_T, Y_T, T_DNN).
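The full A–F pipeline can be sketched as one function. `transfer_train`, the layer sizes and the pass-through `train_fn` are hypothetical stand-ins: real DNN-HMM training (performed in Kaldi in the experiments below) would replace `train_fn`, and the sketch only shows the control flow and the hidden-layer copy.

```python
import numpy as np

def init_dnn(layer_sizes, rng):
    """Illustrative DNN container: weight matrices W_i and bias vectors B_i."""
    return {"N": list(layer_sizes),
            "W": [rng.standard_normal((m, n)) * 0.1
                  for m, n in zip(layer_sizes[:-1], layer_sizes[1:])],
            "B": [np.zeros(n) for n in layer_sizes[1:]]}

def transfer_train(src_sizes, tgt_sizes, train_fn, rng):
    """Steps A-F: train S_DNN on source data, initialize T_DNN, copy the
    hidden-layer W_i and B_i across (the output layers differ in size, so
    S_DNN <> T_DNN), then retrain on the target data to obtain tr-DNN."""
    S = init_dnn(src_sizes, rng)        # A: initialize(S_DNN)
    S = train_fn(S)                     # B: train on (X_S, Y_S)
    T = init_dnn(tgt_sizes, rng)        # C: initialize(T_DNN)
    for i in range(len(T["W"]) - 1):    # D/E: hidden layers only
        T["W"][i] = S["W"][i].copy()
        T["B"][i] = S["B"][i].copy()
    return train_fn(T), S               # F: tr-DNN <- train on (X_T, Y_T)

rng = np.random.default_rng(2)
tr_DNN, S_DNN = transfer_train(
    [39, 512, 512, 512, 512, 144],  # source-side sizes (illustrative)
    [39, 512, 512, 512, 512, 96],   # target-side sizes (illustrative)
    lambda m: m,                    # identity keeps the sketch runnable
    rng)
```

In a real run, `train_fn` would invoke the acoustic-model training recipe, and the two output sizes would be the senone counts of the source and target systems.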
Preferably, the TIMIT English corpus serves as the source data and a Tibetan corpus as the target data; since the Tibetan corpus is small in scale, it serves as the small-scale corpus.
Owing to the application of the above technical scheme, the invention has the following advantages over the prior art:
The invention improves the modeling of acoustic features on small training speech corpora through DNN-HMM acoustic model parameter migration: it raises the acoustic model's modeling ability for acoustic features and lowers the word error rate and sentence error rate of speech recognition on small-scale data, so that a DNN-HMM acoustic model can be trained on a small training speech corpus. The defined heterogeneous-model parameter migration model and transfer-training algorithm add heterogeneous parameter migration to acoustic model training; the parameters of the DNN model trained on the source data are migrated into the model trained on the target data, realizing parameter migration between heterogeneous DNN-HMM models. This reduces the word error rate and sentence error rate of speech recognition and effectively addresses the problems and shortcomings of the prior art.
Detailed description of the invention
The accompanying drawings, which form part of this application, provide a further understanding of the invention; the schematic embodiments of the invention and their descriptions explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is isomery model parameter transition process structural schematic diagram of the invention.
Fig. 2 is DNN-HMM acoustic model parameters transition process structural schematic diagram of the invention.
Fig. 3 is a schematic diagram of the word-error-rate migration effect of the invention.
Fig. 4 is a schematic diagram of the sentence-error-rate migration effect of the invention.
Fig. 5 is a table of acoustic model parameter migration results under source datasets of different scales.
Fig. 6 is a schematic diagram of acoustic model parameter migration results under different numbers of hidden layers.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings; obviously, the described embodiments are only some of the embodiments of the invention, not all of them.
Referring to Figs. 1 to 3, the present invention provides a DNN-HMM acoustic model parameter migration structure:
A DNN-HMM acoustic model parameter migration structure uses the definitions of homogeneous and heterogeneous models and their parameter migration methods, and combines the DNN-HMM model training method with the heterogeneous-model parameter migration method to obtain the parameter transfer-training algorithm for heterogeneous DNN-HMM models. It comprises:
(1) homogeneous-model parameter migration;
(2) heterogeneous-model parameter migration;
(3) DNN-HMM acoustic model parameter migration;
Definitions are provided for homogeneous-model parameter migration. Definition 1 (model structure): a deep neural network model is M = (N, P, F, l), where N = {N_1, N_2, ..., N_i, ..., N_l} is the set of network nodes and N_i is the number of nodes in layer i; P = (W, B) collects the parameters from each layer i to layer i+1; W = {W_1, ..., W_i, ..., W_{l-1}}, where W_i is the weight matrix from layer i to layer i+1; B = {B_1, B_2, ..., B_i, ..., B_{l-1}} is the set of bias vectors, where B_i is the bias vector of layer i; F = {g(·), o(·)}, where g(·) is the activation function of the hidden layers and o(·) the function of the output layer; l is the network depth. Definition 2 (data sources): D_S = {X_S, Y_S} and D_T = {X_T, Y_T}, where S denotes source data, T denotes target data, X denotes input training data and Y denotes label data. Definition 3 (homogeneous models): the source model M_S and the target model M_T have identical N, l and F, written M_S = M_T. Definition 4 (homogeneous-model parameter migration): the W_S and B_S of the source model M_S built on source data D_S replace the W_T and B_T of the target model M_T built on target data D_T, yielding the migrated model tr-M. When M_S = M_T, the W_S, B_S of M_S and the W_T, B_T of M_T are same-shape matrices, so during parameter migration the parameter matrices of M_S can be copied directly onto the corresponding positions of M_T;
Definitions are provided for heterogeneous-model parameter migration. Definition 5 (heterogeneous models): the source model M_S and the target model M_T have identical l and F, layers N_1 through N_{l-1} are identical, but N_l differs, written M_S <> M_T. Definition 6 (heterogeneous-model parameter migration): part of the W_S and B_S of the source model M_S built on source data D_S replaces the corresponding W_T and B_T of the target model M_T built on target data D_T, yielding the migrated model tr-M;
In DNN-HMM acoustic model parameter migration, the defined heterogeneous-model parameter migration model and transfer-training algorithm add heterogeneous parameter migration to acoustic model training: the parameters of the DNN model obtained by training on the source data are migrated into the model trained on the target data, realizing parameter migration between heterogeneous DNN-HMM models.
Specifically, the homogeneous-model parameter migration algorithm: when M_S = M_T, the W_S, B_S of M_S and the W_T, B_T of M_T are same-shape matrices, so during migration the parameter matrices of M_S can be copied directly onto the corresponding positions of M_T. The algorithm is as follows:
Input: X_S, Y_S, X_T, Y_T // X_S is the source data and Y_S its labels; X_T is the target data and Y_T its labels
Output: tr-M // the model after migration
A: initialize(M_S); // initialization
B: M_S ← train(X_S, Y_S, M_S);
C: M_T ← M_S;
D: tr-M ← train(X_T, Y_T, M_T).
Specifically, under heterogeneous models, because N_l differs, the model parameters obtained by training on the source-domain data cannot be migrated directly by position into the model trained on the target-domain data, which increases the difficulty of parameter migration. The heterogeneous-model parameter migration process is shown in Fig. 1. In heterogeneous neural network models, W_{l-1} in M_S and W_{l-1} in M_T are not the same shape, i.e. W_{l-1}^S ≠ W_{l-1}^T, while W_i in M_S and W_i in M_T for i = 1, ..., l-2 are same-shape matrices; so not all parameter matrices can be migrated directly. The specific parameter migration algorithm is as follows:
Input: X_S, Y_S, X_T, Y_T // X_S is the source data and Y_S its labels; X_T is the target data and Y_T its labels
Output: tr-M // the model after migration
A: initialize(M_S);
B: M_S ← train(X_S, Y_S, M_S);
C: M_T ← initialize(M_T);
D: W_i^T ← W_i^S, B_i^T ← B_i^S for i = 1, ..., l-2;
E: tr-M ← train(X_T, Y_T, M_T).
Specifically, for DNN-HMM acoustic model parameter migration, the DNN-HMM model is first trained on the source data to obtain the source model, named S_DNN; then the DNN-HMM model is trained on the target data to obtain the target model, named T_DNN, where the source and target data are of different scales and different languages; finally, the parameters of S_DNN are migrated into T_DNN, and the migrated model is retrained to obtain the tr-DNN model. The concrete parameter migration process is shown in Fig. 2. S_DNN is trained from the source data and T_DNN from the target data; for m ∈ N_1, n ∈ N_{l-1}, k ∈ N_l and u ∈ N_l, S_DNN.m = T_DNN.m and S_DNN.n = T_DNN.n but S_DNN.k ≠ T_DNN.u, so the hidden-layer weight matrices and bias vectors of the two models have the same shapes while their output-layer parameters do not. It follows that S_DNN and T_DNN are heterogeneous models, i.e. S_DNN <> T_DNN. The specific parameter migration algorithm is as follows:
Input: X_S, Y_S, X_T, Y_T // X_S is the source data and Y_S its labels; X_T is the target data and Y_T its labels
Output: tr-DNN // the DNN model after migration
A: initialize(S_DNN);
B: S_DNN ← train(X_S, Y_S, S_DNN);
C: T_DNN ← initialize(T_DNN);
D: T_DNN.W_i ← S_DNN.W_i for i = 1, ..., l-2;
E: T_DNN.B_i ← S_DNN.B_i for i = 1, ..., l-2;
F: tr-DNN ← train(X_T, Y_T, T_DNN).
Specifically, the TIMIT English corpus serves as the source data and a Tibetan corpus as the target data; since the Tibetan corpus is small in scale, it serves as the small-scale corpus.
Specific implementation step of the invention:
For acoustic model training, the S_DNN model is trained on the TIMIT data, while the T_DNN model is trained on a Tibetan corpus. Because the number of Tibetan speakers and the language's range of use are narrow, the production and collection of Tibetan corpora face great difficulties, which makes Tibetan corpora far smaller in scale than English or Chinese ones. For this reason, the TIMIT English corpus is used here as the source data and the Tibetan corpus as the target data.
The Tibetan corpus is divided into a training part and a test part, each consisting of a number of Tibetan sentences. All experiments are implemented on the Kaldi platform.
Throughout the experiments, the acoustic features used to train the S_DNN, T_DNN and tr-DNN models are 39-dimensional MFCC features.
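As a hedged illustration of how 39-dimensional features arise from 13 static MFCCs, the usual delta/delta-delta construction can be written as below. The regression-window deltas are standard practice rather than a detail the patent specifies, and random values stand in for real MFCC frames, which a front end such as Kaldi's `compute-mfcc-feats` would produce.

```python
import numpy as np

def deltas(feats, width=2):
    """Standard regression-formula delta coefficients over a (frames, dims)
    feature matrix, with edge padding at the utterance boundaries."""
    pad = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    n = len(feats)
    num = sum(t * (pad[width + t:n + width + t] - pad[width - t:n + width - t])
              for t in range(1, width + 1))
    return num / (2 * sum(t * t for t in range(1, width + 1)))

# 13 static MFCCs per frame plus deltas and delta-deltas give 39 dimensions.
static = np.random.default_rng(3).standard_normal((200, 13))
d1 = deltas(static)
d2 = deltas(d1)
mfcc39 = np.hstack([static, d1, d2])  # shape: (200, 39)
```

Each frame of `mfcc39` would then be (with context splicing, as usual for DNN-HMM systems) an input vector to the acoustic model.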
To demonstrate the effectiveness of the parameter migration method for acoustic model training, two classes of comparative experiments were designed:
(1) Verifying acoustic model parameter migration results under source corpora of different scales: four experimental settings were designed. The TIMIT corpus, 3696 sentences in total, was divided into four scales of 1000, 2000, 3000 and 3696 sentences. With triphones as the acoustic modeling unit and 4 hidden layers, the influence of source-corpus scale on the acoustic model's modeling ability after adding transfer learning was compared; the experimental results are shown in Fig. 5.
Fig. 5 shows that as the TIMIT corpus grows, the word error rate and sentence error rate of Tibetan speech recognition with parameter migration keep falling on the training set, but on the test set they reach their minimum, 28.17% and 36.96% respectively, when the TIMIT dataset contains 3000 sentences.
The experiment shows that transfer learning works best for Tibetan speech recognition when the TIMIT dataset contains 3000 sentences; the subsequent experiments therefore use this scale for comparison.
(2) Verifying the influence of the number of hidden layers on the parameter-migration acoustic model: using the optimal TIMIT corpus scale from Fig. 5, parameter transfer training was applied to the Tibetan acoustic model, and the Tibetan speech recognition rates before and after migration were compared. When training the migrated Tibetan acoustic model, the number of hidden layers was increased step by step to compare its influence on post-migration Tibetan speech recognition, testing 4, 5, 6 and 7 hidden layers; the experimental results are shown in Fig. 6.
Fig. 6 shows that Tibetan speech recognition after parameter transfer learning achieves a higher recognition rate than before migration. As the number of hidden layers grows from 4 to 6, the word error rate of transfer-learning-based Tibetan recognition keeps falling; but with 7 hidden layers it rises again relative to the shallower settings. With 6 hidden layers, the word error rate and sentence error rate drop by 2.94% and 9.06% on the training set and by 2.72% and 15.22% on the test set, at which point post-transfer Tibetan speech recognition achieves its best result.
To describe the influence of the number of hidden layers on migration, the parameter migration effect is defined: parameter migration effect = error rate of the baseline model − error rate of the migrated model. It splits into a word-error-rate migration effect and a sentence-error-rate migration effect: word-error-rate migration effect = baseline word error rate − migrated word error rate; sentence-error-rate migration effect = baseline sentence error rate − migrated sentence error rate.
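The definition reduces to a subtraction. With the reported 28.17% post-migration test-set word error rate and a hypothetical 30.89% baseline (chosen only so that the difference matches the reported 2.72-point drop; the patent does not state the baseline figure), it can be checked as:

```python
def migration_effect(baseline_error, migrated_error):
    """Parameter migration effect = baseline error rate - migrated error rate;
    a positive value means the migrated model is better."""
    return baseline_error - migrated_error

# Hypothetical baseline of 30.89% WER vs. the reported 28.17% after migration:
wer_effect = migration_effect(30.89, 28.17)  # 2.72-point improvement
```

The same function applies unchanged to sentence error rates.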
According to the definitions of the word-error-rate and sentence-error-rate migration effects, a migration-effect analysis of the results in Fig. 6 yields the word-error-rate migration effect plot in Fig. 3 and the sentence-error-rate migration effect plot in Fig. 4.
Figs. 3 and 4 show that as the number of hidden layers grows from 4 to 6 the parameter migration effect rises, indicating that the migrated model's hidden layers learn the target data's speech features best at 6 layers.
With 7 hidden layers, however, the migration effect drops: as the hidden layers deepen, they extract deeper speech features, and the parts of the source-language features learned by the source model's hidden layers that differ from the target-language features grow, so the migrated model suffers from domain mismatch when modeling the target data.
The parameter transfer-training algorithm under small-scale data was applied to and verified on the DNN-HMM acoustic model. Comparing the influence of source-corpus scale on parameter transfer learning and the influence of the number of hidden layers on transfer-learning performance, the experiments show that:
(1) Source data of different scales affect the model's ability to model the target data. When training the source model, a larger source dataset does not always yield a better parameter migration effect; the source scale is constrained by the target scale, and parameter migration achieves a good result only when the two reach a suitable ratio.
(2) Adding parameter migration to model training gives the migrated model a stronger ability to model the target data.
(3) The number of hidden layers affects transfer learning in Tibetan speech recognition: as it increases, the learning ability of the migrated model first rises and then falls, indicating that parameter transfer learning has limited capacity under a fixed amount of data. These three points demonstrate the effectiveness of the parameter migration method.
During parameter transfer training, only the migration of model parameters is considered; domain adjustment between the source and target data is not. A model-adaptation algorithm will be added to the parameter transfer-training method in future work, in the hope of improving the target data's adaptability to the migrated model.
In summary: the present invention improves the modeling of acoustic features on small training speech corpora through DNN-HMM acoustic model parameter migration. It raises the acoustic model's modeling ability for acoustic features and lowers the word error rate and sentence error rate of speech recognition on small-scale data, so that a DNN-HMM acoustic model can be trained on a small training speech corpus. The defined heterogeneous-model parameter migration model and transfer-training algorithm add heterogeneous parameter migration to acoustic model training; the parameters of the DNN model trained on the source data are migrated into the model trained on the target data, realizing parameter migration between heterogeneous DNN-HMM models. This reduces the word error rate and sentence error rate of speech recognition and thereby effectively addresses the problems and shortcomings of the prior art.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, replacements and variations can be made to these embodiments without departing from the principles and spirit of the invention; the scope of the invention is defined by the appended claims and their equivalents.

Claims (5)

1. A DNN-HMM acoustic model parameter migration structure, characterized in that: it uses the definitions of homogeneous models and heterogeneous models and their parameter-migration methods, and combines the DNN-HMM model training method with the heterogeneous-model parameter-migration method to obtain the parameter transfer-training algorithm for DNN-HMM heterogeneous models; comprising:
(1) homogeneous-model parameter migration;
(2) heterogeneous-model parameter migration;
(3) DNN-HMM acoustic model parameter migration;
Homogeneous-model parameter migration is given by the following definitions. Definition 1 (model structure): a deep neural network model is M = (N, P, F, l), where N is the set of layer sizes N = {N_1, N_2, …, N_i, …, N_l}, N_i being the number of nodes in layer i of the neural network; P = {P_1, P_2, …, P_i, …, P_{l-1}}, P_i = (W_i, B_i) being the parameter matrix from layer i to layer i+1 of the neural network; W = {W_1, W_2, …, W_i, …, W_{l-1}}, W_i being the weight matrix from layer i to layer i+1; B = {B_1, B_2, …, B_i, …, B_{l-1}}, B_i being the bias vector of layer i; F = {g(·), o(·)}, where g(·) is the activation function of the hidden layers and o(·) is the function of the output layer; l is the network depth. Definition 2 (data sources): D_S = {X_S, Y_S} and D_T = {X_T, Y_T}, where S denotes source data, T denotes target data, X denotes input training data, and Y denotes label data. Definition 3 (homogeneous models): the source model M_S and the target model M_T have identical N, l and F, written M_S = M_T. Definition 4 (homogeneous-model parameter migration): W_S and B_S of the source model M_S built from the source data D_S replace W_T and B_T of the target model M_T built from the target data D_T, yielding the migrated model tr-M; when M_S = M_T, W_S and B_S of model M_S and W_T and B_T of model M_T are same-shape matrices, so during model parameter migration the parameter matrices of M_S can be copied directly to the corresponding positions of the parameters of M_T;
Heterogeneous-model parameter migration is given by the following definitions. Definition 5 (heterogeneous models): the source model M_S and the target model M_T have identical l, identical F, and identical N_1 through N_{l-1}, but different N_l, written M_S <> M_T. Definition 6 (heterogeneous-model parameter migration): part of W_S and B_S of the source model M_S built from the source data D_S replaces the corresponding W_T and B_T of the target model M_T built from the target data D_T, yielding the migrated model tr-M;
DNN-HMM acoustic model parameter migration adds heterogeneous-model parameter migration to acoustic model training through the defined heterogeneous-model parameter-migration model and transfer-training algorithm: the parameters of the DNN model obtained by training on the source data are migrated into the model trained on the target data, realizing DNN-HMM heterogeneous-model parameter migration.
2. The DNN-HMM acoustic model parameter migration structure according to claim 1, characterized in that: in the specific algorithm for homogeneous-model parameter migration, when M_S = M_T, W_S and B_S of model M_S and W_T and B_T of model M_T are same-shape matrices, so during model parameter migration the parameter matrices of M_S can be copied directly to the corresponding positions of the parameters of M_T; the algorithm is as follows:
Input: X_S, Y_S, X_T, Y_T; // X_S denotes the source data and Y_S its label data; X_T denotes the target data and Y_T its label data.
Output: tr-M; // tr-M denotes the migrated model.
A: initialize(M_S); // initialization
B: M_S ← train(X_S, Y_S, M_S);
C: M_T ← M_S;
D: tr-M ← train(X_T, Y_T, M_T).
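As an illustrative sketch (not part of the claim), steps A-D of the homogeneous migration can be written in Python with NumPy. The `initialize` helper, the dictionary model layout, and the layer sizes are hypothetical, and real DNN-HMM training is omitted; the point is only the direct, position-for-position copy of same-shape matrices:

```python
import numpy as np

def initialize(layer_sizes, seed=0):
    """Build a model per Definition 1: weights W_i (layer i -> i+1), zero biases B_i."""
    rng = np.random.default_rng(seed)
    W = [rng.standard_normal((a, b)) * 0.1
         for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
    B = [np.zeros(b) for b in layer_sizes[1:]]
    return {"N": list(layer_sizes), "W": W, "B": B}

def migrate_homogeneous(M_S, M_T):
    """Step C: since M_S = M_T, copy every W_i and B_i to the same position in M_T."""
    assert M_S["N"] == M_T["N"], "homogeneous migration needs identical layer sizes"
    M_T["W"] = [w.copy() for w in M_S["W"]]
    M_T["B"] = [b.copy() for b in M_S["B"]]
    return M_T

# Hypothetical sizes: 39-dim acoustic input, two hidden layers, 1000 tied HMM states.
M_S = initialize([39, 512, 512, 1000], seed=1)   # source model (train() omitted)
M_T = initialize([39, 512, 512, 1000], seed=2)
tr_M = migrate_homogeneous(M_S, M_T)             # step D would then train on (X_T, Y_T)
assert all(np.array_equal(a, b) for a, b in zip(tr_M["W"], M_S["W"]))
```

After the copy, every parameter matrix of tr-M is identical to the source model's, which is exactly what fine-tuning on the target data then starts from.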
3. The DNN-HMM acoustic model parameter migration structure according to claim 1, characterized in that: under heterogeneous models, since N_l is not identical, the model parameters obtained by training on the source-domain data cannot be migrated directly, position for position, into the model trained on the target-domain data, which increases the difficulty of parameter migration; the heterogeneous-model parameter-migration process is shown in Fig. 1. In a heterogeneous neural network model, W_{l-1} of model M_S and W_{l-1} of model M_T are not same-shape, i.e. M_S.W_{l-1} ≠ M_T.W_{l-1}, while W_i of model M_S and W_i of model M_T are same-shape matrices for i = 1, …, l-2; therefore, when performing model parameter migration, the parameter matrices cannot all be migrated directly, and the specific parameter-migration algorithm is as follows:
Input: X_S, Y_S, X_T, Y_T; // X_S denotes the source data and Y_S its label data; X_T denotes the target data and Y_T its label data.
Output: tr-M; // tr-M denotes the migrated model.
A: initialize(M_S);
B: M_S ← train(X_S, Y_S, M_S);
C: M_T ← initialize(M_T);
D: M_T.W_i ← M_S.W_i, M_T.B_i ← M_S.B_i, for i = 1, …, l-2;
E: tr-M ← train(X_T, Y_T, M_T).
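A hedged NumPy sketch of this heterogeneous case (again with hypothetical layer sizes and the same illustrative model layout, not the claimed implementation): only the same-shape matrices W_1 … W_{l-2} and their biases are copied in step D, while the output layer, whose size differs, keeps its fresh initialization:

```python
import numpy as np

def initialize(layer_sizes, seed=0):
    """Model per Definition 1: weight matrices W_i (layer i -> i+1), bias vectors B_i."""
    rng = np.random.default_rng(seed)
    W = [rng.standard_normal((a, b)) * 0.1
         for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
    B = [np.zeros(b) for b in layer_sizes[1:]]
    return {"N": list(layer_sizes), "W": W, "B": B}

def migrate_heterogeneous(M_S, M_T):
    """Step D: copy only the hidden-layer parameters; M_T keeps its own output layer."""
    assert M_S["N"][:-1] == M_T["N"][:-1] and M_S["N"][-1] != M_T["N"][-1], \
        "heterogeneous migration: identical layers 1..l-1, different output size N_l"
    for i in range(len(M_S["W"]) - 1):        # W_1 .. W_{l-2} and matching biases
        M_T["W"][i] = M_S["W"][i].copy()
        M_T["B"][i] = M_S["B"][i].copy()
    return M_T

# Output sizes differ, e.g. source-language senones vs. target-language senones.
M_S = initialize([39, 512, 512, 1896], seed=1)
M_T = initialize([39, 512, 512, 1024], seed=2)
tr_M = migrate_heterogeneous(M_S, M_T)
assert np.array_equal(tr_M["W"][0], M_S["W"][0])   # hidden weights migrated
assert tr_M["W"][-1].shape == (512, 1024)          # output layer left untouched
```

The untouched output layer is then learned from scratch during step E's training on the target data.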
4. The DNN-HMM acoustic model parameter migration structure according to claim 1, characterized in that: in DNN-HMM acoustic model parameter migration, a DNN-HMM model is first trained with the source data, yielding a source model named S_DNN; then a DNN-HMM model is trained with the target data, yielding a target model named T_DNN, where the source data and the target data are of different scales and different languages; finally, the parameters of the S_DNN model are migrated into the T_DNN model, and the migrated model is trained again to obtain the tr-DNN model. The specific model parameter-migration process is shown in Fig. 2. The S_DNN model is trained from the source data and the T_DNN model from the target data; let m be the input-layer size N_1, n the last-hidden-layer size N_{l-1}, k the output-layer size N_l of S_DNN, and u the output-layer size N_l of T_DNN, where S_DNN.m = T_DNN.m, S_DNN.n = T_DNN.n, and S_DNN.k ≠ T_DNN.u; hence the hidden-layer weights and biases of the two models are same-shape while S_DNN.W_{l-1} ≠ T_DNN.W_{l-1}, from which it follows that the S_DNN model and the T_DNN model are heterogeneous models, i.e. S_DNN <> T_DNN; the specific parameter-migration algorithm is as follows:
Input: X_S, Y_S, X_T, Y_T; // X_S denotes the source data and Y_S its label data; X_T denotes the target data and Y_T its label data.
Output: tr-DNN; // tr-DNN denotes the migrated DNN model.
A: initialize(S_DNN);
B: S_DNN ← train(X_S, Y_S, S_DNN);
C: T_DNN ← initialize(T_DNN);
D: T_DNN.W_i ← S_DNN.W_i, for i = 1, …, l-2;
E: T_DNN.B_i ← S_DNN.B_i, for i = 1, …, l-2;
F: tr-DNN ← train(X_T, Y_T, T_DNN).
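Steps A-F can be sketched end to end as the following pipeline. Everything here is illustrative: the `train` function is a placeholder that only records which data a model has seen (a real system would run DNN-HMM training at those points), and the layer sizes are invented for the example:

```python
import numpy as np

def initialize(layer_sizes, seed=0):
    """Model per Definition 1, plus a log of the data it was trained on."""
    rng = np.random.default_rng(seed)
    return {"N": list(layer_sizes),
            "W": [rng.standard_normal((a, b)) * 0.1
                  for a, b in zip(layer_sizes[:-1], layer_sizes[1:])],
            "B": [np.zeros(b) for b in layer_sizes[1:]],
            "trained_on": []}

def train(X, Y, model, tag):
    """Placeholder for DNN-HMM training; real backpropagation would happen here."""
    model["trained_on"].append(tag)
    return model

# Steps A, B: train the source model S_DNN on the source data.
S_DNN = train(None, None, initialize([39, 512, 512, 1896], seed=1), "source")
# Step C: initialize the target model T_DNN (different output size -> heterogeneous).
T_DNN = initialize([39, 512, 512, 1024], seed=2)
# Steps D, E: migrate hidden-layer weights and biases W_1..W_{l-2}, B_1..B_{l-2}.
for i in range(len(S_DNN["W"]) - 1):
    T_DNN["W"][i] = S_DNN["W"][i].copy()
    T_DNN["B"][i] = S_DNN["B"][i].copy()
# Step F: retrain the migrated model on the target data to obtain tr-DNN.
tr_DNN = train(None, None, T_DNN, "target")
assert tr_DNN["trained_on"] == ["target"]
```

The design choice the claim rests on is visible here: because only N_l differs, everything up to the last hidden layer transfers unchanged, and only the output layer is relearned from target data.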
5. The DNN-HMM acoustic model parameter migration structure according to claim 1, characterized in that: the TIMIT English corpus is used as the source data and a Tibetan corpus as the target data; since the Tibetan corpus is small in scale, it serves as the small-scale corpus.
CN201811176930.7A 2018-10-10 2018-10-10 A kind of DNN-HMM acoustic model parameters migration structure Pending CN109147772A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811176930.7A CN109147772A (en) 2018-10-10 2018-10-10 A kind of DNN-HMM acoustic model parameters migration structure


Publications (1)

Publication Number Publication Date
CN109147772A true CN109147772A (en) 2019-01-04

Family

ID=64810867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811176930.7A Pending CN109147772A (en) 2018-10-10 2018-10-10 A kind of DNN-HMM acoustic model parameters migration structure

Country Status (1)

Country Link
CN (1) CN109147772A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428818A (en) * 2019-08-09 2019-11-08 中国科学院自动化研究所 The multilingual speech recognition modeling of low-resource, audio recognition method
CN112133290A (en) * 2019-06-25 2020-12-25 南京航空航天大学 Speech recognition method based on transfer learning and aiming at civil aviation air-land communication field
CN113239967A (en) * 2021-04-14 2021-08-10 北京达佳互联信息技术有限公司 Character recognition model training method, recognition method, related equipment and storage medium
WO2022121185A1 (en) * 2020-12-11 2022-06-16 平安科技(深圳)有限公司 Model training method and apparatus, dialect recognition method and apparatus, and server and storage medium
CN115662409A (en) * 2022-10-27 2023-01-31 亿铸科技(杭州)有限责任公司 Voice recognition method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009080309A (en) * 2007-09-26 2009-04-16 Toshiba Corp Speech recognition device, speech recognition method, speech recognition program and recording medium in which speech recogntion program is recorded
US20170221474A1 (en) * 2016-02-02 2017-08-03 Mitsubishi Electric Research Laboratories, Inc. Method and System for Training Language Models to Reduce Recognition Errors
CN107481717A (en) * 2017-08-01 2017-12-15 百度在线网络技术(北京)有限公司 A kind of acoustic training model method and system
CN108109615A (en) * 2017-12-21 2018-06-01 内蒙古工业大学 A kind of construction and application method of the Mongol acoustic model based on DNN


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHIEN-TING LIN ET AL.: "A preliminary study on cross-language knowledge transfer for low-resource Taianese Mandarin ASR", 《2016 CONFERENCE OF THE ORIENTAL CHAPTER OF INTERNATIONAL COMMITTEE FOR COORDINATION AND STANDARDIZATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES》 *
MING SUN ET AL.: "An Empirical Study of Cross-Lingual Transfer Learning Techniques for Small-Footprint Keyword Spotting", 《2017 16TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA)》 *


Similar Documents

Publication Publication Date Title
CN109147772A (en) A kind of DNN-HMM acoustic model parameters migration structure
CN109902171B (en) Text relation extraction method and system based on hierarchical knowledge graph attention model
CN111090461B (en) Code annotation generation method based on machine translation model
CN107239446B (en) A kind of intelligence relationship extracting method based on neural network Yu attention mechanism
CN105512209B (en) The biomedical event trigger word recognition methods that a kind of feature based learns automatically
CN103154936B (en) For the method and system of robotization text correction
CN108733792A (en) A kind of entity relation extraction method
US11580975B2 (en) Systems and methods for response selection in multi-party conversations with dynamic topic tracking
CN108363816A (en) Open entity relation extraction method based on sentence justice structural model
CN109933664A (en) A kind of fine granularity mood analysis improved method based on emotion word insertion
WO2022001333A1 (en) Hyperbolic space representation and label text interaction-based fine-grained entity recognition method
CN106228980A (en) Data processing method and device
CN110059160A (en) A kind of knowledge base answering method and device based on context end to end
CN108427665A (en) A kind of text automatic generation method based on LSTM type RNN models
CN110516095A (en) Weakly supervised depth Hash social activity image search method and system based on semanteme migration
CN110059191A (en) A kind of text sentiment classification method and device
CN105261358A (en) N-gram grammar model constructing method for voice identification and voice identification system
CN109065029A (en) A kind of small-scale corpus DNN-HMM acoustic model
CN111310441A (en) Text correction method, device, terminal and medium based on BERT (binary offset transcription) voice recognition
CN110222347A (en) A kind of detection method that digresses from the subject of writing a composition
CN109635105A (en) A kind of more intension recognizing methods of Chinese text and system
CN107145514A (en) Chinese sentence pattern sorting technique based on decision tree and SVM mixed models
CN107679225A (en) A kind of reply generation method based on keyword
CN110119443A (en) A kind of sentiment analysis method towards recommendation service
CN109145287A (en) Indonesian word error-detection error-correction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20190104