CN104751227A - Method and system for constructing deep neural network - Google Patents

Publication number
CN104751227A
Authority
CN
China
Prior art keywords
hidden layer
node number
neural network
deep neural
successively
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310755400.9A
Other languages
Chinese (zh)
Other versions
CN104751227B (en
Inventor
潘嘉
何婷婷
刘聪
王智国
胡国平
张仕良
胡郁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Application filed by iFlytek Co Ltd
Priority to CN201310755400.9A
Publication of CN104751227A
Application granted
Publication of CN104751227B
Legal status: Active

Abstract

The invention discloses a method and a system for constructing a deep neural network. The method comprises the steps of determining the number of nodes at a deep neural network input layer and the number of the nodes in an output layer; acquiring training data; determining the number of deep neural network hidden layers and the number of nodes of a first hidden layer; determining the number of the nodes at subsequent hidden layers according to the data quantity of the training data, the number of the hidden layers, and the number of nodes of the first hidden layer; enabling the number of the nodes of the hidden layer to be gradually reduced; determining the model parameters of the deep neural network according to the training data to obtain the deep neural network. Compared with the deep neural network in the prior art, the deep neural network constructed by the method and system has the advantages that the number of parameters of the neural network can be greatly decreased, the storage space required is reduced, and the training speed of the model is increased.

Description

Method and system for constructing a deep neural network
Technical field
The present invention relates to the field of signal processing, and in particular to a method and system for constructing a deep neural network.
Background
Speech recognition lets a machine understand what a person says by converting the speech signal into input that a computer can recognize. Over the past 20 years, speech recognition technology has made remarkable progress and has begun to move from the laboratory to the market. Voice input, voice search, and speech translation based on speech recognition are now widely used. With advances in technology and the explosive growth of information, more and more speech data is available. How to use this massive amount of data to train a speech recognition system and achieve a higher recognition rate is a hard problem in practical applications.
Traditional automatic continuous speech recognition systems mainly use the GMM-HMM architecture, based on the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM). A GMM-HMM system uses an HMM to model the temporal structure of the speech signal and a Gaussian mixture model to model the output probability of each HMM state. In recent years, DNN-HMM speech recognition systems, based on Deep Neural Networks (DNN) and HMMs, have attracted growing attention from researchers. A DNN-HMM system uses a DNN in place of the GMM to model the output probability of each HMM state. Compared with a GMM, a DNN has greater descriptive power, models very complex data distributions better, and learns contextual information in the data well, so a DNN-HMM system achieves a significant performance improvement over a GMM-HMM system.
However, although DNN-HMM systems have a clear performance advantage, they remain hard to deploy in practice. The main reason is the high model complexity of DNN-HMM: both model training and decoding take far longer than in a GMM-HMM system. For example, a DNN model typically has at least 6 hidden layers, and the system presets the same number of nodes for every hidden layer, such as 2048 or 2560. Such a model has a complex topology and a large number of parameters, which puts heavy computational pressure on model training over large databases and on subsequent speech decoding. The system therefore runs too slowly, which hinders its practical adoption and updating.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a method and system for constructing a deep neural network that greatly reduce the redundancy of nodes in the deep neural network through effective control of the number of nodes in each hidden layer.
To achieve the above object, the technical solution of the present invention is:
A method for constructing a deep neural network, comprising:
determining the number of nodes in the input layer and the number of nodes in the output layer of the deep neural network;
obtaining training data;
determining the number of hidden layers of the deep neural network and the number of nodes in the first hidden layer;
determining the number of nodes in each subsequent hidden layer according to the amount of training data, the number of hidden layers, and the number of nodes in the first hidden layer, such that the number of nodes decreases progressively across the hidden layers;
determining the model parameters of the deep neural network using the training data to obtain the deep neural network.
Preferably, determining the number of nodes in each subsequent hidden layer such that the number of nodes decreases progressively across the hidden layers comprises:
determining a decrement mode and a decrement ratio according to the amount of training data;
determining the number of nodes in each subsequent hidden layer according to the determined decrement mode and decrement ratio.
Preferably, determining the decrement mode and the decrement ratio according to the amount of training data comprises:
making the number of nodes in each subsequent hidden layer decrease layer by layer, by the decrement ratio, starting from the number of nodes in the first hidden layer.
Preferably, determining the decrement mode and the decrement ratio according to the amount of training data comprises:
where the number of hidden layers is less than or equal to a layer-count threshold, making the number of nodes in each hidden layer decrease layer by layer, by the decrement ratio, starting from the number of nodes in the first hidden layer;
where the number of hidden layers is greater than the layer-count threshold, making the number of nodes in each odd-numbered hidden layer decrease by the decrement ratio relative to the preceding odd-numbered hidden layer, and making the number of nodes in each even-numbered hidden layer equal to that of the hidden layer immediately before it.
Preferably, the decrement ratio of the k-th hidden layer is $1/p^{k-1}$, where $1 \le p \le 2$.
A system for constructing a deep neural network, comprising:
an input/output layer determining unit, configured to determine the number of nodes in the input layer and the number of nodes in the output layer of the deep neural network;
a data acquisition unit, configured to obtain training data;
a first hidden layer determining unit, configured to determine the number of hidden layers of the deep neural network and the number of nodes in the first hidden layer;
a second hidden layer determining unit, configured to determine the number of nodes in each subsequent hidden layer according to the amount of training data, the number of hidden layers, and the number of nodes in the first hidden layer, such that the number of nodes decreases progressively across the hidden layers;
a model parameter determining unit, configured to determine the model parameters of the deep neural network using the training data to obtain the deep neural network.
Preferably, the second hidden layer determining unit comprises:
a decrement mode determining unit, configured to determine a decrement mode and a decrement ratio according to the amount of training data;
a node count determining unit, configured to determine the number of nodes in each subsequent hidden layer according to the determined decrement mode and decrement ratio.
Preferably, the decrement mode determining unit is specifically configured to determine, according to the amount of training data, that the number of nodes in each subsequent hidden layer decreases layer by layer, by the decrement ratio, starting from the number of nodes in the first hidden layer.
Preferably, the decrement mode determining unit is specifically configured to: where the number of hidden layers is less than or equal to a layer-count threshold, determine that the number of nodes in each hidden layer decreases layer by layer, by the decrement ratio, starting from the number of nodes in the first hidden layer; and where the number of hidden layers is greater than the layer-count threshold, determine that the number of nodes in each odd-numbered hidden layer decreases by the decrement ratio relative to the preceding odd-numbered hidden layer, and that the number of nodes in each even-numbered hidden layer equals that of the hidden layer immediately before it.
Preferably, the decrement ratio of the k-th hidden layer is $1/p^{k-1}$, where $1 \le p \le 2$.
The beneficial effects of the present invention are:
1. Compared with the deep neural networks commonly used today, in which every hidden layer has the same number of nodes, a deep neural network built with the present invention greatly reduces the number of network parameters, which reduces the required storage space and speeds up model training;
2. Because fewer network parameters mean less time spent computing state output probabilities during decoding, applying a deep neural network built with the present invention to a speech recognition system improves the decoding speed of final recognition and therefore gives better real-time performance in practice;
3. The present invention does not affect the final recognition performance of the speech recognition system even when the number of network parameters is greatly reduced; and, with the number of network parameters held constant, recognition performance can be further improved by increasing the number of nodes in the hidden layers close to the input layer.
Brief description of the drawings
To explain the technical solutions of the embodiments more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the method for constructing a deep neural network according to an embodiment of the present invention;
Fig. 2 is a flowchart of one method for determining the number of nodes in the subsequent hidden layers according to an embodiment of the present invention;
Fig. 3 is a flowchart of another method for determining the number of nodes in the subsequent hidden layers according to an embodiment of the present invention;
Fig. 4 is a structural diagram of the system for constructing a deep neural network according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
To help those skilled in the art better understand the embodiments, the training process of a traditional DNN model is first described briefly.
The training process of a traditional DNN model comprises:
Step 1: determine the topology of the DNN model.
Specifically, the input layer and the output layer of the DNN correspond to the acoustic features and to the output states of the HMM model respectively, and their node counts can be determined before training. The number of hidden layers and the number of nodes in each hidden layer are usually also preset empirically. Although the empirical values differ between systems, in most cases the DNN has between 4 and 9 hidden layers, and every hidden layer has the same number of nodes, typically 1024, 2048, or 2560.
Step 2: train the parameters of the deep neural network model.
Specifically, the model parameters are the weights. The weights of the DNN model are trained on the collected training data, in two stages:
a) Unsupervised pre-training
The system first generates random numbers drawn from a Gaussian distribution as the initial weights of the network. It then trains the weights layer by layer, from the input layer toward the output layer, using only the acoustic features of the training data and the training procedure of the restricted Boltzmann machine (RBM). Specifically, once the weights between the input layer and the first hidden layer have been trained, the acoustic features and these weights are used to compute the output of the first hidden layer. This output is treated as the input of an RBM to train the weights between the first and second hidden layers. The procedure repeats until the weights between the second-to-last and the last hidden layer have been trained.
b) Supervised final training
The weights obtained from unsupervised pre-training serve as the initial weights of the network. Using the acoustic features of the training data and the corresponding labels, the error back-propagation algorithm performs the final optimization of all weights. Specifically, the error $E$ between the current network output and the ground truth (the labels) is first computed from the current weights. The gradient $\partial E / \partial W_i^t$ of $E$ with respect to the weights of each layer is then computed, and finally the weights of each layer are updated by gradient descent, namely $W_i^{t+1} = W_i^t - \eta \, \partial E / \partial W_i^t$, where $W_i^t$ denotes the current weights of layer $i$, $W_i^{t+1}$ denotes the weights of layer $i$ after the update, and $\eta$ is the learning rate.
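The gradient descent update in the supervised stage can be illustrated with a minimal sketch. It assumes a plain gradient descent step; the function name `sgd_update`, the nested-list weight layout, and the `learning_rate` value are illustrative assumptions and do not come from the patent.

```python
def sgd_update(weights, grads, learning_rate=0.1):
    """Return W_i^{t+1} = W_i^t - learning_rate * dE/dW_i^t for one layer.

    weights, grads: matrices of the same shape, stored as lists of rows.
    learning_rate: hypothetical step size (not specified in the source).
    """
    return [
        [w - learning_rate * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grads)
    ]


# One update step for a toy 2x2 weight matrix.
W_t = [[0.5, -0.25], [1.0, 0.0]]
dE_dW = [[1.0, -1.0], [0.0, 2.0]]
W_next = sgd_update(W_t, dE_dW, learning_rate=0.5)
```

In a full back-propagation pass, the same step would be applied to every layer once its gradient has been computed.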
The drawback of the traditional DNN model is that its topology is set mainly by experience, with the same number of nodes chosen for every hidden layer. Such a model is large and contains many redundant parameters, so model training takes a very long time and final decoding is very slow.
However, when building the topology of a deep neural network for speech recognition, the hidden layers close to the input layer must preserve the acoustic feature information extracted from the speech waveform, so they usually need more nodes to avoid losing that information. The hidden layers close to the output layer, by contrast, have already discarded much of the information in the original acoustic features that is useless for, or interferes with, recognition, and retain the discriminative information needed to distinguish different states. These layers can therefore be modeled with fewer nodes, which reduces the size of the network without loss of recognition performance and improves training efficiency. Research results further confirm that the weight distribution becomes progressively sparser in the deeper layers of a deep neural network: most weights have an absolute value below 0.1, and many nodes contribute little or nothing to the network.
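The sparsity observation can be checked on a trained network by measuring, per layer, the share of weights with a small absolute value. The sketch below is only an illustration of that measurement; the helper name `fraction_below` and the toy weights are assumptions, while the 0.1 threshold is the value quoted in the text.

```python
def fraction_below(weights, threshold=0.1):
    """Fraction of weights in one layer whose absolute value is below threshold."""
    flat = [w for row in weights for w in row]
    if not flat:
        return 0.0
    return sum(1 for w in flat if abs(w) < threshold) / len(flat)


# Toy layer: 3 of the 4 weights are smaller than 0.1 in magnitude.
layer = [[0.02, -0.5], [0.07, -0.01]]
ratio = fraction_below(layer)
```

A ratio that grows from the first hidden layer to the last would be consistent with the claim that deeper layers carry more redundant nodes.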
To this end, the present application proposes a method and system for constructing a deep neural network that matches these regularities of deep neural network models and makes the number of nodes decrease progressively across the hidden layers. By effectively controlling the number of nodes in each hidden layer, the redundancy of nodes in the network is greatly reduced. When a deep neural network built with the present invention is applied to a speech recognition system, model training efficiency and decoding speed improve without loss of final recognition performance.
Fig. 1 is a flowchart of the method for constructing a deep neural network according to an embodiment of the present invention. The method comprises the following steps:
Step 101: determine the number of nodes in the input layer and in the output layer of the deep neural network.
Specifically, the number of nodes in the input layer is the dimension of the acoustic feature fed to the DNN model, that is, the dimension of the acoustic feature after adjacent frames are spliced. For example, if 11 adjacent frames are spliced into one acoustic feature and each frame before splicing is a 43-dimensional spectral feature, the input layer has 11 × 43 = 473 nodes.
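The frame-splicing arithmetic can be sketched as follows. The function names are hypothetical, and padding the utterance edges by repeating the first or last frame is an assumption made here for illustration; the source only states the 11-frame, 43-dimension example.

```python
def input_layer_size(context_frames, feature_dim):
    """Input node count = number of spliced frames times per-frame dimension."""
    return context_frames * feature_dim


def splice(frames, index, context_frames):
    """Concatenate frame `index` with its neighbours into one input vector.

    context_frames is expected to be odd. Frames beyond the utterance edges
    are replaced by the first/last frame (an assumption, not from the source).
    """
    half = context_frames // 2
    spliced = []
    for offset in range(-half, half + 1):
        j = min(max(index + offset, 0), len(frames) - 1)
        spliced.extend(frames[j])
    return spliced


# 20 dummy frames of 43-dimensional features, spliced with 11 frames of context.
frames = [[float(t)] * 43 for t in range(20)]
vec = splice(frames, 10, 11)
n_in = input_layer_size(11, 43)
```

Here `len(vec)` equals `n_in`, so every spliced vector matches the input layer width.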
The number of nodes in the output layer determines the discriminative power of the DNN. The output nodes model the distribution of HMM states, and the simplest arrangement is one output node per state. However, because an HMM model has too many states, one node per state easily makes the DNN too large. In practice, the HMM states are therefore usually first clustered, and each output node corresponds to one subclass of states. Typically each output node corresponds to one state after decision-tree tying in the GMM-HMM. For example, if a GMM-HMM with 3000 states was used to segment and label the training data before DNN training, the output layer of the DNN has 3000 nodes, in one-to-one correspondence with the 3000 states of that GMM-HMM.
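As a small illustration of how state tying fixes the output layer width, the sketch below counts distinct tied-state classes. The state names and the mapping are invented for the example; a real system would read this mapping from its GMM-HMM decision tree.

```python
def output_layer_size(state_to_tied):
    """Output node count = number of distinct tied-state classes."""
    return len(set(state_to_tied.values()))


# Hypothetical tying: five context-dependent HMM states share three classes.
state_to_tied = {
    "a-b+c.s2": 0,
    "a-b+d.s2": 0,
    "x-b+c.s2": 1,
    "a-e+c.s3": 2,
    "a-e+d.s3": 2,
}
n_out = output_layer_size(state_to_tied)
```

With the 3000-state GMM-HMM from the text, the same count would give 3000 output nodes.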
Step 102: obtain training data.
Step 103: determine the number of hidden layers of the deep neural network and the number of nodes in the first hidden layer.
Specifically, the number of hidden layers is set according to the attributes of the obtained training data, such as large-vocabulary, continuous, or discrete. In a preferred embodiment of the invention, the number of hidden layers is usually set to $L = 6$ for large-vocabulary continuous training data and to $L = 3$ for discrete training data.
The number of nodes in the first hidden layer is set according to the amount of training data obtained, measured in hours. In a preferred embodiment of the invention, the number of nodes in the first hidden layer is usually set to $N_1 = 2048$ for more than 100 hours of training data and to $N_1 = 1024$ for less than 100 hours.
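The preferred-embodiment rules of step 103 can be written as two small lookups. This is a sketch under stated assumptions: the function names are hypothetical, and the source does not say which value applies at exactly 100 hours, so that case is arbitrarily mapped to 1024 here.

```python
def choose_hidden_layer_count(is_large_vocab_continuous):
    """L = 6 for large-vocabulary continuous data, L = 3 for discrete data."""
    return 6 if is_large_vocab_continuous else 3


def choose_first_hidden_size(hours):
    """N_1 = 2048 above 100 hours of training data, otherwise 1024.

    The boundary case of exactly 100 hours is not specified in the source;
    treating it as the smaller size is an assumption.
    """
    return 2048 if hours > 100 else 1024


L = choose_hidden_layer_count(True)
N1 = choose_first_hidden_size(500)
```

Both outputs feed directly into the sizing of the subsequent hidden layers in step 104.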
Step 104: determine the number of nodes in each subsequent hidden layer according to the amount of training data, the number of hidden layers, and the number of nodes in the first hidden layer, such that the number of nodes decreases progressively across the hidden layers.
Specifically, a decrement mode and a decrement ratio can be determined from the amount of training data, and the number of nodes in each subsequent hidden layer is then determined from that mode and ratio.
Step 105: determine the model parameters of the deep neural network using the training data to obtain the deep neural network.
Specifically, the model parameters here are the weights of the deep neural network. They are determined from the training data in the same way as in the prior art, which is not repeated here.
The method used in step 104 to determine the number of nodes in the subsequent hidden layers, such that the number of nodes decreases progressively across the hidden layers, is described in detail below.
Fig. 2 is a flowchart of one method for determining the number of nodes in the subsequent hidden layers according to an embodiment of the present invention.
In general, the larger the amount of training data, the more hidden-layer nodes need to be retained; conversely, less data needs fewer hidden-layer nodes. On this basis, in this embodiment the number of nodes is made to decrease progressively across the hidden layers when the subsequent hidden layers are sized, which may comprise the following steps:
Step 201: determine the decrement ratio $p$ from $H$, where $H$ is the amount of training data and $p$ satisfies $1 \le p \le 2$. When the amount of training data is small, fewer hidden-layer nodes are needed, so $p$ is close to 2. As the amount of training data gradually increases to 10000 hours, $p$ gradually falls from 2 to 1. When the amount of training data exceeds 10000 hours, $p$ stays at 1, that is, every hidden layer has the same number of nodes, and the network is then a standard feed-forward deep neural network with a uniform hidden-layer size.
Step 202: determine the number of nodes $N_k$ of each subsequent hidden layer according to $N_k \cdot p^{k-1} = N_1$, so that the number of nodes $N_k$ of each subsequent hidden layer decreases layer by layer, by the decrement ratio $p$, starting from the number of nodes $N_1$ of the first hidden layer.
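Step 202 can be sketched in a few lines, assuming the relation $N_k = N_1 / p^{k-1}$, which is consistent with the per-layer ratio $1/p^{k-1}$ stated in the claims. The function name and the rounding to the nearest integer are assumptions; the decrement ratio `p` is taken as an input because it comes from step 201.

```python
def hidden_sizes_layerwise(n1, num_layers, p):
    """Node counts N_1..N_L with N_k = N_1 / p**(k-1).

    Rounding to the nearest integer is an assumption for non-integer results.
    """
    if not 1 <= p <= 2:
        raise ValueError("decrement ratio p must satisfy 1 <= p <= 2")
    return [round(n1 / p ** (k - 1)) for k in range(1, num_layers + 1)]


sizes = hidden_sizes_layerwise(1024, 3, 2)
uniform = hidden_sizes_layerwise(2048, 6, 1)
```

With `p = 1` the sizes stay constant, which reproduces the conventional uniform topology described in step 201.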
Fig. 3 is a flowchart of another method for determining the number of nodes in the subsequent hidden layers.
The method in this embodiment differentiates by the attributes and the amount of the training data. For discrete training data, or when the amount of data is small, fewer hidden-layer nodes are needed, and the number of nodes can fall quickly without affecting recognition performance. Conversely, for continuous training data, or when the amount of data is large, more hidden-layer nodes are needed, and to preserve recognition performance the number of nodes must not fall too quickly. Because the number of hidden layers is determined by the attributes of the training data, the method in this embodiment in effect differentiates by the number of hidden layers and the amount of data.
In this embodiment, when the subsequent hidden layers are sized: if the number of hidden layers is less than or equal to a layer-count threshold, the number of nodes in each hidden layer decreases layer by layer, by the decrement ratio, starting from the number of nodes in the first hidden layer; if the number of hidden layers is greater than the threshold, the number of nodes in each odd-numbered hidden layer decreases by the decrement ratio relative to the preceding odd-numbered hidden layer, and the number of nodes in each even-numbered hidden layer equals that of the hidden layer immediately before it. This may comprise the following steps:
Step 301: determine the decrement ratio $p$ from $H$, where $H$ is the amount of training data and $p$ satisfies $1 \le p \le 2$. When the amount of training data is small, fewer hidden-layer nodes are needed, so $p$ is close to 2. As the amount of training data gradually increases to 10000 hours, $p$ gradually falls from 2 to 1. When the amount of training data exceeds 10000 hours, $p$ stays at 1, that is, every hidden layer has the same number of nodes, and the network is then a standard feed-forward deep neural network with a uniform hidden-layer size.
Step 302: determine whether the number of hidden layers $L$ is less than or equal to the layer-count threshold $L_0$.
Step 303: if $L \le L_0$, determine the number of nodes $N_k$ of each subsequent hidden layer according to $N_k \cdot p^{k-1} = N_1$, so that $N_k$ decreases layer by layer, by the decrement ratio $p$, starting from the number of nodes $N_1$ of the first hidden layer.
Step 304: if $L > L_0$, determine the number of nodes of each subsequent hidden layer according to $N_{2m-1} \cdot p^{m-1} = N_1$ and $N_{2m} = N_{2m-1}$, so that the number of nodes $N_{2m-1}$ of each odd-numbered hidden layer decreases by the decrement ratio $p$ relative to the preceding odd-numbered hidden layer, and the number of nodes $N_{2m}$ of each even-numbered hidden layer equals that of the hidden layer immediately before it.
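Steps 302 to 304 can be combined into one hedged sketch. The function name, the rounding, and the default threshold of 3 (the value used in the worked examples of this document) are assumptions for illustration; `p` is again supplied by the caller.

```python
def hidden_sizes(n1, num_layers, p, layer_threshold=3):
    """Hidden-layer node counts following the Fig. 3 scheme.

    L <= L_0: N_k = N_1 / p**(k-1) for every layer k.
    L >  L_0: N_{2m-1} = N_1 / p**(m-1) and N_{2m} = N_{2m-1}.
    """
    sizes = []
    for k in range(1, num_layers + 1):
        if num_layers <= layer_threshold:
            exponent = k - 1
        else:
            m = (k + 1) // 2      # layers 2m-1 and 2m share the same m
            exponent = m - 1
        sizes.append(round(n1 / p ** exponent))
    return sizes


deep = hidden_sizes(1024, 6, 2)
shallow = hidden_sizes(1024, 3, 2)
```

Pairing the layers in the deep case halves the rate at which the width shrinks, which matches the stated goal of not reducing the node count too quickly for larger networks.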
The two methods above are illustrated by the following examples. For large-vocabulary continuous training data of more than 10000 hours, the scheme above gives $L = 6$ hidden layers, $N_1 = 2048$ nodes in the first hidden layer, and a decrement ratio $p = 1$, so the six hidden layers have 2048, 2048, 2048, 2048, 2048, and 2048 nodes. For large-vocabulary continuous training data of less than 100 hours, the scheme gives $L = 6$, $N_1 = 1024$, and $p = 2$; because $L$ is greater than the preset layer-count threshold $L_0 = 3$, the method of Fig. 3 gives six hidden layers with 1024, 1024, 512, 512, 256, and 256 nodes. For discrete training data of less than 100 hours, the scheme gives $L = 3$, $N_1 = 1024$, and $p = 2$; because $L$ equals the preset layer-count threshold $L_0 = 3$, the method of Fig. 3 gives three hidden layers with 1024, 512, and 256 nodes. Compared with the currently common networks in which every hidden layer has 2048 nodes, the present invention greatly reduces the number of network parameters without losing the recognition performance of the deep neural network applied to speech recognition. It is worth noting that setting $N_1 = 3072$, while keeping the total number of network parameters the same, can further improve final recognition performance.
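To make the parameter saving concrete, the sketch below counts the weights of a fully connected feed-forward network for two of the topologies discussed here. It reuses the 473-node input layer and 3000-node output layer from the earlier examples in this document; the function name is hypothetical and bias terms are deliberately left out of the count.

```python
def count_weights(input_size, hidden_sizes, output_size):
    """Total number of weights between consecutive fully connected layers.

    Bias terms are not counted (an assumption made to keep the sketch simple).
    """
    dims = [input_size] + list(hidden_sizes) + [output_size]
    return sum(a * b for a, b in zip(dims, dims[1:]))


uniform = count_weights(473, [2048] * 6, 3000)
tapered = count_weights(473, [1024, 1024, 512, 512, 256, 256], 3000)
saving = 1 - tapered / uniform
```

Under these assumptions the uniform network has 28,084,224 weights and the tapered one 3,283,968, so `saving` is a little over 88%. Much of that comes from the last hidden layer, whose connection to the 3000 output nodes shrinks from 2048 × 3000 to 256 × 3000 weights.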
To sum up, compared with the deep neural network identical with the node number of each hidden layer general at present, the deep neural network that application the present invention builds greatly can reduce the number of parameters of neural network, thus reduces required storage space and accelerate the training speed of model.The particularly DNN-HMM system of the binding state of the speech recognition system use of large vocabulary at present, node number due to output layer can reach 10,000 even more, and the node number reducing last hidden layer effectively can reduce the number of parameters of neural network.In addition, due to the time decreased of computing mode output probability when network parameter minimizing makes to decode, the deep neural network adopting the present invention to build is applied to speech recognition system, the final decoding speed identified can be improved, thus in practice, have better real-time.Have again, method of the present invention does not affect for the recognition performance that speech recognition system is final when network parameter significantly reduces, and under the constant prerequisite of network parameter, recognition performance can also be promoted by the method increased near the hidden layer node number of input layer.Correspondingly, the embodiment of the present invention also provides a kind of constructing system of deep neural network, as shown in Figure 4, is a kind of structural representation of this system.
In this embodiment, the system for constructing the deep neural network comprises:
an input and output layer determining unit 401, configured to determine the node number of the input layer and the node number of the output layer of the deep neural network;
a data acquisition unit 402, configured to acquire training data;
a hidden layer first determining unit 403, configured to determine the number of hidden layers of the deep neural network and the node number of the first hidden layer;
a hidden layer second determining unit 404, configured to determine the node numbers of the subsequent hidden layers according to the data volume of the training data, the number of hidden layers, and the node number of the first hidden layer, such that the node numbers of the hidden layers decrease progressively;
a model parameter determining unit 405, configured to determine the model parameters of the deep neural network using the training data, thereby obtaining the deep neural network.
In an embodiment of the present invention, one specific structure of the hidden layer second determining unit may comprise a decline mode determining unit and a node number determining unit, wherein:
the decline mode determining unit is configured to determine a decreasing mode and a decreasing ratio according to the data volume of the training data;
the node number determining unit is configured to determine the node numbers of the subsequent hidden layers according to the determined decreasing mode and decreasing ratio.
In practical applications, the decline mode determining unit may determine the decreasing mode and decreasing ratio of the hidden-layer nodes in several ways. For example, according to the data volume of the training data, it may determine that the node numbers of the subsequent hidden layers decrease layer by layer according to the decreasing ratio, starting from the node number of the first hidden layer. As another example, for hidden layers whose layer index is less than or equal to a layer-number threshold, it may determine that the node number of each such layer decreases layer by layer according to the decreasing ratio, starting from the node number of the first hidden layer; for hidden layers whose layer index is greater than the threshold, it may determine that the node number of each odd-numbered hidden layer decreases according to the decreasing ratio relative to the node number of the preceding odd-numbered hidden layer, and that the node number of each even-numbered hidden layer equals that of the hidden layer immediately before it.
Correspondingly, the node number determining unit determines the node numbers of the subsequent hidden layers according to the decreasing mode and decreasing ratio determined by the decline mode determining unit, which greatly reduces the number of neural network parameters and thereby reduces the required storage space and speeds up model training.
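The two ways of decreasing the hidden-layer node numbers described above can be sketched as follows. This is only one possible reading: it assumes the ratio $1/p^{k-1}$ from the claims for layers at or below the threshold, and it assumes that each odd-numbered layer above the threshold shrinks by a single factor of $1/p$ relative to the preceding odd-numbered layer, which the text does not state explicitly. The function and parameter names are hypothetical, and how $p$ is derived from the training data volume is left to the caller.

```python
def uniform_taper(n1, p, num_layers):
    """Mode 1: every layer k has N_1 / p**(k-1) nodes."""
    return [round(n1 / p ** (k - 1)) for k in range(1, num_layers + 1)]


def threshold_taper(n1, p, num_layers, threshold):
    """Mode 2: layer-by-layer decrease up to `threshold`, then odd/even pairing.

    Assumption: above the threshold, an odd layer has 1/p times the nodes of
    the previous odd layer, and an even layer copies the layer before it.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    sizes = []
    for k in range(1, num_layers + 1):
        if k <= threshold:
            size = round(n1 / p ** (k - 1))
        elif k % 2 == 1:
            # Previous odd layer is layer k-2, stored at index k-3.
            size = round(sizes[k - 3] / p)
        else:
            # Even layer: same size as the immediately preceding layer.
            size = sizes[k - 2]
        sizes.append(size)
    return sizes


print(uniform_taper(2048, 2, 3))
print(threshold_taper(2048, 2, 6, 2))
```

With a threshold, deeper networks shrink more slowly, because node numbers change only at every second layer once the threshold is passed.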
Compared with the currently common deep neural networks in which every hidden layer has the same number of nodes, a deep neural network built by the system of the embodiment of the present invention greatly reduces the number of network parameters, thereby reducing the required storage space and speeding up model training. This holds especially for the tied-state DNN-HMM systems currently used in large-vocabulary speech recognition: because the output layer can have 10,000 or more nodes, reducing the node number of the last hidden layer effectively reduces the number of network parameters. In addition, because fewer network parameters mean less time spent computing state output probabilities during decoding, applying a deep neural network built according to the present invention to a speech recognition system increases the final decoding speed and thus gives better real-time performance in practice. Furthermore, the method of the present invention does not affect the final recognition performance of the speech recognition system even when the network parameters are significantly reduced, and, with the number of network parameters held constant, recognition performance can even be improved by increasing the node numbers of the hidden layers near the input layer.
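The effect of the last hidden layer on the output-layer parameters can be made concrete with a short worked calculation. It assumes an output layer of exactly 10,000 nodes and compares a last hidden layer of 2048 nodes with one of 256 nodes, the figures used in the example earlier in this description. Biases are ignored, and the symbols $N_L$, $N_{\text{out}}$ and $W_{\text{out}}$ are introduced here only for illustration.

```latex
W_{\text{out}} = N_L \times N_{\text{out}}

\begin{aligned}
N_L = 2048 &: \quad W_{\text{out}} = 2048 \times 10000 = 20{,}480{,}000 \\
N_L = 256  &: \quad W_{\text{out}} = 256 \times 10000 = 2{,}560{,}000
\end{aligned}
```

Under these assumptions the weights between the last hidden layer and the output layer shrink by a factor of 8.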
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, see the description of the method embodiment. The system embodiment described above is merely illustrative: the units and modules described as separate components may or may not be physically separate. In addition, some or all of the units and modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
The structure, features, and effects of the present invention have been described in detail above with reference to the embodiments shown in the drawings. The foregoing are merely preferred embodiments of the present invention, and the scope of the invention is not limited to what is shown in the drawings. Any change made according to the concept of the present invention, or any modification into an equivalent embodiment, that does not go beyond the spirit covered by the specification and drawings shall fall within the protection scope of the present invention.

Claims (10)

1. A method for constructing a deep neural network, characterized by comprising:
determining the node number of the input layer and the node number of the output layer of the deep neural network;
acquiring training data;
determining the number of hidden layers of the deep neural network and the node number of the first hidden layer;
determining the node numbers of the subsequent hidden layers according to the data volume of the training data, the number of hidden layers, and the node number of the first hidden layer, such that the node numbers of the hidden layers decrease progressively;
determining the model parameters of the deep neural network using the training data to obtain the deep neural network.
2. The method for constructing a deep neural network according to claim 1, characterized in that determining the node numbers of the subsequent hidden layers such that the node numbers of the hidden layers decrease progressively comprises:
determining a decreasing mode and a decreasing ratio according to the data volume of the training data;
determining the node numbers of the subsequent hidden layers according to the determined decreasing mode and decreasing ratio.
3. The method for constructing a deep neural network according to claim 2, characterized in that determining the decreasing mode and the decreasing ratio according to the data volume of the training data comprises:
making the node numbers of the subsequent hidden layers decrease layer by layer according to the decreasing ratio, starting from the node number of the first hidden layer.
4. The method for constructing a deep neural network according to claim 2, characterized in that determining the decreasing mode and the decreasing ratio according to the data volume of the training data comprises:
for hidden layers whose layer index is less than or equal to a layer-number threshold, making the node number of each such hidden layer decrease layer by layer according to the decreasing ratio, starting from the node number of the first hidden layer;
for hidden layers whose layer index is greater than the layer-number threshold, making the node number of each odd-numbered hidden layer decrease according to the decreasing ratio relative to the node number of the preceding odd-numbered hidden layer, and making the node number of each even-numbered hidden layer equal to the node number of the hidden layer immediately before it.
5. The method for constructing a deep neural network according to claim 2, 3, or 4, characterized in that the decreasing ratio of the k-th hidden layer is $1/p^{k-1}$, where $1 \le p \le 2$.
6. A system for constructing a deep neural network, characterized by comprising:
an input and output layer determining unit, configured to determine the node number of the input layer and the node number of the output layer of the deep neural network;
a data acquisition unit, configured to acquire training data;
a hidden layer first determining unit, configured to determine the number of hidden layers of the deep neural network and the node number of the first hidden layer;
a hidden layer second determining unit, configured to determine the node numbers of the subsequent hidden layers according to the data volume of the training data, the number of hidden layers, and the node number of the first hidden layer, such that the node numbers of the hidden layers decrease progressively;
a model parameter determining unit, configured to determine the model parameters of the deep neural network using the training data to obtain the deep neural network.
7. The system for constructing a deep neural network according to claim 6, characterized in that the hidden layer second determining unit comprises:
a decline mode determining unit, configured to determine a decreasing mode and a decreasing ratio according to the data volume of the training data;
a node number determining unit, configured to determine the node numbers of the subsequent hidden layers according to the determined decreasing mode and decreasing ratio.
8. The system for constructing a deep neural network according to claim 7, characterized in that
the decline mode determining unit is specifically configured to determine, according to the data volume of the training data, that the node numbers of the subsequent hidden layers decrease layer by layer according to the decreasing ratio, starting from the node number of the first hidden layer.
9. The system for constructing a deep neural network according to claim 7, characterized in that the decline mode determining unit is specifically configured to: for hidden layers whose layer index is less than or equal to a layer-number threshold, determine that the node number of each such hidden layer decreases layer by layer according to the decreasing ratio, starting from the node number of the first hidden layer; and, for hidden layers whose layer index is greater than the layer-number threshold, determine that the node number of each odd-numbered hidden layer decreases according to the decreasing ratio relative to the node number of the preceding odd-numbered hidden layer, and make the node number of each even-numbered hidden layer equal to the node number of the hidden layer immediately before it.
10. The system for constructing a deep neural network according to claim 7, 8, or 9, characterized in that the decreasing ratio of the k-th hidden layer is $1/p^{k-1}$, where $1 \le p \le 2$.
CN201310755400.9A 2013-12-31 2013-12-31 Construction method and system for the deep neural network of speech recognition Active CN104751227B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310755400.9A CN104751227B (en) 2013-12-31 2013-12-31 Construction method and system for the deep neural network of speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310755400.9A CN104751227B (en) 2013-12-31 2013-12-31 Construction method and system for the deep neural network of speech recognition

Publications (2)

Publication Number Publication Date
CN104751227A true CN104751227A (en) 2015-07-01
CN104751227B CN104751227B (en) 2018-03-06

Family

ID=53590872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310755400.9A Active CN104751227B (en) 2013-12-31 2013-12-31 Construction method and system for the deep neural network of speech recognition

Country Status (1)

Country Link
CN (1) CN104751227B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040076944A1 (en) * 2002-08-22 2004-04-22 Ibex Process Technology, Inc. Supervised learning in the presence of null data
US7756646B2 (en) * 2006-03-31 2010-07-13 Battelle Memorial Institute Method for predicting peptide detection in mass spectrometry
CN102411931A (en) * 2010-09-15 2012-04-11 微软公司 Deep belief network for large vocabulary continuous speech recognition
CN102496059A (en) * 2011-11-25 2012-06-13 中冶集团武汉勘察研究院有限公司 Mine shaft well engineering surrounding rock artificial intelligence stage division method
CN103049792A (en) * 2011-11-26 2013-04-17 微软公司 Discriminative pretraining of Deep Neural Network
CN103400577A (en) * 2013-08-01 2013-11-20 百度在线网络技术(北京)有限公司 Acoustic model building method and device for multi-language voice identification

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
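刘维群 et al.: "Research on the optimization of hidden-layer nodes in BP networks" (BP网络中隐含层节点优化的研究), 《交通与计算机》 *
高大文 et al.: "Optimization of hidden-layer nodes and number of training iterations in artificial neural networks" (人工神经网络中隐含层节点与训练次数的优化), 《哈尔滨工业大学学报》 *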

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017076211A1 (en) * 2015-11-05 2017-05-11 阿里巴巴集团控股有限公司 Voice-based role separation method and device
CN109154798A (en) * 2016-05-09 2019-01-04 1Qb信息技术公司 For improving the method and system of the strategy of Stochastic Control Problem
CN109154798B (en) * 2016-05-09 2022-02-25 1Qb信息技术公司 Method and system for improving strategies for stochastic control problems
WO2017206936A1 (en) * 2016-06-02 2017-12-07 腾讯科技(深圳)有限公司 Machine learning based network model construction method and apparatus
CN106096727A (en) * 2016-06-02 2016-11-09 腾讯科技(深圳)有限公司 A kind of network model based on machine learning building method and device
CN106096727B (en) * 2016-06-02 2018-12-07 腾讯科技(深圳)有限公司 A kind of network model building method and device based on machine learning
CN108122035A (en) * 2016-11-29 2018-06-05 科大讯飞股份有限公司 End-to-end modeling method and system
CN108122035B (en) * 2016-11-29 2019-10-18 科大讯飞股份有限公司 End-to-end modeling method and system
CN106898354B (en) * 2017-03-03 2020-05-19 北京华控智加科技有限公司 Method for estimating number of speakers based on DNN model and support vector machine model
CN106898354A (en) * 2017-03-03 2017-06-27 清华大学 Speaker number estimation method based on DNN models and supporting vector machine model
CN109697506A (en) * 2017-10-20 2019-04-30 图核有限公司 Processing in neural network
CN108648769A (en) * 2018-04-20 2018-10-12 百度在线网络技术(北京)有限公司 Voice activity detection method, apparatus and equipment
CN109034372A (en) * 2018-06-28 2018-12-18 浙江大学 A kind of neural networks pruning method based on probability
CN109034372B (en) * 2018-06-28 2020-10-16 浙江大学 Neural network pruning method based on probability
CN109102067A (en) * 2018-07-13 2018-12-28 厦门快商通信息技术有限公司 The method of increase and decrease certainly, computer equipment and the storage medium of neural network node
CN109102067B (en) * 2018-07-13 2021-08-06 厦门快商通信息技术有限公司 Method for increasing and decreasing neural network nodes, computer device and storage medium

Also Published As

Publication number Publication date
CN104751227B (en) 2018-03-06

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Applicant after: Iflytek Co., Ltd.

Address before: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Applicant before: Anhui USTC iFLYTEK Co., Ltd.

COR Change of bibliographic data
GR01 Patent grant