CN109767759A - End-to-end speech recognition method based on a modified CLDNN structure - Google Patents

End-to-end speech recognition method based on a modified CLDNN structure

Info

Publication number
CN109767759A
CN109767759A (application CN201910115486.6A)
Authority
CN
China
Prior art keywords
model
cldnn
network
rate
gradient
Prior art date
Legal status
Granted
Application number
CN201910115486.6A
Other languages
Chinese (zh)
Other versions
CN109767759B (en)
Inventors
Feng Yujie (冯昱劼)
Zhang Yi (张毅)
Xu Xuan (徐轩)
Current Assignee
Chongqing University of Posts and Telecommunications
Original Assignee
Chongqing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Posts and Telecommunications
Priority to CN201910115486.6A
Publication of CN109767759A
Application granted
Publication of CN109767759B
Legal status: Active

Links

Abstract

The invention claims an end-to-end speech recognition method based on a modified CLDNN structure. The traditional CLDNN structure commonly used for speech recognition processes the timing information in the speech signal with a fully connected LSTM (Long Short-Term Memory) model, which is prone to over-fitting during training and degrades the learning effect. Deeper models often perform better, but increasing model depth by simply stacking network layers brings vanishing gradients, exploding gradients and the "degradation" problem. To address these issues, the invention proposes a modified CLDNN structure: a residual network is combined with ConvLSTM to build a residual ConvLSTM model, which replaces the fully connected LSTM model in the traditional CLDNN structure. This structure remedies the problems of the traditional CLDNN model, and model depth can be increased by stacking residual ConvLSTM blocks without vanishing gradients, exploding gradients or "degradation", yielding a better-performing speech recognition system.

Description

End-to-end speech recognition method based on a modified CLDNN structure
Technical field
The invention belongs to the field of speech recognition, and in particular relates to a speech recognition method based on deep learning.
Background technique
Automatic speech recognition has long held an important position in the field of artificial intelligence. Traditional speech recognition technology, represented by the HMM-GMM model, was the mainstream approach and dominated the field for decades. In recent years, thanks to breakthroughs in deep learning, automatic speech recognition has entered a stage of rapid development. End-to-end speech recognition systems based on deep learning have now surpassed legacy speech recognition systems in popularity in academia, and have begun to gradually replace them in actual production.
Since the 1980s, acoustic models based on the Gaussian Mixture Model/Hidden Markov Model (GMM/HMM) have been widely used: the HMM handles temporal variation in speech, while the GMM maps acoustic input to hidden Markov states. In recent years, acoustic models based on Deep Neural Networks (DNN) have been shown to perform better on large-vocabulary speech recognition tasks, the activity of a large number of neurons modelling acoustic features more effectively. However, because a DNN is fully connected, it cannot fully exploit the local structure of the speech feature space. A Convolutional Neural Network (CNN) can use its translation invariance to overcome the variability of the speech signal itself, and explains variation in the speech feature space well. A Recurrent Neural Network (RNN) mines context-related information in a sequence through recurrence, overcoming shortcomings of the DNN to some extent, but RNNs are prone to vanishing gradients during training and struggle to remember long-range information. Long Short-Term Memory (LSTM) units use specific gating units to preserve the error at the current moment and transmit it selectively to specific units, thereby avoiding the vanishing-gradient problem. Connectionist Temporal Classification (CTC), proposed by Graves et al. in 2006, can be applied to end-to-end speech recognition systems: it captures the correspondence between the speech feature sequence and the phoneme sequence without relying on manual alignment of features and phonemes.
Technology companies at home and abroad are continually developing their own end-to-end speech recognition models. Baidu researchers published Deep Speech in 2015 and Deep Speech 2 in 2016, both of which build speech recognition models by combining CLDNN with CTC and achieve excellent performance. In 2016 the iFLYTEK research team proposed the Deep Fully Convolutional Neural Network (DFCNN) structure, which models whole-sentence speech with a large stack of convolutional and pooling layers, greatly strengthening the expressive power of the CNN. By accumulating many such convolution-pooling pairs, DFCNN can see a very long history as well as future information, which guarantees that it expresses the long-term correlation of speech well and makes it more robust than RNN structures. IBM researchers published an article at ICASSP 2016 stating that, using 3x3 convolution kernels with pooling layers after multiple convolutional layers, a 14-layer Deep CNN model (including fully connected layers) can be trained; on the Switchboard dataset this model brings a relative WER reduction of about 10.6% compared with traditional CNN approaches. The MSRA team proposed the residual network in 2015, solving the "degradation" problem that appears as model depth grows; residual networks were later also applied to speech recognition models with demonstrated effect. At ICASSP 2017 the Google research team presented an acoustic model structure combining Network-in-Network (NiN), Batch Normalization (BN) and Convolutional LSTM (ConvLSTM); without a language model, it reached a WER of 10.5% on the WSJ speech recognition task.
Owing to its simple construction and excellent performance, CLDNN has always been a popular structure among end-to-end speech recognition models. However, the depth of the common CLDNN model is insufficient and the features it extracts are not rich enough, so the resulting speech recognition model cannot achieve the best effect. The fully connected long short-term memory model (FC-LSTM) within it cannot preserve the local structure of the speech feature space and is prone to over-fitting.
Summary of the invention
The present invention aims to solve the above problems of the prior art. It proposes an end-to-end speech recognition method based on a modified CLDNN structure which effectively solves the over-fitting problem caused by the LSTM in the traditional CLDNN, and overcomes the vanishing-gradient, exploding-gradient and "degradation" problems brought by increasing model depth. The technical scheme of the invention is as follows:
An end-to-end speech recognition method based on a modified CLDNN structure, comprising the following steps:
S1, obtain a voice data set and divide it into a training set, a cross-validation set and a test set;
S2, pre-process all voice data to obtain the Mel-frequency cepstral coefficients (MFCC) of the voice signal;
S3, build the modified CLDNN network model, comprising a speech feature abstraction part composed of a convolutional neural network (CNN), a residual convolutional long short-term memory model that processes the timing information of the voice signal, and a deep neural network (DNN) that maps the processed feature space to the output layer;
S4, build the speech recognition loss function, using the CTC loss;
S5, train the modified CLDNN model of step S3 on the training set, using the Adam optimizer to optimize the objective function of step S4;
S6, perform cross-validation on the model trained in step S5, adjust the hyper-parameters of the model, and obtain the final network model.
Further, the pre-processing of step S2 includes: pre-emphasis, framing, windowing, Fast Fourier Transform (FFT), Mel filtering and discrete cosine transform.
Further, the residual convolutional long short-term memory model in step S3 is obtained as follows: the matrix products in the fully connected long short-term memory model are replaced with convolution operations to obtain the convolutional long short-term memory model (ConvLSTM), and a residual network structure is applied to this model to obtain the residual ConvLSTM model.
Further, the residual network structure is used to construct a deep network: skip connections directly connect the shallow layers and the deep layers so that the gradient can be better propagated back to the shallow layers. The residual network is composed of multiple residual blocks, and a deep residual network structure composed of multiple residual blocks replaces the multi-layer LSTM (long short-term memory) structure in the traditional CLDNN model.
Further, in step S4 the loss function uses the CTC loss, specifically:
Assume the size of the label alphabet L is K. Given an input sequence X = (x_1, x_2, ..., x_T) and the corresponding output label sequence Y = (y_1, y_2, ..., y_U), the task of CTC is to feed the loss value back to the neural network and, by adjusting the internal parameters of the network, maximize the log probability of the output label given the input sequence, i.e. max(ln P(Y|X)). CTC (Connectionist Temporal Classification) also introduces a blank label to represent frames that map to no element of the label alphabet L.
The softmax layer after the last DNN layer serves as the input to CTC; the softmax output contains K+1 nodes mapped to the elements of L ∪ {blank}. The probability of a complete CTC path is:
P(p|X) = ∏_{t=1}^{T} z_t^{p_t}
where z_t is the softmax output vector at time t and z_t^k is the posterior probability of the k-th label. To solve the alignment problem between the softmax output and the label sequence, a CTC path p = (p_1, p_2, ..., p_T) in one-to-one correspondence with the input sequence at the frame level is introduced. A CTC path p is mapped onto the label sequence Y by the mapping Φ; since this mapping is one-to-many, one label sequence can correspond to multiple CTC paths, so the probability of the label sequence Y is the sum of the probabilities of all CTC paths corresponding to it:
P(Y|X) = Σ_{p ∈ Φ^{−1}(Y)} P(p|X)
The loss function of CTC is defined as the sum of the negative log probabilities of the correct labelings of the training samples:
L_CTC = − Σ_{(X,Y)} ln P(Y|X)
Further, step S5 uses the Adam optimizer to optimize the objective function of step S4.
Compute the gradient at time step t:
g_t = ∇_θ J(θ_{t−1})
First, compute the exponential moving average of the gradient, with m_0 initialized to 0; this aggregates the gradient momentum of the previous time steps. The coefficient β1 is the exponential decay rate controlling the weighting between the momentum and the current gradient; it usually takes a value close to 1 and defaults to 0.9:
m_t = β1·m_{t−1} + (1 − β1)·g_t
Second, compute the exponential moving average of the squared gradient, with v_0 initialized to 0. The coefficient β2 is the exponential decay rate controlling the influence of previous squared gradients; it defaults to 0.999:
v_t = β2·v_{t−1} + (1 − β2)·g_t²
Third, because m_0 is initialized to 0, m_t is biased toward 0, especially in the early stage of training; bias correction is therefore applied to the gradient mean m_t to reduce the influence of this bias:
m̂_t = m_t / (1 − β1^t)
Fourth, because v_0 is initialized to 0, v_t is likewise biased toward 0 in the early stage of training and is corrected:
v̂_t = v_t / (1 − β2^t)
Fifth, update the parameters: the initial learning rate α is multiplied by the ratio of the corrected gradient mean to the square root of the corrected gradient variance, with default learning rate α = 0.001 and ε = 10^−8:
θ_t = θ_{t−1} − α·m̂_t / (√v̂_t + ε)
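For reference, the five steps above can be restated as a minimal NumPy sketch; the function name adam_step and the variable names are illustrative, not taken from the patent:

import numpy as np

def adam_step(theta, g, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters theta, given gradient g at time step t >= 1."""
    m = beta1 * m + (1 - beta1) * g            # step 1: gradient mean m_t
    v = beta2 * v + (1 - beta2) * g * g        # step 2: squared-gradient average v_t
    m_hat = m / (1 - beta1 ** t)               # step 3: bias correction of m_t
    v_hat = v / (1 - beta2 ** t)               # step 4: bias correction of v_t
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)   # step 5: parameter update
    return theta, m, v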
Further, step S6 performs cross-validation on the model trained in step S5 and adjusts the hyper-parameters of the model to obtain the final network model, specifically:
Cross-validation steps:
1. Initialize the weights with random values between −0.5 and 0.5.
2. Divide the learning sample space C into N parts (a code sketch of this partition is given after step 7).
3. Take N−1 parts from the learning data file in the prescribed order as training data samples; the remaining part serves as validation data samples. Complete steps 4 to 7.
4. Read in one sample from the training data samples and train on it.
5. Compute the total output error EP of this sample; modify the weights of the two layers until EP < ε (ε being the prescribed error measure), then read in the next training sample.
6. When all samples of the N−1 training parts have been learned, a set of weights is produced; compute the validation samples with this set of weights and calculate the validation success rate RATE = (number of validation samples satisfying EP < ε) / (total number of validation samples).
7. If the validation success rate RATE > rate (rate being the prescribed success rate), end this round of learning; otherwise learn all the validation samples.
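A minimal Python sketch of the N-fold partition of steps 2 and 3 follows; the function name n_fold_indices and the random shuffling are assumptions, and the per-sample training and success-rate test of steps 4 to 7 are only indicated:

import numpy as np

def n_fold_indices(num_samples, n_folds, seed=0):
    """Split sample indices into n_folds parts; each part serves once as validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_samples)
    folds = np.array_split(idx, n_folds)
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, val

# Example: 5-fold split of 1000 samples
for train_idx, val_idx in n_fold_indices(1000, 5):
    pass  # steps 4-7 (per-sample training and RATE computation) would run here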
Hyper-parameters:
Learning rate: the learning rate is the step size by which the network weights are updated in the optimization algorithm; different optimization algorithms call for different learning rates. An over-large learning rate may keep the model from converging, the loss function oscillating up and down; an over-small one makes convergence slow and requires longer training. Typical values are 0.01, 0.001 and 0.0001.
Batch size: the batch size is the number of samples fed into the model at each training step of the neural network. In convolutional neural networks a large batch usually makes the network converge faster, but owing to the limits of memory resources an over-large batch may exhaust memory or crash the program kernel. Typical values are 16, 32, 64 and 128.
Number of iterations: the number of iterations is the number of times the entire training set is fed through the neural network for training. When the validation error rate and the training error rate differ little, the current number of iterations is considered suitable; if the validation error first decreases and then increases, the number of iterations is too large and should be reduced, otherwise over-fitting easily occurs.
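As a reading aid, the typical values above can be organised into a hypothetical search grid; the epoch values are assumptions, since the text fixes no iteration counts:

from itertools import product

grid = {
    "learning_rate": [0.01, 0.001, 0.0001],
    "batch_size": [16, 32, 64, 128],
    "epochs": [10, 20, 40],          # illustrative; not values fixed by the patent
}
for lr, bs, ep in product(*grid.values()):
    # train a candidate model with (lr, bs, ep) and score it by cross-validation
    pass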
The advantages and beneficial effects of the present invention are as follows:
The invention introduces the Convolutional Long Short-Term Memory (ConvLSTM) model to replace the FC-LSTM in the common CLDNN model, remedying the model's inability to preserve spatial structure locality and its tendency to over-fit. To deepen the model without "degradation", vanishing gradients or exploding gradients, the invention also introduces the residual network (ResNet). In order to stack multiple ConvLSTM layers to improve model performance without vanishing gradients, exploding gradients or "degradation", the invention fuses ConvLSTM with the residual network structure; the residual ConvLSTM block structure is shown in Fig. 1. On this basis, the invention improves the traditional CLDNN structure: to address the facts that the fully connected long short-term memory model in the traditional CLDNN model cannot preserve the local structure of the feature space and over-fits easily, a deep residual ConvLSTM network structure composed of multiple residual ConvLSTM blocks replaces the multi-layer LSTM structure in the traditional CLDNN model, giving the model better performance on the temporal relationships in speech features and making it less prone to over-fitting. The improved CNN-ResConvLSTM-DNN model can be made deeper by stacking more residual ConvLSTM blocks without vanishing gradients, exploding gradients or "degradation", and can deliver better performance in speech recognition tasks; its structure is shown in Fig. 2.
Detailed description of the invention
Fig. 1 shows the residual convolutional long short-term memory block structure of the preferred embodiment provided by the present invention;
Fig. 2 shows the modified CLDNN model structure proposed by the present invention;
Fig. 3 is the flow chart of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and in detail below with reference to the drawings in the embodiments. The described embodiments are only a part of the embodiments of the present invention.
The technical solution that the present invention solves above-mentioned technical problem is:
S1, divide the voice data set into a training set, a cross-validation set and a test set;
S2, pre-process all the data to obtain the Mel-frequency cepstral coefficients (MFCC) of the voice signal; the pre-processing steps are:
Pre-emphasis: pass the signal through the high-pass filter H(z) = 1 − μz^(−1).
Framing: divide the whole voice signal into frames of 30 ms with a frame shift of 10 ms.
Windowing: apply a Hamming window to each frame signal:
S′(n) = S(n) × W(n)
Fast Fourier Transform: apply the FFT to each frame to obtain the energy distribution over the spectrum.
Mel filtering: pass the energy spectrum through a group of Mel filters; the frequency response of a single triangular filter with centre frequency f(m) is defined as:
H_m(k) = 0, for k < f(m−1);
H_m(k) = (k − f(m−1)) / (f(m) − f(m−1)), for f(m−1) ≤ k ≤ f(m);
H_m(k) = (f(m+1) − k) / (f(m+1) − f(m)), for f(m) ≤ k ≤ f(m+1);
H_m(k) = 0, for k > f(m+1).
Compute the logarithmic energy output by each filter group:
s(m) = ln( Σ_{k=0}^{N−1} |X(k)|² · H_m(k) ), 0 ≤ m ≤ M
Discrete cosine transform, giving the cepstral coefficients:
C(n) = Σ_{m=0}^{M−1} s(m) · cos( πn(m + 0.5) / M ), n = 1, 2, ..., where M is the number of triangular filters.
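The whole pre-processing chain of step S2 can be sketched in NumPy/SciPy as follows. The frame length (30 ms), frame shift (10 ms) and step order follow the text; the pre-emphasis coefficient, FFT size, filter count and MFCC order are illustrative defaults, not values fixed by the patent:

import numpy as np
from scipy.fftpack import dct

def mfcc(signal, fs=16000, frame_ms=30, hop_ms=10, n_filt=26, n_ceps=13, mu=0.97):
    """Pre-emphasis -> framing -> Hamming window -> FFT -> Mel filtering -> log -> DCT."""
    # Pre-emphasis with H(z) = 1 - mu*z^-1 (mu = 0.97 is an assumed default)
    sig = np.append(signal[0], signal[1:] - mu * np.asarray(signal[:-1], dtype=float))
    # Framing: 30 ms frames with a 10 ms shift (assumes len(sig) >= frame length)
    flen, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    n_frames = 1 + (len(sig) - flen) // hop
    frames = np.stack([sig[i * hop: i * hop + flen] for i in range(n_frames)])
    frames = frames * np.hamming(flen)             # windowing: S'(n) = S(n) * W(n)
    # FFT power spectrum (nfft = 512 is an illustrative choice)
    nfft = 512
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
    # Triangular Mel filter bank, evenly spaced on the Mel scale
    mel_max = 2595 * np.log10(1 + fs / 2 / 700)
    hz = 700 * (10 ** (np.linspace(0, mel_max, n_filt + 2) / 2595) - 1)
    bins = np.floor((nfft + 1) * hz / fs).astype(int)
    fbank = np.zeros((n_filt, nfft // 2 + 1))
    for m in range(1, n_filt + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log energy of each filter output, then DCT to obtain the cepstral coefficients
    log_e = np.log(power @ fbank.T + 1e-10)
    return dct(log_e, type=2, axis=1, norm='ortho')[:, :n_ceps]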
S3, build the modified CLDNN network model. The model comprises a speech feature abstraction part composed of a convolutional neural network (CNN), a residual convolutional long short-term memory (ResConvLSTM) part that processes the timing information of the voice signal, and a deep neural network (DNN) part that maps the processed feature space to the output layer.
The convolutional long short-term memory model is an extension of the fully connected long short-term memory model in which both the input-to-state and the state-to-state transitions have a convolutional structure. Compared with a plain CNN, this structure expresses the temporal relationships of features better, and compared with the fully connected LSTM it is less prone to over-fitting. Its equations are:
i_t = σ(W_xi ∗ x_t + W_hi ∗ h_{t−1} + b_i)
f_t = σ(W_xf ∗ x_t + W_hf ∗ h_{t−1} + b_f)
o_t = σ(W_xo ∗ x_t + W_ho ∗ h_{t−1} + b_o)
c_t = f_t ∘ c_{t−1} + i_t ∘ tanh(W_xc ∗ x_t + W_hc ∗ h_{t−1} + b_c)
h_t = o_t ∘ tanh(c_t)
where σ is the sigmoid activation function; i_t, f_t, o_t, c_t and h_t denote the input gate, forget gate, output gate, cell activation and cell output vector at time t respectively; ∗ denotes convolution and ∘ the element-wise product of vectors; W denotes the weight matrices connecting the different gates and b the corresponding bias vectors.
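A minimal PyTorch sketch of one ConvLSTM step implementing the equations above; the kernel size and the fused four-gate convolution are implementation choices, not specified by the patent:

import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """One ConvLSTM time step: the matrix products of FC-LSTM become convolutions."""
    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        # A single convolution yields the pre-activations of all four gates at once.
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel, padding=kernel // 2)

    def forward(self, x, state):
        h, c = state                                  # h_{t-1} and c_{t-1}
        i, f, o, g = torch.chunk(self.conv(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)                 # cell update c_t
        h = o * torch.tanh(c)                         # output h_t = o_t ∘ tanh(c_t)
        return h, c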
The residual network structure constructs a deep network by directly connecting shallow layers and deep layers through skip connections, so that the gradient can be better propagated back to the shallow layers. The residual network is composed of multiple residual blocks. If the input of a residual block is x_l and its output is x_{l+1}, the structure of the residual block can be expressed as:
x_{l+1} = x_l + F(x_l, w_l) (9)
F(x_l, w_l) = w_l · σ(w_{l−1} · x_l) (10)
where σ is the activation function. Hence, for any deeper layer L:
x_L = x_l + Σ_{i=l}^{L−1} F(x_i, w_i) (11)
Assuming a loss function C, we obtain:
∂C/∂x_l = (∂C/∂x_L) · (1 + ∂/∂x_l Σ_{i=l}^{L−1} F(x_i, w_i)) (12)
Here the factor ∂C/∂x_L guarantees that information can be passed back directly to any layer x_l, and the additive term 1 guarantees that the network does not suffer from vanishing gradients.
In order to stack multiple convolutional long short-term memory layers to improve the performance of the model without vanishing gradients, exploding gradients or "degradation", the present invention fuses the convolutional long short-term memory model with the residual network structure; the residual convolutional long short-term memory block structure is shown in Fig. 1.
On this basis, the present invention improves the traditional CLDNN structure. To address the facts that the fully connected long short-term memory model in the traditional CLDNN model cannot preserve the local structure of the feature space and over-fits easily, a deep residual ConvLSTM network structure composed of multiple residual ConvLSTM blocks replaces the multi-layer LSTM structure in the traditional CLDNN model, giving the model better performance on the temporal relationships in speech features and making it less prone to over-fitting. The improved CNN-ResConvLSTM-DNN model can be made deeper by stacking more residual ConvLSTM blocks without vanishing gradients, exploding gradients or "degradation", and can achieve better performance in speech recognition tasks; its structure is shown in Fig. 2.
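Since Fig. 1 is not reproduced here, the following PyTorch sketch gives one plausible reading of the residual ConvLSTM block, reusing the ConvLSTMCell above: the block input is added to the ConvLSTM output at every time step, realising x_{l+1} = x_l + F(x_l, w_l). The zero-initialised states and the tensor layout are assumptions:

import torch
import torch.nn as nn

class ResConvLSTMBlock(nn.Module):
    """Residual ConvLSTM block: x_{l+1} = x_l + F(x_l), with F a ConvLSTM layer."""
    def __init__(self, channels, kernel=3):
        super().__init__()
        self.cell = ConvLSTMCell(channels, channels, kernel)

    def forward(self, x_seq):                     # x_seq: (T, B, C, H, W)
        T, B, C, H, W = x_seq.shape
        h = x_seq.new_zeros(B, C, H, W)           # zero-initialised states (assumption)
        c = x_seq.new_zeros(B, C, H, W)
        out = []
        for t in range(T):
            h, c = self.cell(x_seq[t], (h, c))
            out.append(x_seq[t] + h)              # identity skip connection
        return torch.stack(out)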
S4, build the objective function, i.e. the speech recognition word error rate (WER%); the loss function uses the CTC loss.
Assume the size of the label alphabet L is K. Given an input sequence X = (x_1, x_2, ..., x_T) and the corresponding output label sequence Y = (y_1, y_2, ..., y_U), the task of CTC is to feed the loss value back to the neural network and, by adjusting the internal parameters of the network, maximize the log probability of the output label given the input sequence, i.e. max(ln P(Y|X)). CTC also introduces a blank label to represent frames, such as pauses or coughs, that map to no element of the label alphabet L.
The softmax layer after the last DNN layer serves as the input to CTC; the softmax output contains K+1 nodes mapped to the elements of L ∪ {blank}. The probability of a complete CTC path is:
P(p|X) = ∏_{t=1}^{T} z_t^{p_t}
where z_t is the softmax output vector at time t and z_t^k is the posterior probability of the k-th label. To solve the alignment problem between the softmax output and the label sequence, a CTC path p = (p_1, p_2, ..., p_T) in one-to-one correspondence with the input sequence at the frame level is introduced. A CTC path p is mapped onto the label sequence Y by the mapping Φ; since this mapping is one-to-many, one label sequence can correspond to multiple CTC paths, so the probability of the label sequence Y is the sum of the probabilities of all CTC paths corresponding to it:
P(Y|X) = Σ_{p ∈ Φ^{−1}(Y)} P(p|X)
The loss function of CTC is defined as the sum of the negative log probabilities of the correct labelings of the training samples:
L_CTC = − Σ_{(X,Y)} ln P(Y|X)
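PyTorch's nn.CTCLoss implements this negative-log-probability loss directly; the snippet below shows the expected tensor shapes. All sizes are illustrative, and assigning index 0 to the blank label is an assumption:

import torch
import torch.nn as nn

ctc = nn.CTCLoss(blank=0)                  # index 0 plays the role of the blank label here
T, B, K = 100, 8, 29                       # frames, batch size, |L| + 1 (illustrative)
log_probs = torch.randn(T, B, K, requires_grad=True).log_softmax(2)  # last-layer softmax output
targets = torch.randint(1, K, (B, 20))     # label sequences Y, containing no blanks
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 20, dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)  # -ln P(Y|X), batch-reduced
loss.backward()                            # loss value fed back to the network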
S5, train the model on the training set, using the Adam optimizer to optimize the objective function;
S6, perform cross-validation on the trained model using the validation set, adjust the hyper-parameters of the model, and obtain the final network model.
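Putting steps S4 and S5 together, one training step might look as follows; the single linear layer is only a stand-in for the CNN-ResConvLSTM-DNN of Fig. 2, all shapes are illustrative, and Adam uses the defaults discussed above (α = 0.001, β1 = 0.9, β2 = 0.999, ε = 1e−8):

import torch
import torch.nn as nn

T, B, NF, K = 100, 8, 13, 29               # frames, batch, MFCC order, |L| + 1 (all illustrative)
model = nn.Linear(NF, K)                   # stand-in only; NOT the CNN-ResConvLSTM-DNN of Fig. 2
ctc_loss = nn.CTCLoss(blank=0)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001,
                             betas=(0.9, 0.999), eps=1e-8)

feats = torch.randn(T, B, NF)              # a batch of MFCC frames
targets = torch.randint(1, K, (B, 20))
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 20, dtype=torch.long)

log_probs = model(feats).log_softmax(dim=2)        # (T, B, K) per-frame log-probabilities
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
optimizer.zero_grad()
loss.backward()
optimizer.step()                           # one Adam update of step S5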
The above embodiments should be understood as merely illustrating the present invention rather than limiting its scope. After reading the contents recorded herein, a skilled person can make various changes or modifications to the present invention, and such equivalent changes and modifications likewise fall within the scope of the claims of the present invention.

Claims (7)

1. An end-to-end speech recognition method based on a modified CLDNN structure, characterized by comprising the following steps:
S1, obtaining a voice data set and dividing it into a training set, a cross-validation set and a test set;
S2, pre-processing all voice data to obtain the Mel-frequency cepstral coefficients (MFCC) of the voice signal;
S3, building the modified CLDNN network model, comprising a speech feature abstraction part composed of a convolutional neural network (CNN), a residual convolutional long short-term memory model that processes the timing information of the voice signal, and a deep neural network (DNN) that maps the processed feature space to the output layer;
S4, building the loss function for speech recognition, the loss function using the CTC loss;
S5, training the modified CLDNN model of step S3 on the training set, using the Adam optimizer to optimize the objective function of step S4;
S6, performing cross-validation on the model trained in step S5 using the validation set, adjusting the hyper-parameters of the model, and obtaining the final network model.
2. The end-to-end speech recognition method based on a modified CLDNN structure according to claim 1, characterized in that the pre-processing of step S2 comprises: pre-emphasis, framing, windowing, Fast Fourier Transform (FFT), Mel filtering and discrete cosine transform.
3. The end-to-end speech recognition method based on a modified CLDNN structure according to claim 1, characterized in that the residual convolutional long short-term memory model in step S3 is obtained as follows: the matrix products in the fully connected long short-term memory model are replaced with convolution operations to obtain the convolutional long short-term memory model, and a residual network structure is applied to this model to obtain the residual convolutional long short-term memory model.
4. The end-to-end speech recognition method based on a modified CLDNN structure according to claim 3, characterized in that the residual network structure is used to construct a deep network: skip connections directly connect the shallow layers and the deep layers so that the gradient can be better propagated back to the shallow layers; the residual network is composed of multiple residual blocks, and a deep residual network structure composed of multiple residual blocks replaces the multi-layer LSTM (long short-term memory) structure in the traditional CLDNN model.
5. The end-to-end speech recognition method based on a modified CLDNN structure according to claim 3, characterized in that in step S4 the loss function uses the CTC loss, specifically:
assume the size of the label alphabet L is K; given an input sequence X = (x_1, x_2, ..., x_T) and the corresponding output label sequence Y = (y_1, y_2, ..., y_U), the task of CTC is to feed the loss value back to the neural network and, by adjusting the internal parameters of the network, maximize the log probability of the output label given the input sequence, i.e. max(ln P(Y|X)); CTC (Connectionist Temporal Classification) also introduces a blank label to represent frames that map to no element of the label alphabet L;
the softmax layer after the last DNN layer serves as the input to CTC, the softmax output containing K+1 nodes mapped to the elements of L ∪ {blank}; the probability of a complete CTC path is:
P(p|X) = ∏_{t=1}^{T} z_t^{p_t}
where z_t is the softmax output vector at time t and z_t^k is the posterior probability of the k-th label; to solve the alignment problem between the softmax output and the label sequence, a CTC path p = (p_1, p_2, ..., p_T) in one-to-one correspondence with the input sequence at the frame level is introduced; a CTC path p is mapped onto the label sequence Y by the mapping Φ; since this mapping is one-to-many, one label sequence can correspond to multiple CTC paths, so the probability of the label sequence Y is the sum of the probabilities of all CTC paths corresponding to it:
P(Y|X) = Σ_{p ∈ Φ^{−1}(Y)} P(p|X)
the loss function of CTC is defined as the sum of the negative log probabilities of the correct labelings of the training samples:
L_CTC = − Σ_{(X,Y)} ln P(Y|X).
6. The end-to-end speech recognition method based on a modified CLDNN structure according to claim 5, characterized in that step S5 uses the Adam optimizer to optimize the objective function of step S4:
compute the gradient at time step t:
g_t = ∇_θ J(θ_{t−1})
first, compute the exponential moving average of the gradient, with m_0 initialized to 0, aggregating the gradient momentum of the previous time steps; the coefficient β1 is the exponential decay rate controlling the weighting between the momentum and the current gradient, usually close to 1, with default 0.9:
m_t = β1·m_{t−1} + (1 − β1)·g_t
second, compute the exponential moving average of the squared gradient, with v_0 initialized to 0; the coefficient β2 is the exponential decay rate controlling the influence of previous squared gradients, with default 0.999:
v_t = β2·v_{t−1} + (1 − β2)·g_t²
third, because m_0 is initialized to 0, m_t is biased toward 0, especially in the early stage of training, so bias correction is applied to the gradient mean m_t to reduce the influence of this bias:
m̂_t = m_t / (1 − β1^t)
fourth, because v_0 is initialized to 0, v_t is likewise biased toward 0 in the early stage of training and is corrected:
v̂_t = v_t / (1 − β2^t)
fifth, update the parameters by multiplying the initial learning rate α by the ratio of the corrected gradient mean to the square root of the corrected gradient variance, with default learning rate α = 0.001 and ε = 10^−8:
θ_t = θ_{t−1} − α·m̂_t / (√v̂_t + ε).
7. The end-to-end speech recognition method based on a modified CLDNN structure according to claim 6, characterized in that step S6 performs cross-validation on the model trained in step S5 and adjusts the hyper-parameters of the model to obtain the final network model, specifically:
cross-validation steps:
1. initialize the weights with random values between −0.5 and 0.5;
2. divide the learning sample space C into N parts;
3. take N−1 parts from the learning data file in the prescribed order as training data samples, the remaining part serving as validation data samples, and complete steps 4 to 7;
4. read in one sample from the training data samples and train on it;
5. compute the total output error EP of this sample, modify the weights of the two layers until EP < ε (ε being the prescribed error measure), then read in the next training sample;
6. when all samples of the N−1 training parts have been learned, a set of weights is produced; compute the validation samples with this set of weights and calculate the validation success rate RATE = (number of validation samples satisfying EP < ε) / (total number of validation samples);
7. if the validation success rate RATE > rate (rate being the prescribed success rate), end this round of learning, otherwise learn all the validation samples;
hyper-parameters:
learning rate: the learning rate is the step size by which the network weights are updated in the optimization algorithm; different optimization algorithms call for different learning rates; an over-large learning rate may keep the model from converging, with the loss function oscillating; an over-small one makes convergence slow and requires longer training; typical values are 0.01, 0.001 and 0.0001;
batch size: the batch size is the number of samples fed into the model at each training step of the neural network; in convolutional neural networks a large batch usually makes the network converge faster, but owing to memory limits an over-large batch may exhaust memory or crash the program kernel; typical values are 16, 32, 64 and 128;
number of iterations: the number of iterations is the number of times the entire training set is fed through the neural network for training; when the validation error rate and the training error rate differ little, the current number of iterations is considered suitable; if the validation error first decreases and then increases, the number of iterations is too large and should be reduced, otherwise over-fitting easily occurs.
CN201910115486.6A 2019-02-14 2019-02-14 Method for establishing CLDNN structure applied to end-to-end speech recognition Active CN109767759B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910115486.6A CN109767759B (en) 2019-02-14 2019-02-14 Method for establishing CLDNN structure applied to end-to-end speech recognition

Publications (2)

Publication Number Publication Date
CN109767759A true CN109767759A (en) 2019-05-17
CN109767759B CN109767759B (en) 2020-12-22

Family

ID=66456247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910115486.6A Active CN109767759B (en) 2019-02-14 2019-02-14 Method for establishing CLDNN structure applied to end-to-end speech recognition

Country Status (1)

Country Link
CN (1) CN109767759B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106328122A (en) * 2016-08-19 2017-01-11 深圳市唯特视科技有限公司 Voice identification method using long-short term memory model recurrent neural network
WO2018071389A1 (en) * 2016-10-10 2018-04-19 Google Llc Very deep convolutional neural networks for end-to-end speech recognition
CN107562784A (en) * 2017-07-25 2018-01-09 同济大学 Short text classification method based on ResLCNN models
CN108564940A (en) * 2018-03-20 2018-09-21 平安科技(深圳)有限公司 Audio recognition method, server and computer readable storage medium

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Suyoun Kim et al., "Joint CTC-Attention Based End-to-End Speech Recognition Using Multi-Task Learning", ICASSP 2017 *
Diederik P. Kingma et al., "Adam: A Method for Stochastic Optimization", ICLR 2015 *
Sylvain Arlot, "A Survey of Cross-Validation Procedures for Model Selection", Statistics Surveys *
Tara N. Sainath et al., "Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks", ICASSP 2015 *
Li Gang et al., "Intelligent Cross-Validation Optimization of Hyper-Parameters in Supervised Machine Learning", Journal of Xi'an Technological University *
Li Ruiqi et al., "A Support Vector Machine Based Method for Evaluating the State of Health of Lithium Batteries", 17th CCSSTAE 2016 *

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110148408A (en) * 2019-05-29 2019-08-20 上海电力学院 A kind of Chinese speech recognition method based on depth residual error
CN110309771B (en) * 2019-06-28 2023-03-24 南京丰厚电子有限公司 GBDT-INSGAII-based EAS (Acoustic magnetic System) label identification algorithm
CN110309771A (en) * 2019-06-28 2019-10-08 南京丰厚电子有限公司 A kind of EAS sound magnetic system tag recognition algorithm based on GBDT-INSGAII
CN110443127A (en) * 2019-06-28 2019-11-12 天津大学 In conjunction with the musical score image recognition methods of residual error convolutional coding structure and Recognition with Recurrent Neural Network
CN110335591A (en) * 2019-07-04 2019-10-15 广州云从信息科技有限公司 A kind of parameter management method, device, machine readable media and equipment
CN110600053A (en) * 2019-07-30 2019-12-20 广东工业大学 Cerebral stroke dysarthria risk prediction method based on ResNet and LSTM network
CN110659773A (en) * 2019-09-16 2020-01-07 杭州师范大学 Flight delay prediction method based on deep learning
CN110751944A (en) * 2019-09-19 2020-02-04 平安科技(深圳)有限公司 Method, device, equipment and storage medium for constructing voice recognition model
WO2021051628A1 (en) * 2019-09-19 2021-03-25 平安科技(深圳)有限公司 Method, apparatus and device for constructing speech recognition model, and storage medium
CN110634476B (en) * 2019-10-09 2022-06-14 深圳大学 Method and system for rapidly building robust acoustic model
CN110634476A (en) * 2019-10-09 2019-12-31 深圳大学 Method and system for rapidly building robust acoustic model
CN110942090A (en) * 2019-11-11 2020-03-31 北京迈格威科技有限公司 Model training method, image processing method, device, electronic equipment and storage medium
CN110942090B (en) * 2019-11-11 2024-03-29 北京迈格威科技有限公司 Model training method, image processing device, electronic equipment and storage medium
CN111009235A (en) * 2019-11-20 2020-04-14 武汉水象电子科技有限公司 Voice recognition method based on CLDNN + CTC acoustic model
US11250854B2 (en) * 2019-11-25 2022-02-15 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for voice interaction, device and computer-readable storage medium
CN110992940A (en) * 2019-11-25 2020-04-10 百度在线网络技术(北京)有限公司 Voice interaction method, device, equipment and computer-readable storage medium
CN111092798A (en) * 2019-12-24 2020-05-01 东华大学 Wearable system based on spoken language understanding
CN111092798B (en) * 2019-12-24 2021-06-11 东华大学 Wearable system based on spoken language understanding
CN111243624A (en) * 2020-01-02 2020-06-05 武汉船舶通信研究所(中国船舶重工集团公司第七二二研究所) Method and system for evaluating personnel state
CN111243624B (en) * 2020-01-02 2023-04-07 武汉船舶通信研究所(中国船舶重工集团公司第七二二研究所) Method and system for evaluating personnel state
CN111429947A (en) * 2020-03-26 2020-07-17 重庆邮电大学 Speech emotion recognition method based on multi-stage residual convolutional neural network
CN111429947B (en) * 2020-03-26 2022-06-10 重庆邮电大学 Speech emotion recognition method based on multi-stage residual convolutional neural network
CN111401530B (en) * 2020-04-22 2021-04-09 上海依图网络科技有限公司 Training method for neural network of voice recognition device
WO2021212684A1 (en) * 2020-04-22 2021-10-28 上海依图网络科技有限公司 Recurrent neural network and training method therefor
CN111401530A (en) * 2020-04-22 2020-07-10 上海依图网络科技有限公司 Recurrent neural network and training method thereof
CN111898734A (en) * 2020-07-10 2020-11-06 中国科学院精密测量科学与技术创新研究院 NMR (nuclear magnetic resonance) relaxation time inversion method based on MLP (Multi-layer linear programming)
CN111898734B (en) * 2020-07-10 2023-06-23 中国科学院精密测量科学与技术创新研究院 NMR relaxation time inversion method based on MLP
CN112289309A (en) * 2020-10-30 2021-01-29 西安工程大学 Robot voice control method based on deep learning
CN112651313A (en) * 2020-12-17 2021-04-13 国网上海市电力公司 Equipment nameplate double-intelligent identification method, storage medium and terminal
CN112560453B (en) * 2020-12-18 2023-07-14 平安银行股份有限公司 Voice information verification method and device, electronic equipment and medium
CN112560453A (en) * 2020-12-18 2021-03-26 平安银行股份有限公司 Voice information verification method and device, electronic equipment and medium
CN112652296A (en) * 2020-12-23 2021-04-13 北京华宇信息技术有限公司 Streaming voice endpoint detection method, device and equipment
CN112669827A (en) * 2020-12-28 2021-04-16 清华大学 Joint optimization method and system for automatic speech recognizer
CN112669827B (en) * 2020-12-28 2022-08-02 清华大学 Joint optimization method and system for automatic speech recognizer
CN112904220A (en) * 2020-12-30 2021-06-04 厦门大学 UPS (uninterrupted Power supply) health prediction method and system based on digital twinning and machine learning, electronic equipment and storable medium
CN113327590A (en) * 2021-04-15 2021-08-31 中标软件有限公司 Speech recognition method
CN113270097B (en) * 2021-05-18 2022-05-17 成都傅立叶电子科技有限公司 Unmanned mechanical control method, radio station voice instruction conversion method and device
CN113270097A (en) * 2021-05-18 2021-08-17 成都傅立叶电子科技有限公司 Unmanned mechanical control method, radio station voice instruction conversion method and device
CN113569992A (en) * 2021-08-26 2021-10-29 中国电子信息产业集团有限公司第六研究所 Abnormal data identification method and device, electronic equipment and storage medium
CN113569992B (en) * 2021-08-26 2024-01-09 中国电子信息产业集团有限公司第六研究所 Abnormal data identification method and device, electronic equipment and storage medium
CN113852434A (en) * 2021-09-18 2021-12-28 中山大学 LSTM and ResNet assisted deep learning end-to-end intelligent communication method and system
CN113852434B (en) * 2021-09-18 2023-07-25 中山大学 LSTM and ResNet-assisted deep learning end-to-end intelligent communication method and system
CN114550706A (en) * 2022-02-21 2022-05-27 苏州市职业大学 Smart campus voice recognition method based on deep learning

Also Published As

Publication number Publication date
CN109767759B (en) 2020-12-22

Similar Documents

Publication Publication Date Title
CN109767759A (en) End-to-end speech recognition methods based on modified CLDNN structure
CN110556100B (en) Training method and system of end-to-end speech recognition model
CN109003601A (en) A kind of across language end-to-end speech recognition methods for low-resource Tujia language
CN110222163A (en) A kind of intelligent answer method and system merging CNN and two-way LSTM
CN110706692B (en) Training method and system of child voice recognition model
CN109801621A (en) A kind of audio recognition method based on residual error gating cycle unit
CN112509564A (en) End-to-end voice recognition method based on connection time sequence classification and self-attention mechanism
CN110444208A (en) A kind of speech recognition attack defense method and device based on gradient estimation and CTC algorithm
CN106782511A (en) Amendment linear depth autoencoder network audio recognition method
CN109189925A (en) Term vector model based on mutual information and based on the file classification method of CNN
CN107408384A (en) The end-to-end speech recognition of deployment
CN109063820A (en) Utilize the data processing method of time-frequency combination Recognition with Recurrent Neural Network when long
CN103531199A (en) Ecological sound identification method on basis of rapid sparse decomposition and deep learning
CN110321418A (en) A kind of field based on deep learning, intention assessment and slot fill method
CN110379418A (en) A kind of voice confrontation sample generating method
CN109637526A (en) The adaptive approach of DNN acoustic model based on personal identification feature
CN107818080A (en) Term recognition methods and device
CN109448706A (en) Neural network language model compression method and system
CN111899766B (en) Speech emotion recognition method based on optimization fusion of depth features and acoustic features
CN110009025A (en) A kind of semi-supervised additive noise self-encoding encoder for voice lie detection
CN108461080A (en) A kind of Acoustic Modeling method and apparatus based on HLSTM models
CN111882042A (en) Automatic searching method, system and medium for neural network architecture of liquid state machine
Shi et al. Construction of English Pronunciation Judgment and Detection Model Based on Deep Learning Neural Networks Data Stream Fusion
Jin et al. Research on objective evaluation of recording audio restoration based on deep learning network
CN108388942A (en) Information intelligent processing method based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant