CN105844332A - Fast recursive Elman neural network modeling and learning algorithm - Google Patents
- Publication number
- CN105844332A CN105844332A CN201610137875.5A CN201610137875A CN105844332A CN 105844332 A CN105844332 A CN 105844332A CN 201610137875 A CN201610137875 A CN 201610137875A CN 105844332 A CN105844332 A CN 105844332A
- Authority
- CN
- China
- Prior art keywords
- layer
- output
- hidden
- error function
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Abstract
The invention belongs to the field of neural networks and specifically discloses a fast recursive Elman neural network modeling and learning algorithm, which comprises a first step of selecting the model; a second step of initializing the parameters; a third step of computing the error function E_k of the training samples, going to the fourth step if the error function value is less than the error threshold and otherwise computing the gradients of the error function with respect to the weights W_k and V_k; and a fourth step of testing the precision. While optimizing the input weight matrix, the algorithm obtains the weights from the hidden layer to the output layer in a single step by the generalized-inverse method, with no iteration required. In this way, during the weight update the two layers of weights originally computed by the gradient method become one layer, training speed is greatly increased, and the BP algorithm's shortcomings of easily falling into local minima and poor generalization ability are also avoided.
Description
Technical field
The invention belongs to the field of neural networks, and specifically relates to a fast recursive Elman neural network modeling and learning algorithm.
Background art
In the paper "Pham, D.T., and X. Liu. 1992. Dynamic system modeling using partially recurrent neural networks. J. of Systems Engineering, 2:90-97", Pham et al. proposed the modified Elman network (Modified Elman Networks), and it is this network that is now generally treated as the standard Elman network.
Besides an input layer, a hidden layer and an output layer, the Elman network has a special context layer. The units of this layer store the hidden-layer outputs of the previous moment, so the layer can be regarded as a one-step time-delay operator. The Elman neural network has n input nodes, Ñ hidden nodes and m output nodes. The input and recurrent weight matrices are W and V respectively, the merged weight matrix is L = [W, V], and U denotes the weight matrix connecting the hidden nodes to the output nodes.
The training sample set N = {(x_i, t_i) | i = 1, ..., N} is input to the network; H_k denotes the hidden-layer output matrix of the k-th cycle, with H_0 = 0 by definition. Let g(x) be the activation function of the hidden nodes, generally the sigmoid function, i.e. g(x) = 1/(1 + e^(-x)).
The k-th input of the Elman neural network combines the actual input matrix X_k with the hidden-layer output matrix H_{k-1} of the previous cycle, so the input of the hidden layer is V_k H_{k-1} + W_k X_k and the hidden-layer output is H_k = g(V_k H_{k-1} + W_k X_k). The network output is Y_k = f(U H_k), where f(x) is the activation function of the output nodes and is usually taken to be a linear function, so the network output is Y_k = U H_k.
The error function is defined as E_k = (Y_k - T)(Y_k - T)^T, where T is the ideal output. The gradients of E with respect to the merged input/context weight matrix L and the connection weights U from the hidden layer to the output layer, ∂E/∂L and ∂E/∂U, are computed, and L and U are then updated by the steepest descent method: L_{k+1} = L_k - η_k ∂E_k/∂L_k and U_{k+1} = U_k - η_k ∂E_k/∂U_k, where the learning rate η_k is obtained by line search. Iterating in this way yields the optimized network parameters.
This algorithm uses the error back-propagation method and is based mainly on the idea of gradient descent. Because the iteration uses the steepest descent method, the required training time is relatively long, the algorithm easily falls into local minima, and its generalization ability leaves room for improvement.
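The recurrence described above (context layer acting as a one-step delay, sigmoid hidden layer, linear output layer) can be sketched in NumPy. This is a minimal illustration only; all sizes, the (-1, 1) initialization range and the input data are assumptions for the example, not values from the text:

```python
import numpy as np

def sigmoid(x):
    """Hidden-node activation g(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n, n_hidden, m, steps = 4, 8, 2, 5            # illustrative sizes

W = rng.uniform(-1, 1, (n_hidden, n))         # input-to-hidden weights
V = rng.uniform(-1, 1, (n_hidden, n_hidden))  # context (recurrent) weights
U = rng.uniform(-1, 1, (m, n_hidden))         # hidden-to-output weights

X = rng.normal(size=(steps, n))               # one input vector per time step
H = np.zeros(n_hidden)                        # context layer starts at H0 = 0

outputs = []
for x in X:
    # The context layer feeds back the previous hidden output (one-step delay):
    H = sigmoid(V @ H + W @ x)                # H_k = g(V H_{k-1} + W x_k)
    outputs.append(U @ H)                     # linear output layer: y_k = U H_k
Y = np.array(outputs)
```

In the standard algorithm criticized above, all three weight matrices W, V and U would then be adjusted by iterated gradient steps.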
Summary of the invention
In view of the above technical content, the present invention provides a fast algorithm that can markedly increase network training speed, avoid falling into local minima, and improve generalization ability.
The concrete technical scheme is:
The fast recursive Elman neural network modeling and learning algorithm comprises the following steps:
Step 1, model selection
The fast recursive Elman neural network has n input nodes, Ñ hidden nodes and m output nodes;
the input and recurrent weight matrices are W and V respectively, the merged weight matrix is L = [W, V], and U denotes the weight matrix connecting the hidden nodes to the output nodes;
the training sample set N = {(x_i, t_i) | i = 1, ..., N} is input to the network; H_k denotes the hidden-layer output matrix of the k-th cycle, with H_0 = 0 by definition;
g(x) is the activation function of the hidden nodes, taken as the sigmoid function, i.e. g(x) = 1/(1 + e^(-x));
Step 2, parameter initialization
(1). set the initial iteration number k = 0;
(2). define H_0 = 0;
(3). randomly assign the initial weight matrices W (from the input layer to the hidden layer) and V (from the context layer to the hidden layer), choosing random numbers in (-1, 1);
Step 3, optimization of the initial weights
(1). compute the hidden-layer output matrix H_k = g(V_k H_{k-1} + W_k X);
(2). the actual output of the network is Y_k = U_k H_k, and the error function is E_k = Tr[(Y_k - T)(Y_k - T)^T], where T is the ideal output;
the problem of solving for the network weights U_k from the hidden layer to the output layer is then converted into the problem of minimizing the error function, i.e. finding the least-squares solution U_k that makes E_k minimal;
using the Moore-Penrose generalized inverse, U_k = (H_k H_k^T)^{-1} H_k T^T; substituting H_k and T yields the weight matrix U_k from the hidden layer to the output layer;
(3). compute the error function E_k of the training samples; if the error function value is less than the error threshold, go to Step 4; otherwise, compute the gradients ∂E_k/∂W_k and ∂E_k/∂V_k of the error function with respect to the weights W_k and V_k, in which H† denotes the generalized inverse of H;
(4). the weight update formulas are W_{k+1} = W_k - η_k ∂E_k/∂W_k and V_{k+1} = V_k - η_k ∂E_k/∂V_k, where the learning rate η_k is obtained by line search;
(5). set k = k + 1 and return to step (1);
Step 4, testing the precision
From the optimized weights W_k from the input layer to the hidden layer and the context-layer weight matrix V_k, together with the weights U_k from the hidden layer to the output layer obtained in step (2) of Step 3, the network parameters of this algorithm are obtained, and the precision on the test samples is computed.
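The one-step solve for the output weights in step (2) of Step 3 can be illustrated as follows. Sizes and data are illustrative assumptions; `np.linalg.pinv` computes the Moore-Penrose generalized inverse, and the form U = T H† used here matches the closed form above up to transposition conventions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_hidden, m, N = 8, 3, 50           # illustrative sizes

H = rng.normal(size=(n_hidden, N))  # hidden-layer output matrix, one column per sample
T = rng.normal(size=(m, N))         # ideal (target) outputs

# Least-squares solution minimizing E = Tr[(Y - T)(Y - T)^T], obtained in
# one step via the Moore-Penrose generalized inverse -- no iteration needed:
U = T @ np.linalg.pinv(H)
Y = U @ H
E = float(np.trace((Y - T) @ (Y - T).T))
```

Because U is the least-squares minimizer for the given H, any perturbation of U can only increase the error; this closed-form solve is what replaces the gradient iteration for the hidden-to-output layer.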
The fast recursive Elman neural network modeling and learning algorithm provided by the present invention obtains the weights from the hidden layer to the output layer in a single step by the generalized-inverse method while optimizing the input weight matrix, with no iteration required. As a result, in the weight-update process the two layers of weights originally computed by the gradient method become one layer, training speed is greatly increased, and at the same time the BP algorithm's shortcomings of easily falling into local minima and poor generalization ability are avoided.
Brief description of the drawings
Fig. 1 is the structure diagram of the fast recursive Elman neural network of the present invention;
Fig. 2 is the flow chart of the present invention.
Detailed description of the invention
A specific embodiment of the invention is described below with reference to the accompanying drawings.
In a specific implementation, the novel fast recursive Elman neural network modeling and learning algorithm of the present invention includes the following steps:
Step 1,
The training sample set N = {(x_i, t_i) | i = 1, ..., N} is input to the network; the number of input nodes n and the number of output nodes m of the network are determined, and the number of hidden nodes of the network is set to Ñ.
Step 2,
Randomly assign the input and recurrent weight matrices W and V, choosing random numbers in (-1, 1); the merged weight matrix is L = [W, V]; define the hidden-layer output matrix of the cycle H_0 = 0.
Step 3,
Compute the hidden-layer output matrix H_k = g(V_k H_{k-1} + W_k X), where g(x) is the activation function of the hidden nodes, taken as the sigmoid function, i.e. g(x) = 1/(1 + e^(-x)).
Step 4,
Compute the actual output of the network Y_k = U_k H_k and the error function E_k = Tr[(Y_k - T)(Y_k - T)^T], where T is the ideal output; and use the Moore-Penrose generalized inverse, U_k = (H_k H_k^T)^{-1} H_k T^T, substituting H_k and T, to obtain the weight matrix U_k from the hidden layer to the output layer.
Step 5,
Compute the error function E_k of the training samples; if the error function value is less than the error threshold, go to Step 7; otherwise, compute the gradients ∂E_k/∂W_k and ∂E_k/∂V_k of the error function with respect to the weights W_k and V_k, in which H† denotes the generalized inverse of H, and update the weights: W_{k+1} = W_k - η_k ∂E_k/∂W_k and V_{k+1} = V_k - η_k ∂E_k/∂V_k, where the learning rate η_k is obtained by line search.
Step 6,
Set k = k + 1 and return to Step 3.
Step 7,
From the optimized weights W_k from the input layer to the hidden layer and the context-layer weight matrix V_k, together with the computed weights U_k from the hidden layer to the output layer, the network parameters of this algorithm are obtained, and the precision on the test samples is computed.
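The steps above can be sketched end to end on synthetic data. This is a hedged illustration only: the patent's closed-form gradient expressions are not reproduced in this text, so a finite-difference gradient stands in for ∂E_k/∂W_k and ∂E_k/∂V_k, a small fixed learning rate stands in for the line search, and all sizes and data are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
n, n_hidden, m, N = 3, 6, 2, 30          # illustrative sizes

X = rng.normal(size=(n, N))              # training inputs, one column per sample
T = np.vstack([np.sin(X.sum(axis=0)),    # ideal outputs for a toy task
               X[0] * X[1]])

def solve_error(W, V, H_prev):
    """Steps 3-4: compute H, solve U in one step, return (E, U, H)."""
    H = sigmoid(V @ H_prev + W @ X)      # H_k = g(V H_{k-1} + W X)
    U = T @ np.linalg.pinv(H)            # U_k via Moore-Penrose generalized inverse
    R = U @ H - T
    return float(np.trace(R @ R.T)), U, H

W = rng.uniform(-1, 1, (n_hidden, n))    # Step 2: random initial weights in (-1, 1)
V = rng.uniform(-1, 1, (n_hidden, n_hidden))
H_prev = np.zeros((n_hidden, N))         # H0 = 0
eta, eps, threshold = 0.05, 1e-6, 1e-3   # fixed eta stands in for the line search

for k in range(10):                      # Steps 5-6: descend on W and V, re-solve U
    E, U, H = solve_error(W, V, H_prev)
    if E < threshold:                    # Step 5: stop when below the error threshold
        break
    for M in (W, V):
        E_base = solve_error(W, V, H_prev)[0]
        G = np.zeros_like(M)
        for idx in np.ndindex(M.shape):  # finite-difference stand-in for the gradient
            M[idx] += eps
            G[idx] = (solve_error(W, V, H_prev)[0] - E_base) / eps
            M[idx] -= eps
        M -= eta * G                     # W_{k+1} = W_k - eta * dE/dW (same for V)
    H_prev = H                           # this cycle's H feeds the next cycle
E_final, U, H = solve_error(W, V, H_prev)
```

Note that only W and V are updated by gradient steps; U is re-obtained in closed form inside every cycle, which is the one-layer-instead-of-two saving the text describes.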
The existing Elman algorithm, the ELM (Extreme Learning Machine) algorithm and the algorithm of the present invention were tested on the MNIST data set, and their results compared.
The MNIST data set is a handwritten-digit database created by Corinna Cortes of Google Labs and Yann LeCun of the Courant Institute at New York University. The training set contains 60,000 images of handwritten digits 0-9, and the test set contains 10,000. Every image has 8-bit gray levels and can be represented by a vector of 784 elements. It is one of the more common machine-learning data sets. In this experiment, all three algorithms use the sigmoid activation function, and the number of hidden nodes is 128 in every case.
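The "precision" compared in this experiment is classification accuracy: on MNIST, the predicted digit is the argmax of the linear network output, compared against the one-hot target. A minimal sketch with hypothetical stand-in outputs (not real MNIST predictions):

```python
import numpy as np

def accuracy(Y, T):
    """Fraction of samples whose predicted class (argmax over the output
    rows) matches the one-hot target class. Columns are samples."""
    return float(np.mean(np.argmax(Y, axis=0) == np.argmax(T, axis=0)))

# 3 classes, 4 samples: columns are samples, targets are one-hot
T = np.array([[1, 0, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Y = np.array([[0.90, 0.10, 0.20, 0.60],   # the 4th sample is misclassified
              [0.05, 0.80, 0.10, 0.30],
              [0.05, 0.10, 0.70, 0.10]])
acc = accuracy(Y, T)                       # 3 of 4 correct -> 0.75
```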
The experimental results of the Elman algorithm, the ELM algorithm and this algorithm are shown in Table 1.
Table 1
As can be seen from Table 1, on the relatively large MNIST data set, with the same number of iterations, and even when the Elman algorithm is given twice as many iterations as this algorithm, both the training precision and the prediction precision obtained by this algorithm are higher than those of Elman. The precision of this algorithm is also higher than that of ELM, which likewise uses the generalized-inverse idea to obtain the network parameters. This shows that, compared with the other two algorithms, this algorithm improves the generalization ability of the network and effectively raises its precision.
Claims (1)
1. A fast recursive Elman neural network modeling and learning algorithm, characterized in that it comprises the following steps:
Step 1, model selection
the fast recursive Elman neural network has n input nodes, Ñ hidden nodes and m output nodes;
the input and recurrent weight matrices are W and V respectively, the merged weight matrix is L = [W, V], and U denotes the weight matrix connecting the hidden nodes to the output nodes;
the training sample set N = {(x_i, t_i) | i = 1, ..., N} is input to the network; H_k denotes the hidden-layer output matrix of the k-th cycle, with H_0 = 0 by definition;
g(x) is the activation function of the hidden nodes, taken as the sigmoid function, i.e. g(x) = 1/(1 + e^(-x));
Step 2, parameter initialization
(1). set the initial iteration number k = 0;
(2). define H_0 = 0;
(3). randomly assign the initial weight matrices W (from the input layer to the hidden layer) and V (from the context layer to the hidden layer), choosing random numbers in (-1, 1);
Step 3, optimization of the initial weights
(1). compute the hidden-layer output matrix H_k = g(V_k H_{k-1} + W_k X);
(2). the actual output of the network is Y_k = U_k H_k, and the error function is E_k = Tr[(Y_k - T)(Y_k - T)^T], where T is the ideal output;
the problem of solving for the network weights U_k from the hidden layer to the output layer is then converted into the problem of minimizing the error function, i.e. finding the least-squares solution U_k that makes E_k minimal;
using the Moore-Penrose generalized inverse, U_k = (H_k H_k^T)^{-1} H_k T^T; substituting H_k and T yields the weight matrix U_k from the hidden layer to the output layer;
(3). compute the error function E_k of the training samples; if the error function value is less than the error threshold, go to Step 4; otherwise, compute the gradients ∂E_k/∂W_k and ∂E_k/∂V_k of the error function with respect to the weights W_k and V_k, in which H† denotes the generalized inverse of H;
(4). the weight update formulas are W_{k+1} = W_k - η_k ∂E_k/∂W_k and V_{k+1} = V_k - η_k ∂E_k/∂V_k, where the learning rate η_k is obtained by line search;
(5). set k = k + 1 and return to step (1);
Step 4, testing the precision
from the optimized weights W_k from the input layer to the hidden layer and the context-layer weight matrix V_k, together with the weights U_k from the hidden layer to the output layer obtained in step (2) of Step 3, the network parameters of this algorithm are obtained, and the precision on the test samples is computed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610137875.5A CN105844332A (en) | 2016-03-10 | 2016-03-10 | Fast recursive Elman neural network modeling and learning algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105844332A true CN105844332A (en) | 2016-08-10 |
Family
ID=56587040
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610137875.5A Pending CN105844332A (en) | 2016-03-10 | 2016-03-10 | Fast recursive Elman neural network modeling and learning algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105844332A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106407711A (en) * | 2016-10-10 | 2017-02-15 | 重庆科技学院 | Recommendation method and recommendation system of pet feeding based on cloud data |
CN106472332A (en) * | 2016-10-10 | 2017-03-08 | 重庆科技学院 | Pet feeding method and system based on dynamic intelligent algorithm |
CN109782606A (en) * | 2019-03-12 | 2019-05-21 | 北京理工大学珠海学院 | Recursive small echo Ai Erman neural network based on modified form gravitation searching method |
CN110362881A (en) * | 2019-06-25 | 2019-10-22 | 西安电子科技大学 | Microwave power device nonlinear model method based on extreme learning machine |
CN110838087A (en) * | 2019-11-13 | 2020-02-25 | 燕山大学 | Image super-resolution reconstruction method and system |
CN111881990A (en) * | 2020-08-03 | 2020-11-03 | 江南大学 | Construction type neural network parameter fusion optimization method for digital image recognition |
CN112819086A (en) * | 2021-02-10 | 2021-05-18 | 北京工业大学 | Image classification method for calculating global optimal solution of single-hidden-layer ReLU neural network by dividing network space |
CN112926727A (en) * | 2021-02-10 | 2021-06-08 | 北京工业大学 | Solving method for local minimum value of single hidden layer ReLU neural network |
CN113962369A (en) * | 2021-11-29 | 2022-01-21 | 北京工业大学 | Radial basis function neural network optimization method based on improved Levenberg-Marquardt |
CN111461229B (en) * | 2020-04-01 | 2023-10-31 | 北京工业大学 | Deep neural network optimization and image classification method based on target transfer and line search |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105844332A (en) | Fast recursive Elman neural network modeling and learning algorithm | |
CN109948029B (en) | Neural network self-adaptive depth Hash image searching method | |
KR102302609B1 (en) | Neural Network Architecture Optimization | |
US20200320399A1 (en) | Regularized neural network architecture search | |
CN104598611B (en) | The method and system being ranked up to search entry | |
CN112631717B (en) | Asynchronous reinforcement learning-based network service function chain dynamic deployment system and method | |
CN107239733A (en) | Continuous hand-written character recognizing method and system | |
CN106203625A (en) | A kind of deep-neural-network training method based on multiple pre-training | |
CN110321484A (en) | A kind of Products Show method and device | |
CN107798385A (en) | Recognition with Recurrent Neural Network partially connected method based on block tensor resolution | |
CN109299478A (en) | Intelligent automatic question-answering method and system based on two-way shot and long term Memory Neural Networks | |
CN111723914A (en) | Neural network architecture searching method based on convolution kernel prediction | |
CN108038539A (en) | A kind of integrated length memory Recognition with Recurrent Neural Network and the method for gradient lifting decision tree | |
US20210224650A1 (en) | Method for differentiable architecture search based on a hierarchical grouping mechanism | |
KR20180096469A (en) | Knowledge Transfer Method Using Deep Neural Network and Apparatus Therefor | |
CN112578089B (en) | Air pollutant concentration prediction method based on improved TCN | |
CN111445008A (en) | Knowledge distillation-based neural network searching method and system | |
CN109597998A (en) | A kind of characteristics of image construction method of visual signature and characterizing semantics joint insertion | |
CN110245602A (en) | A kind of underwater quiet target identification method based on depth convolution feature | |
CN108052680B (en) | Image data target identification Enhancement Method based on data map, Information Atlas and knowledge mapping | |
CN106407932B (en) | Handwritten Digit Recognition method based on fractional calculus Yu generalized inverse neural network | |
CN113409157B (en) | Cross-social network user alignment method and device | |
CN108960326B (en) | Point cloud fast segmentation method and system based on deep learning framework | |
CN112487305B (en) | GCN-based dynamic social user alignment method | |
CN110377957B (en) | Bridge crane neural network modeling method of whale search strategy wolf algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20160810 |