Method for improving activation function parametrization based on a recurrent neural network
Technical field
The present invention relates to the fields of natural language processing and text classification, and in particular to a method for improving activation function parametrization based on a recurrent neural network.
Background Art
Deep neural networks are widely used in computer vision; however, stacked recurrent neural networks suffer from vanishing gradients and overfitting. Several novel recurrent neural networks have therefore been proposed on this basis; the densely connected bidirectional long short-term memory network is one highly effective recurrent neural network.
The activation function module is a basic module of a neural network. In general, an activation function has the following properties:
(1) Non-linearity: when the activation function is non-linear, almost any function can be expressed by a two-layer neural network;
(2) Differentiability: when optimization is gradient-based, the activation function must be differentiable;
(3) Monotonicity: when the activation function is monotonic, a single-layer network is guaranteed to be a convex function.
In the densely connected bidirectional long short-term memory network (DC-Bi-LSTM, where DC denotes dense connection, Bi denotes bidirectional, and LSTM denotes the long short-term memory network, i.e., a bidirectional long short-term memory recurrent neural network), the sigmoid (S-type) activation function has a bounded output range, so gradient explosion does not occur during training; moreover, the derivative of the sigmoid activation function is simple to compute, which substantially reduces algorithmic complexity. At the same time, however, because the sigmoid activation function saturates on both sides, it easily causes gradients to vanish, so the weights cannot be corrected effectively; during back-propagation the activation function easily falls into the saturated region, causing the gradient to vanish and increasing the training difficulty of the recurrent neural network. Adjusting the parameters can expand the unsaturated region, but as the unsaturated region is extended, the derivative values in that region decline, which slows convergence. Reasonable parameters must therefore be set to control both the size of the unsaturated region of the sigmoid activation function and the derivative value at the origin.
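This trade-off can be checked numerically. The following minimal sketch (illustrative only, and assuming the parametrized form σ(x) = 1/(1 + e^(Ax+B)) introduced below) compares the standard sigmoid, A = -1 and B = 0, with the flatter variant A = -0.5 and B = 0:

```python
import numpy as np

def param_sigmoid(x, A=-1.0, B=0.0):
    """Parametrized sigmoid; A = -1, B = 0 recovers the standard sigmoid."""
    return 1.0 / (1.0 + np.exp(A * x + B))

def derivative(x, A=-1.0, B=0.0):
    """Derivative of the parametrized sigmoid: -A * sigma(x) * (1 - sigma(x))."""
    s = param_sigmoid(x, A, B)
    return -A * s * (1.0 - s)

x = np.linspace(-10.0, 10.0, 2001)
for A in (-1.0, -0.5):
    d = derivative(x, A)
    region = x[d > 0.01]  # where the gradient is still usefully large
    print(f"A={A}: derivative at origin = {derivative(0.0, A):.3f}, "
          f"unsaturated region ~ [{region.min():.1f}, {region.max():.1f}]")
```

With A = -0.5 the unsaturated region roughly doubles in width while the derivative at the origin halves; this is exactly the balance that the parameters A and B must strike.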
Summary of the invention
To overcome the deficiencies of the prior art, the present invention provides a method for improving activation function parametrization based on a recurrent neural network. It addresses the problem that the two-sided saturation of the sigmoid activation function easily causes gradients to vanish, prevents weights from being corrected effectively, and makes the activation function fall into the saturated region during back-propagation, thereby causing the gradient to vanish and increasing the training difficulty of the recurrent neural network.
To achieve the above objectives, the present invention adopts the following technical scheme: a method for improving activation function parametrization based on a recurrent neural network, characterized by comprising the following steps:
Step 1: construct a bidirectional long short-term memory network (Bi-LSTM) on the basis of the long short-term memory network (LSTM);
Step 2: connect each hidden layer in the Bi-LSTM network, add an average pooling layer after the last hidden layer of the Bi-LSTM network, and connect a normalized exponential function (softmax) layer after the average pooling layer, thereby establishing the densely connected bidirectional long short-term memory network DC-Bi-LSTM;
Step 3: train the network on a data set with the parametrized sigmoid activation function, record the sentence-classification accuracy of the densely connected bidirectional long short-term memory network, and obtain the parametrized activation function corresponding to the best accuracy.
In the aforementioned method for improving activation function parametrization based on a recurrent neural network, the bidirectional long short-term memory network Bi-LSTM is expressed by the following formulas:

$$\overrightarrow{h_t^l} = \mathrm{LSTM}\!\left(\overrightarrow{h_{t-1}^l},\, x_t^l\right), \qquad \overleftarrow{h_t^l} = \mathrm{LSTM}\!\left(\overleftarrow{h_{t+1}^l},\, x_t^l\right), \qquad h_t^l = \left[\overrightarrow{h_t^l};\, \overleftarrow{h_t^l}\right], \qquad x_t^1 = e(w_t)$$

where:
l denotes the network layer index and t denotes the time step;
$\overrightarrow{h_t^l}$ denotes the hidden-layer state obtained when the l-th LSTM layer processes the text sequence of the data set in forward order at time t;
$\overleftarrow{h_t^l}$ denotes the hidden-layer state obtained when the l-th LSTM layer processes the text sequence of the data set in reverse order at time t;
$h_t^l$ denotes the hidden-layer output of the l-th hidden layer at time t, obtained after the text sequence has been processed in both the forward and reverse directions simultaneously;
$x_t^l$ denotes the input of the l-th layer at time t, with $x_t^1$ the original input text sequence at time t;
e(·) denotes the word embedding of a word in the text sequence; $w_t$ denotes the word vector input at time t;
LSTM(·) denotes processing by the LSTM network.
In the aforementioned method, connecting each hidden layer in the Bi-LSTM network in step 2 specifically means: the input of the first layer of the Bi-LSTM network is the word-vector sequence of the data set, and the input of each subsequent layer is the concatenation of the previous layer's input and the previous layer's output.
In the aforementioned method, the normalized exponential function layer is a softmax layer.
In the aforementioned method, the parametrized sigmoid activation function is:

$$\sigma(x) = \frac{1}{1 + e^{Ax + B}}$$

where σ(·) is the improved parametrized sigmoid activation function, A and B are parameters, and x denotes the input of the neural network.
In the aforementioned method, when the data set is the sentiment-analysis subjectivity data set subj, the parameters of the parametrized sigmoid function corresponding to the best sentence-classification accuracy are A = -0.5 and B = 0.
Advantageous effects of the invention: on the basis of the densely connected bidirectional long short-term memory network, the present invention parametrizes the activation function module so that the unsaturated region of the sigmoid activation function is expanded while the derivative of the function is prevented from becoming too small, thereby preventing gradient vanishing. Through many experiments on the sentiment-analysis subjectivity data set (subj [Pang and Lee, 2004]), a parameter combination that clearly improves sentence-classification accuracy was obtained; verification with other data sets shows that it still works well.
Description of the Drawings
Fig. 1 compares the sentence-classification accuracies obtained in experiments with the parametrized activation function on different data sets.
Specific Embodiments
The invention is further described below in conjunction with the accompanying drawings. The following embodiments are only used to clearly illustrate the technical solution of the present invention and are not intended to limit its protection scope.
The present invention is based on the DC-Bi-LSTM network. It designs a general parametrized form of the sigmoid activation function, modifies the activation-function expression in combination with the internal structure of the LSTM, confirms the expression by testing different parameter combinations, and thereby improves the accuracy of text classification.
A method for improving activation function parametrization based on a recurrent neural network comprises the following steps:
Step 1: construct a bidirectional long short-term memory network (Bi-LSTM) on the basis of the long short-term memory network (LSTM). In this embodiment, the Bi-LSTM has 15 layers.
The Bi-LSTM can be expressed by the following formulas:

$$\overrightarrow{h_t^l} = \mathrm{LSTM}\!\left(\overrightarrow{h_{t-1}^l},\, x_t^l\right), \qquad \overleftarrow{h_t^l} = \mathrm{LSTM}\!\left(\overleftarrow{h_{t+1}^l},\, x_t^l\right), \qquad h_t^l = \left[\overrightarrow{h_t^l};\, \overleftarrow{h_t^l}\right], \qquad x_t^1 = e(w_t)$$

where:
l denotes the network layer index and t denotes the time step;
$\overrightarrow{h_t^l}$ denotes the hidden-layer state obtained when the l-th LSTM layer processes the text sequence of the data set in forward order at time t;
$\overleftarrow{h_t^l}$ denotes the hidden-layer state obtained when the l-th LSTM layer processes the text sequence of the data set in reverse order at time t;
$h_t^l$ denotes the hidden-layer output of the l-th hidden layer at time t, obtained after the text sequence has been processed in both the forward and reverse directions simultaneously;
$x_t^l$ denotes the input of the l-th layer at time t, with $x_t^1$ the original input text sequence at time t;
e(·) denotes the word embedding of a word in the text sequence; $w_t$ denotes the word vector input at time t;
LSTM(·) denotes processing by the LSTM network.
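As a rough sketch of these formulas (illustrative only; the framework and the dimensions are assumptions, not part of the invention), a single bidirectional LSTM layer in PyTorch processes the sequence in both directions and returns, at each time step, the forward and reverse hidden states already concatenated:

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim = 100, 100
# One Bi-LSTM layer: forward and reverse passes over the sequence.
bi_lstm = nn.LSTM(input_size=embed_dim, hidden_size=hidden_dim,
                  bidirectional=True, batch_first=True)

x = torch.randn(32, 20, embed_dim)  # 32 sentences, 20 time steps, e(w_t) vectors
h, _ = bi_lstm(x)                   # h[:, t, :] = [forward h_t ; reverse h_t]
print(h.shape)                      # torch.Size([32, 20, 200])
```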
Step 2: connect each hidden layer in the Bi-LSTM network, add an average pooling layer after the last hidden layer of the Bi-LSTM network, and connect a normalized exponential function (softmax) layer after the average pooling layer, thereby establishing the DC-Bi-LSTM network.
The 15 Bi-LSTM layers are connected as follows: the input of the first layer is the word-vector sequence of the data set (for example, the subj data set), and the output of the first layer is the sequence h1; the input of the second layer is the concatenation of the first layer's input and the first layer's output; and so on for the remaining layers.
After the 15 densely connected Bi-LSTM layers have been constructed, an average pooling layer is added after the last hidden layer of the network; finally, a normalized exponential function layer (softmax layer) is connected after the average pooling layer. At this point, the construction of the densely connected bidirectional LSTM network is complete.
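A minimal sketch of the complete construction in PyTorch follows (illustrative only; the class name DCBiLSTM, the hidden size, and the number of classes are assumptions, not part of the patent):

```python
import torch
import torch.nn as nn

class DCBiLSTM(nn.Module):
    """Sketch of a densely connected Bi-LSTM sentence classifier."""
    def __init__(self, embed_dim=100, hidden_dim=13, num_layers=15, num_classes=2):
        super().__init__()
        self.layers = nn.ModuleList()
        in_dim = embed_dim
        for _ in range(num_layers):
            self.layers.append(nn.LSTM(in_dim, hidden_dim,
                                       bidirectional=True, batch_first=True))
            in_dim += 2 * hidden_dim  # dense connection: next input = [input; output]
        self.classifier = nn.Linear(in_dim, num_classes)

    def forward(self, x):  # x: (batch, time, embed_dim) word-vector sequence
        for lstm in self.layers:
            h, _ = lstm(x)
            x = torch.cat([x, h], dim=-1)  # concatenate layer input with layer output
        pooled = x.mean(dim=1)             # average pooling over time steps
        return torch.softmax(self.classifier(pooled), dim=-1)  # softmax layer

model = DCBiLSTM()
probs = model(torch.randn(4, 20, 100))  # 4 sentences of 20 word vectors
print(probs.shape)                      # torch.Size([4, 2])
```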
Step 3: using the parametrized sigmoid activation function, train on a data set (for example the subj data set, a subjectivity data set for sentiment analysis), record the sentence-classification accuracy of the densely connected bidirectional LSTM network, and obtain the parametrized activation function corresponding to the best accuracy.
The subj data set is chosen for the experiments; the network is trained repeatedly, the sentence-classification accuracy is recorded, and the activation function corresponding to the best accuracy is obtained.
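The repeated training can be sketched as a grid search over the parameters (illustrative only; train_and_evaluate is a hypothetical helper, not defined by the patent, that trains DC-Bi-LSTM on subj with the parametrized sigmoid defined by the given A and B and returns sentence-classification accuracy):

```python
# Hypothetical grid search over the sigmoid parameters (A, B).
candidate_A = (-1.0, -0.75, -0.5, -0.25)
candidate_B = (-0.5, 0.0, 0.5)

best_A, best_B, best_acc = None, None, 0.0
for A in candidate_A:
    for B in candidate_B:
        acc = train_and_evaluate(A, B)  # hypothetical helper, not defined here
        if acc > best_acc:
            best_A, best_B, best_acc = A, B, acc
print(f"best parameters: A={best_A}, B={best_B}, accuracy={best_acc:.4f}")
```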
In combination with the basic structure of DC-Bi-LSTM, since the outputs of the input gate, forget gate, and output gate in the DC-Bi-LSTM network should lie in the interval [0, 1], the improved parametrized sigmoid activation function is:

$$\sigma(x) = \frac{1}{1 + e^{Ax + B}}$$

where σ(·) is the improved parametrized sigmoid activation function, A and B are parameters, and x denotes the input of the neural network.
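For illustration, the parametrized sigmoid can replace the gate activations inside an LSTM cell as sketched below (PyTorch's built-in nn.LSTM does not expose its gate activations, so a custom cell is written; this is a sketch under the stated assumptions, not the patented implementation):

```python
import torch
import torch.nn as nn

def param_sigmoid(x, A=-0.5, B=0.0):
    """Parametrized sigmoid sigma(x) = 1 / (1 + exp(A*x + B)); output in (0, 1)."""
    return 1.0 / (1.0 + torch.exp(A * x + B))

class ParamSigmoidLSTMCell(nn.Module):
    """LSTM cell whose input, forget, and output gates use the parametrized sigmoid."""
    def __init__(self, input_dim, hidden_dim, A=-0.5, B=0.0):
        super().__init__()
        self.A, self.B = A, B
        # One linear map produces all four gate pre-activations at once.
        self.linear = nn.Linear(input_dim + hidden_dim, 4 * hidden_dim)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.linear(torch.cat([x, h], dim=-1)).chunk(4, dim=-1)
        i = param_sigmoid(i, self.A, self.B)  # input gate, output in (0, 1)
        f = param_sigmoid(f, self.A, self.B)  # forget gate, output in (0, 1)
        o = param_sigmoid(o, self.A, self.B)  # output gate, output in (0, 1)
        c = f * c + i * torch.tanh(g)         # cell update keeps the tanh candidate
        h = o * torch.tanh(c)
        return h, c
```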
The parametrized sigmoid activation function achieving the highest classification accuracy on the subj data set is then applied to other data sets for verification. Applying the improved activation function to the mr, sst1, sst2, trec, and cr data sets confirms that the parametrized sigmoid function effectively improves sentence-classification accuracy.
Table 1: Accuracies obtained with different parameter combinations on the subj data set
As can be seen from Table 1, the parametrized sigmoid function with A = -0.5 and B = 0 gives the best results on the subj data set.
As can be seen from Fig. 1, in experiments on the other data sets the parametrized activation function effectively improves the sentence-classification performance of DC-Bi-LSTM relative to the unmodified sigmoid function.
On the basis of the densely connected bidirectional long short-term memory network, the present invention parametrizes the activation function module so that the unsaturated region of the sigmoid activation function is expanded while the derivative of the function is prevented from becoming too small, thereby preventing gradient vanishing. Through many experiments on the subj data set, a parameter combination that clearly improves sentence-classification accuracy was obtained; verification with other data sets shows that it still works well.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the technical principles of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.