Method for improving activation function parametrization based on a recurrent neural network
Technical field
The present invention relates to the fields of natural language processing and text classification, and in particular to a method for improving activation function parametrization based on a recurrent neural network.
Background Art
Deep neural networks are widely used in computer vision; however, stacked recurrent neural networks suffer from vanishing gradients and overfitting. Several novel recurrent neural networks have therefore been proposed on this basis; the densely connected bidirectional long short-term memory network is one highly effective recurrent neural network.
The activation function module is a basic module of a neural network. In general, an activation function has the following properties:
(1) Non-linearity: when the activation function is non-linear, almost any function can be expressed by a two-layer neural network;
(2) Differentiability: when optimization is gradient-based, the activation function must be differentiable;
(3) Monotonicity: when the activation function is monotonic, a single-layer network is guaranteed to be a convex function.
In the densely connected bidirectional long short-term memory network (DC-Bi-LSTM, where DC denotes dense connection, Bi denotes bidirectional, and LSTM denotes the long short-term memory network, i.e., a bidirectional long short-term memory recurrent neural network), the sigmoid (S-type) activation function has a bounded output range, so gradient explosion does not occur during training; moreover, the derivative of the sigmoid activation function is simple to compute, which substantially reduces algorithmic complexity. At the same time, however, because the sigmoid activation function saturates on both sides, it easily causes gradients to vanish, so the weights cannot be corrected effectively; during back-propagation the activation function easily falls into the saturated region, causing the gradient to vanish and increasing the training difficulty of the recurrent neural network. Adjusting the parameters can expand the unsaturated region, but as the unsaturated region is extended, the derivative values in that region decline, which slows convergence. Reasonable parameters must therefore be set to control both the size of the unsaturated region of the sigmoid activation function and the derivative value at the origin.
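This trade-off can be checked numerically. The following minimal sketch (illustrative only, and assuming the parametrized form σ(x) = 1/(1 + e^(Ax+B)) introduced below) compares the standard sigmoid, A = -1 and B = 0, with the flatter variant A = -0.5 and B = 0:

```python
import numpy as np

def param_sigmoid(x, A=-1.0, B=0.0):
    """Parametrized sigmoid; A = -1, B = 0 recovers the standard sigmoid."""
    return 1.0 / (1.0 + np.exp(A * x + B))

def derivative(x, A=-1.0, B=0.0):
    """Derivative of the parametrized sigmoid: -A * sigma(x) * (1 - sigma(x))."""
    s = param_sigmoid(x, A, B)
    return -A * s * (1.0 - s)

x = np.linspace(-10.0, 10.0, 2001)
for A in (-1.0, -0.5):
    d = derivative(x, A)
    region = x[d > 0.01]  # where the gradient is still usefully large
    print(f"A={A}: derivative at origin = {derivative(0.0, A):.3f}, "
          f"unsaturated region ~ [{region.min():.1f}, {region.max():.1f}]")
```

With A = -0.5 the unsaturated region roughly doubles in width while the derivative at the origin halves; this is exactly the balance that the parameters A and B must strike.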
Summary of the invention
To overcome the deficiencies of the prior art, the present invention provides a method for improving activation function parametrization based on a recurrent neural network. It addresses the problem that the two-sided saturation of the sigmoid activation function easily causes gradients to vanish, prevents weights from being corrected effectively, and makes the activation function fall into the saturated region during back-propagation, thereby causing the gradient to vanish and increasing the training difficulty of the recurrent neural network.
To achieve the above objectives, the present invention adopts the following technical scheme: a method for improving activation function parametrization based on a recurrent neural network, characterized by comprising the following steps:
Step 1: construct a bidirectional long short-term memory network (Bi-LSTM) on the basis of the long short-term memory network (LSTM);
Step 2: connect each hidden layer in the Bi-LSTM network, add an average pooling layer after the last hidden layer of the Bi-LSTM network, and connect a normalized exponential function (softmax) layer after the average pooling layer, thereby establishing the densely connected bidirectional long short-term memory network DC-Bi-LSTM;
Step 3: train the network on a data set with the parametrized sigmoid activation function, record the sentence-classification accuracy of the densely connected bidirectional long short-term memory network, and obtain the parametrized activation function corresponding to the best accuracy.
In the aforementioned method for improving activation function parametrization based on a recurrent neural network, the bidirectional long short-term memory network Bi-LSTM is expressed by the following formulas:

$$\overrightarrow{h_t^l} = \mathrm{LSTM}\!\left(\overrightarrow{h_{t-1}^l},\, x_t^l\right), \qquad \overleftarrow{h_t^l} = \mathrm{LSTM}\!\left(\overleftarrow{h_{t+1}^l},\, x_t^l\right), \qquad h_t^l = \left[\overrightarrow{h_t^l};\, \overleftarrow{h_t^l}\right], \qquad x_t^1 = e(w_t)$$

where:
l denotes the network layer index and t denotes the time step;
$\overrightarrow{h_t^l}$ denotes the hidden-layer state obtained when the l-th LSTM layer processes the text sequence of the data set in forward order at time t;
$\overleftarrow{h_t^l}$ denotes the hidden-layer state obtained when the l-th LSTM layer processes the text sequence of the data set in reverse order at time t;
$h_t^l$ denotes the hidden-layer output of the l-th hidden layer at time t, obtained after the text sequence has been processed in both the forward and reverse directions simultaneously;
$x_t^l$ denotes the input of the l-th layer at time t, with $x_t^1$ the original input text sequence at time t;
e(·) denotes the word embedding of a word in the text sequence; $w_t$ denotes the word vector input at time t;
LSTM(·) denotes processing by the LSTM network.
In the aforementioned method, connecting each hidden layer in the Bi-LSTM network in step 2 specifically means: the input of the first layer of the Bi-LSTM network is the word-vector sequence of the data set, and the input of each subsequent layer is the concatenation of the previous layer's input and the previous layer's output.
In the aforementioned method, the normalized exponential function layer is a softmax layer.
In the aforementioned method, the parametrized sigmoid activation function is:

$$\sigma(x) = \frac{1}{1 + e^{Ax + B}}$$

where σ(·) is the improved parametrized sigmoid activation function, A and B are parameters, and x denotes the input of the neural network.
In the aforementioned method, when the data set is the sentiment-analysis subjectivity data set subj, the parameters of the parametrized sigmoid function corresponding to the best sentence-classification accuracy are A = -0.5 and B = 0.
Advantageous effects of the invention: on the basis of the densely connected bidirectional long short-term memory network, the present invention parametrizes the activation function module so that the unsaturated region of the sigmoid activation function is expanded while the derivative of the function is prevented from becoming too small, thereby preventing gradient vanishing. Through many experiments on the sentiment-analysis subjectivity data set (subj [Pang and Lee, 2004]), a parameter combination that clearly improves sentence-classification accuracy was obtained; verification with other data sets shows that it still works well.
Description of the Drawings
Fig. 1 compares the sentence-classification accuracies obtained in experiments with the parametrized activation function on different data sets.
Specific Embodiments
The invention is further described below in conjunction with the accompanying drawings. The following embodiments are only used to clearly illustrate the technical solution of the present invention and are not intended to limit its protection scope.
The present invention is based on the DC-Bi-LSTM network. It designs a general parametrized form of the sigmoid activation function, modifies the activation-function expression in combination with the internal structure of the LSTM, confirms the expression by testing different parameter combinations, and thereby improves the accuracy of text classification.
A method for improving activation function parametrization based on a recurrent neural network comprises the following steps:
Step 1: construct a bidirectional long short-term memory network (Bi-LSTM) on the basis of the long short-term memory network (LSTM). In this embodiment, the Bi-LSTM has 15 layers.
The Bi-LSTM can be expressed by the following formulas:

$$\overrightarrow{h_t^l} = \mathrm{LSTM}\!\left(\overrightarrow{h_{t-1}^l},\, x_t^l\right), \qquad \overleftarrow{h_t^l} = \mathrm{LSTM}\!\left(\overleftarrow{h_{t+1}^l},\, x_t^l\right), \qquad h_t^l = \left[\overrightarrow{h_t^l};\, \overleftarrow{h_t^l}\right], \qquad x_t^1 = e(w_t)$$

where:
l denotes the network layer index and t denotes the time step;
$\overrightarrow{h_t^l}$ denotes the hidden-layer state obtained when the l-th LSTM layer processes the text sequence of the data set in forward order at time t;
$\overleftarrow{h_t^l}$ denotes the hidden-layer state obtained when the l-th LSTM layer processes the text sequence of the data set in reverse order at time t;
$h_t^l$ denotes the hidden-layer output of the l-th hidden layer at time t, obtained after the text sequence has been processed in both the forward and reverse directions simultaneously;
$x_t^l$ denotes the input of the l-th layer at time t, with $x_t^1$ the original input text sequence at time t;
e(·) denotes the word embedding of a word in the text sequence; $w_t$ denotes the word vector input at time t;
LSTM(·) denotes processing by the LSTM network.
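As a rough sketch of these formulas (illustrative only; the framework and the dimensions are assumptions, not part of the invention), a single bidirectional LSTM layer in PyTorch processes the sequence in both directions and returns, at each time step, the forward and reverse hidden states already concatenated:

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim = 100, 100
# One Bi-LSTM layer: forward and reverse passes over the sequence.
bi_lstm = nn.LSTM(input_size=embed_dim, hidden_size=hidden_dim,
                  bidirectional=True, batch_first=True)

x = torch.randn(32, 20, embed_dim)  # 32 sentences, 20 time steps, e(w_t) vectors
h, _ = bi_lstm(x)                   # h[:, t, :] = [forward h_t ; reverse h_t]
print(h.shape)                      # torch.Size([32, 20, 200])
```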
Step 2: connect each hidden layer in the Bi-LSTM network, add an average pooling layer after the last hidden layer of the Bi-LSTM network, and connect a normalized exponential function (softmax) layer after the average pooling layer, thereby establishing the DC-Bi-LSTM network.
The 15 Bi-LSTM layers are connected as follows: the input of the first layer is the word-vector sequence of the data set (for example, the subj data set), and the output of the first layer is the sequence h1; the input of the second layer is the concatenation of the first layer's input and the first layer's output; and so on for the remaining layers.
After the 15 densely connected Bi-LSTM layers have been constructed, an average pooling layer is added after the last hidden layer of the network; finally, a normalized exponential function layer (softmax layer) is connected after the average pooling layer. At this point, the construction of the densely connected bidirectional LSTM network is complete.
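A minimal sketch of the complete construction in PyTorch follows (illustrative only; the class name DCBiLSTM, the hidden size, and the number of classes are assumptions, not part of the patent):

```python
import torch
import torch.nn as nn

class DCBiLSTM(nn.Module):
    """Sketch of a densely connected Bi-LSTM sentence classifier."""
    def __init__(self, embed_dim=100, hidden_dim=13, num_layers=15, num_classes=2):
        super().__init__()
        self.layers = nn.ModuleList()
        in_dim = embed_dim
        for _ in range(num_layers):
            self.layers.append(nn.LSTM(in_dim, hidden_dim,
                                       bidirectional=True, batch_first=True))
            in_dim += 2 * hidden_dim  # dense connection: next input = [input; output]
        self.classifier = nn.Linear(in_dim, num_classes)

    def forward(self, x):  # x: (batch, time, embed_dim) word-vector sequence
        for lstm in self.layers:
            h, _ = lstm(x)
            x = torch.cat([x, h], dim=-1)  # concatenate layer input with layer output
        pooled = x.mean(dim=1)             # average pooling over time steps
        return torch.softmax(self.classifier(pooled), dim=-1)  # softmax layer

model = DCBiLSTM()
probs = model(torch.randn(4, 20, 100))  # 4 sentences of 20 word vectors
print(probs.shape)                      # torch.Size([4, 2])
```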
Step 3: using the parametrized sigmoid activation function, train on a data set (for example the subj data set, a subjectivity data set for sentiment analysis), record the sentence-classification accuracy of the densely connected bidirectional LSTM network, and obtain the parametrized activation function corresponding to the best accuracy.
The subj data set is chosen for the experiments; the network is trained repeatedly, the sentence-classification accuracy is recorded, and the activation function corresponding to the best accuracy is obtained.
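The repeated training can be sketched as a grid search over the parameters (illustrative only; train_and_evaluate is a hypothetical helper, not defined by the patent, that trains DC-Bi-LSTM on subj with the parametrized sigmoid defined by the given A and B and returns sentence-classification accuracy):

```python
# Hypothetical grid search over the sigmoid parameters (A, B).
candidate_A = (-1.0, -0.75, -0.5, -0.25)
candidate_B = (-0.5, 0.0, 0.5)

best_A, best_B, best_acc = None, None, 0.0
for A in candidate_A:
    for B in candidate_B:
        acc = train_and_evaluate(A, B)  # hypothetical helper, not defined here
        if acc > best_acc:
            best_A, best_B, best_acc = A, B, acc
print(f"best parameters: A={best_A}, B={best_B}, accuracy={best_acc:.4f}")
```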
In combination with the basic structure of DC-Bi-LSTM, since the outputs of the input gate, forget gate, and output gate in the DC-Bi-LSTM network should lie in the interval [0, 1], the improved parametrized sigmoid activation function is:

$$\sigma(x) = \frac{1}{1 + e^{Ax + B}}$$

where σ(·) is the improved parametrized sigmoid activation function, A and B are parameters, and x denotes the input of the neural network.
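For illustration, the parametrized sigmoid can replace the gate activations inside an LSTM cell as sketched below (PyTorch's built-in nn.LSTM does not expose its gate activations, so a custom cell is written; this is a sketch under the stated assumptions, not the patented implementation):

```python
import torch
import torch.nn as nn

def param_sigmoid(x, A=-0.5, B=0.0):
    """Parametrized sigmoid sigma(x) = 1 / (1 + exp(A*x + B)); output in (0, 1)."""
    return 1.0 / (1.0 + torch.exp(A * x + B))

class ParamSigmoidLSTMCell(nn.Module):
    """LSTM cell whose input, forget, and output gates use the parametrized sigmoid."""
    def __init__(self, input_dim, hidden_dim, A=-0.5, B=0.0):
        super().__init__()
        self.A, self.B = A, B
        # One linear map produces all four gate pre-activations at once.
        self.linear = nn.Linear(input_dim + hidden_dim, 4 * hidden_dim)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.linear(torch.cat([x, h], dim=-1)).chunk(4, dim=-1)
        i = param_sigmoid(i, self.A, self.B)  # input gate, output in (0, 1)
        f = param_sigmoid(f, self.A, self.B)  # forget gate, output in (0, 1)
        o = param_sigmoid(o, self.A, self.B)  # output gate, output in (0, 1)
        c = f * c + i * torch.tanh(g)         # cell update keeps the tanh candidate
        h = o * torch.tanh(c)
        return h, c
```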
The parametrized sigmoid activation function achieving the highest classification accuracy on the subj data set is then applied to other data sets for verification. Applying the improved activation function to the mr, sst1, sst2, trec, and cr data sets confirms that the parametrized sigmoid function effectively improves sentence-classification accuracy.
Table 1: Accuracies obtained with different parameter combinations on the subj data set
As can be seen from Table 1, the parametrized sigmoid function with A = -0.5 and B = 0 gives the best results on the subj data set.
As can be seen from Fig. 1, in experiments on the other data sets the parametrized activation function effectively improves the sentence-classification performance of DC-Bi-LSTM relative to the unmodified sigmoid function.
On the basis of the densely connected bidirectional long short-term memory network, the present invention parametrizes the activation function module so that the unsaturated region of the sigmoid activation function is expanded while the derivative of the function is prevented from becoming too small, thereby preventing gradient vanishing. Through many experiments on the subj data set, a parameter combination that clearly improves sentence-classification accuracy was obtained; verification with other data sets shows that it still works well.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the technical principles of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.