CN109857867A - A parameterized activation function improvement method based on recurrent neural networks - Google Patents

A parameterized activation function improvement method based on recurrent neural networks Download PDF

Info

Publication number
CN109857867A
CN109857867A (application CN201910056795.0A)
Authority
CN
China
Prior art keywords
lstm
layer
network
activation function
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910056795.0A
Other languages
Chinese (zh)
Inventor
于舒娟
李润琦
高冲
杨杰
张昀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University Of Posts And Telecommunications Nantong Institute Ltd
Nanjing Post and Telecommunication University
Original Assignee
Nanjing University Of Posts And Telecommunications Nantong Institute Ltd
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University Of Posts And Telecommunications Nantong Institute Ltd, Nanjing Post and Telecommunication University filed Critical Nanjing University Of Posts And Telecommunications Nantong Institute Ltd
Priority to CN201910056795.0A priority Critical patent/CN109857867A/en
Publication of CN109857867A publication Critical patent/CN109857867A/en
Pending legal-status Critical Current

Links

Landscapes

  • Character Discrimination (AREA)

Abstract

The invention discloses a parameterized activation function improvement method based on a recurrent neural network, comprising the following steps. Step 1: build a bidirectional long short-term memory network (Bi-LSTM) based on the long short-term memory (LSTM) network. Step 2: connect the hidden layers of the Bi-LSTM network in series, add an average pooling layer after the last hidden layer of the network, and connect a normalized exponential function layer after the average pooling layer, establishing a densely connected bidirectional long short-term memory network (DC-Bi-LSTM). Step 3: train on the data set with the parameterized Sigmoid activation function, record the sentence-classification accuracy of the densely connected bidirectional long short-term memory network, and obtain the parameterized activation function corresponding to the best accuracy. By parameterizing the activation function module, the invention expands the non-saturated region of the S-shaped (sigmoid) activation function while keeping the derivative of the function from becoming too small, thereby preventing the vanishing-gradient phenomenon.

Description

A parameterized activation function improvement method based on recurrent neural networks
Technical field
The present invention relates to the fields of natural language processing and text classification, and in particular to a parameterized activation function improvement method based on recurrent neural networks.
Background technique
Deep neural networks are widely used in computer vision; however, stacked recurrent neural networks suffer from vanishing gradients and overfitting. On this basis, several novel recurrent neural networks have been proposed, among which the densely connected bidirectional long short-term memory network is a highly effective recurrent neural network.
The activation function module is a basic building block of a neural network. In general, an activation function has the following properties:
(1) Non-linearity: when the activation function is non-linear, almost any function can be represented by a two-layer neural network;
(2) Differentiability: the activation function must be differentiable when the optimization method is gradient-based;
(3) Monotonicity: when the activation function is monotonic, a single-layer network is guaranteed to be convex.
In the densely connected bidirectional long short-term memory network (DC (densely connected)-Bi (bidirectional)-LSTM (long short-term memory network)), the S-shaped (Sigmoid) activation function has a bounded output range, so gradient explosion does not occur during training; its derivative is also simple to compute, which greatly reduces algorithmic complexity. At the same time, however, the two-sided saturation of the sigmoid activation function easily causes the gradient to vanish, so the weights cannot be corrected effectively; during back-propagation the activation function easily falls into its saturation region, the gradient vanishes, and training the recurrent neural network becomes more difficult. Adjusting parameters can expand the non-saturated region, but as the non-saturated region is extended the derivative in that region decreases, which slows convergence. Reasonable parameters must therefore be set to control both the size of the non-saturated region of the sigmoid activation function and the value of its derivative at the origin.
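For reference, the standard sigmoid and the bound on its derivative that underlie the saturation argument above can be written as follows (well-known identities, added here only for clarity):

\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr) \le \frac{1}{4}

Because \sigma'(x) decays exponentially for large |x|, inputs that fall in the saturated tails contribute almost no gradient during back-propagation, which is the vanishing-gradient behaviour the parameterization described below is intended to control.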
Summary of the invention
To overcome the deficiencies of the prior art, the present invention provides a parameterized activation function improvement method based on recurrent neural networks. It addresses the problem that the two-sided saturation of the sigmoid activation function easily causes the gradient to vanish, prevents the weights from being corrected effectively, and easily drives the activation function into its saturation region during back-propagation, which causes the gradient to vanish and increases the difficulty of training the recurrent neural network.
In order to achieve the above objectives, the present invention adopts the following technical scheme: a parameterized activation function improvement method based on recurrent neural networks, characterized by comprising the steps of:
Step 1: build a bidirectional long short-term memory network Bi-LSTM based on the long short-term memory network LSTM;
Step 2: connect the hidden layers of the Bi-LSTM network in series, add an average pooling layer after the last hidden layer of the Bi-LSTM network, connect a normalized exponential (softmax) function layer after the average pooling layer, and establish the densely connected bidirectional long short-term memory network DC-Bi-LSTM;
Step 3: train on the data set with the parameterized Sigmoid activation function, record the sentence-classification accuracy of the densely connected bidirectional long short-term memory network, and obtain the parameterized activation function corresponding to the best accuracy.
The aforementioned parameterized activation function improvement method based on recurrent neural networks is characterized in that the bidirectional long short-term memory network Bi-LSTM is expressed by the following formulas:
l denotes the network layer index and t denotes the time step;
the forward hidden-layer state of the l-th LSTM layer is obtained by processing the text sequence of the data set in sentence order at the current time t;
the backward hidden-layer state of the l-th LSTM layer is obtained by processing the text sequence of the data set in reverse order at the current time t;
the hidden-layer output of the l-th hidden layer at the current time t is obtained after the text sequence of the data set has been processed in both the forward and the reverse direction;
h_t^l denotes the output of the l-th hidden layer at time t, and h_t^0 denotes the original input text sequence at time t;
e(·) denotes the word-embedding format of the word vectors in the text sequence, and w_t denotes the word vector input at time t;
lstm(·) denotes processing by the LSTM network.
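The formula images of the original publication are not reproduced in this text; a plausible reconstruction consistent with the symbol definitions above (the arrow notation is an assumption of this rewrite, not copied from the original figures) is:

\overrightarrow{h}^{\,l}_t = \mathrm{lstm}\bigl(\overrightarrow{h}^{\,l}_{t-1},\; h^{\,l-1}_t\bigr), \qquad
\overleftarrow{h}^{\,l}_t = \mathrm{lstm}\bigl(\overleftarrow{h}^{\,l}_{t+1},\; h^{\,l-1}_t\bigr)

h^{\,l}_t = \bigl[\overrightarrow{h}^{\,l}_t ;\; \overleftarrow{h}^{\,l}_t\bigr], \qquad h^{\,0}_t = e(w_t)

Here the forward and backward passes read the output of layer l-1 in sentence order and in reverse order respectively, and their hidden states are concatenated to form the layer-l output at each time step.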
The aforementioned parameterized activation function improvement method based on recurrent neural networks is characterized in that, in step 2, the hidden layers of the Bi-LSTM network are connected in series as follows: the input of the first layer of the Bi-LSTM network is the word-vector sequence of the data set, and the input of the current layer is the concatenation of the input of the previous layer and the output of the previous layer.
The aforementioned parameterized activation function improvement method based on recurrent neural networks is characterized in that the normalized exponential function layer is a soft-max layer.
The aforementioned parameterized activation function improvement method based on recurrent neural networks is characterized in that the parameterized sigmoid activation function is:
where σ(·) is the improved parameterized sigmoid activation function, A and B are parameters, and x denotes the input of the neural network.
The aforementioned parameterized activation function improvement method based on recurrent neural networks is characterized in that, when the data set is the sentiment-analysis subjectivity data set subj, the parameters of the parameterized sigmoid function corresponding to the best sentence-classification accuracy are A = -0.5 and B = 0.
Advantageous effects of the invention: on the basis of the densely connected bidirectional long short-term memory network, the parameterized activation function module expands the non-saturated region of the sigmoid activation function while keeping the derivative of the function from becoming too small, preventing the vanishing-gradient phenomenon. Through repeated experiments on the sentiment-analysis subjectivity data set (subj [Pang and Lee, 2004]), a parameter combination that clearly improves sentence-classification accuracy was obtained, and verification on other data sets shows that it still works well.
Detailed description of the invention
Fig. 1 compares the sentence-classification accuracy obtained with the parameterized activation function on different data sets.
Specific embodiment
The invention will be further described below in conjunction with the accompanying drawings. The following embodiments are only used to clearly illustrate the technical solution of the present invention and are not intended to limit its protection scope.
Based on the DC-Bi-LSTM network, the present invention designs a general form of the Sigmoid activation function, modifies the activation-function expression in combination with the internal structure of the LSTM, confirms the expression form by testing different parameter combinations, and thereby improves the accuracy of text classification.
A parameterized activation function improvement method based on recurrent neural networks comprises the following steps:
Step 1: build the bidirectional long short-term memory network (Bi-LSTM) based on the long short-term memory network (LSTM); in this embodiment the Bi-LSTM has 15 layers.
The Bi-LSTM can be expressed by the following formulas:
l denotes the network layer index and t denotes the time step;
the forward hidden-layer state of the l-th LSTM layer is obtained by processing the text sequence of the data set in sentence order at the current time t;
the backward hidden-layer state of the l-th LSTM layer is obtained by processing the text sequence of the data set in reverse order at the current time t;
the hidden-layer output of the l-th hidden layer at the current time t is obtained after the text sequence of the data set has been processed in both the forward and the reverse direction;
h_t^l denotes the output of the l-th hidden layer at time t, and h_t^0 denotes the original input text sequence at time t;
e(·) denotes the word-embedding format of the word vectors in the text sequence, and w_t denotes the word vector input at time t;
lstm(·) denotes processing by the LSTM network.
Step 2: connect the hidden layers of the Bi-LSTM network in series, add an average pooling layer after the last hidden layer of the Bi-LSTM network, connect a normalized exponential function layer after the average pooling layer, and establish the DC-Bi-LSTM network;
Connect the 15 Bi-LSTM layers: the input of the first layer is the word-vector sequence of the data set (which may be the subj data set), the output of the first layer is the sequence h1, the input of the second layer is the concatenation of the input of the first layer and the output of the first layer, and so on.
After the 15 densely connected Bi-LSTM layers have been built, an average pooling layer is added after the last hidden layer of the network; finally, a normalized exponential function layer (a soft-max layer) is connected after the average pooling layer. At this point, the construction of the densely connected bidirectional LSTM network is complete.
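As an illustrative sketch only (not the patented implementation; the embedding size, hidden size and classifier head are assumptions of this example), the DC-Bi-LSTM structure described above could be expressed in PyTorch roughly as follows:

import torch
import torch.nn as nn

class DCBiLSTM(nn.Module):
    """Densely connected Bi-LSTM for sentence classification: every layer
    receives the concatenation of the previous layer's input and output,
    followed by average pooling over time and a soft-max classifier."""
    def __init__(self, embed_dim=100, hidden_dim=64, num_layers=15, num_classes=2):
        super().__init__()
        self.layers = nn.ModuleList()
        in_dim = embed_dim
        for _ in range(num_layers):
            # bidirectional=True yields the forward and backward hidden states,
            # concatenated into a 2*hidden_dim vector at every time step
            self.layers.append(nn.LSTM(in_dim, hidden_dim,
                                       batch_first=True, bidirectional=True))
            in_dim = in_dim + 2 * hidden_dim   # dense connection: [input; output]
        self.classifier = nn.Linear(in_dim, num_classes)

    def forward(self, x):                       # x: (batch, seq_len, embed_dim)
        for lstm in self.layers:
            out, _ = lstm(x)                    # out: (batch, seq_len, 2*hidden_dim)
            x = torch.cat([x, out], dim=-1)     # next input = [previous input; previous output]
        pooled = x.mean(dim=1)                  # average pooling layer over the time axis
        return torch.log_softmax(self.classifier(pooled), dim=-1)   # soft-max layer

For example, model = DCBiLSTM(); scores = model(torch.randn(8, 20, 100)) would classify a batch of eight 20-token sentences given pre-computed word embeddings; the dense connection grows the feature dimension by 2*hidden_dim per layer, mirroring the concatenation rule described above.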
Step 3: with the parameterized sigmoid activation function, train on the data set (for example, the subj data set, the subjectivity data set for sentiment analysis), record the sentence-classification accuracy of the densely connected bidirectional LSTM network, and obtain the parameterized activation function corresponding to the best accuracy;
Choose the subj data set for the experiments, train the network repeatedly, record the sentence-classification accuracy, and obtain the activation function corresponding to the best accuracy.
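One straightforward way to carry out this search, given here only as an assumed sketch of the experimental protocol, is a small grid search over the sigmoid parameters; train_and_evaluate is a hypothetical helper that trains the DC-Bi-LSTM on subj with the given (A, B) and returns its sentence-classification accuracy:

# Hypothetical parameter search; the candidate values below are examples,
# not the grid reported in Table 1 of the original publication.
best_A, best_B, best_acc = None, None, 0.0
for A in (-1.0, -0.75, -0.5, -0.25):
    for B in (-0.5, 0.0, 0.5):
        acc = train_and_evaluate(A, B)   # assumed helper: train on subj, return accuracy
        if acc > best_acc:
            best_A, best_B, best_acc = A, B, acc
print("best parameters:", best_A, best_B, "accuracy:", best_acc)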
In view of the basic structure of DC-Bi-LSTM, and because the outputs of the input gate, forget gate and output gate of the DC-Bi-LSTM network should lie in [0, 1], the improved parameterized sigmoid activation function is:
where σ(·) is the improved parameterized sigmoid activation function, A and B are parameters, and x denotes the input of the neural network;
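The exact expression appears as a formula image in the original publication and is not reproduced here; one plausible form consistent with the description (output bounded in (0, 1) for any real A and B, as the gates require) is, as an assumption of this rewrite:

\sigma_{A,B}(x) = \frac{1}{1 + e^{\,A x + B}}, \qquad
\sigma_{A,B}'(x) = -A\,\sigma_{A,B}(x)\bigl(1 - \sigma_{A,B}(x)\bigr)

Under this assumed form, A = -0.5 and B = 0 give \sigma_{A,B}(x) = 1/(1 + e^{-x/2}), whose non-saturated region is roughly twice as wide as that of the standard sigmoid while the derivative at the origin is |A|/4 = 0.125.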
The parameterized sigmoid activation function with the highest classification accuracy on the subj data set is then applied to other data sets for verification. Applying the improved activation function to data sets such as mr, sst1, sst2, trec and cr shows that the parameterized sigmoid function effectively improves sentence-classification accuracy.
Table 1: accuracy obtained on the subj data set with different parameter combinations
As can be seen from Table 1, on the subj data set the parameterized sigmoid function with A = -0.5 and B = 0 gives the best results.
As can be seen from Fig. 1, in experiments on the other data sets the parameterized activation function effectively improves the sentence-classification performance of DC-Bi-LSTM relative to the unmodified sigmoid function.
On the basis of the densely connected bidirectional long short-term memory network, the present invention parameterizes the activation function module so that the non-saturated region of the sigmoid activation function is expanded while the derivative of the function is kept from becoming too small, preventing the vanishing-gradient phenomenon. Through repeated experiments on the subj data set, a parameter combination that clearly improves sentence-classification accuracy is obtained, and verification on other data sets shows that it still works well.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the technical principles of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (6)

1. A parameterized activation function improvement method based on a recurrent neural network, characterized by comprising the steps of:
Step 1: build a bidirectional long short-term memory network Bi-LSTM based on the long short-term memory network LSTM;
Step 2: connect the hidden layers of the Bi-LSTM network in series, add an average pooling layer after the last hidden layer of the Bi-LSTM network, connect a normalized exponential function layer after the average pooling layer, and establish the densely connected bidirectional long short-term memory network DC-Bi-LSTM;
Step 3: train on the data set with the parameterized Sigmoid activation function, record the sentence-classification accuracy of the densely connected bidirectional long short-term memory network, and obtain the parameterized activation function corresponding to the best accuracy.
2. The parameterized activation function improvement method based on a recurrent neural network according to claim 1, characterized in that the bidirectional long short-term memory network Bi-LSTM is expressed by the following formulas: l denotes the network layer index and t denotes the time step; the forward hidden-layer state of the l-th LSTM layer is obtained by processing the text sequence of the data set in sentence order at the current time t; the backward hidden-layer state of the l-th LSTM layer is obtained by processing the text sequence of the data set in reverse order at the current time t; the hidden-layer output of the l-th hidden layer at the current time t is obtained after the text sequence of the data set has been processed in both the forward and the reverse direction; h_t^l denotes the output of the l-th hidden layer at time t, and h_t^0 denotes the original input text sequence at time t; e(·) denotes the word-embedding format of the word vectors in the text sequence, and w_t denotes the word vector input at time t; lstm(·) denotes processing by the LSTM network.
3. The parameterized activation function improvement method based on a recurrent neural network according to claim 1, characterized in that, in step 2, the hidden layers of the Bi-LSTM network are connected in series as follows: the input of the first layer of the Bi-LSTM network is the word-vector sequence of the data set, and the input of the current layer is the concatenation of the input of the previous layer and the output of the previous layer.
4. The parameterized activation function improvement method based on a recurrent neural network according to claim 1, characterized in that the normalized exponential function layer is a soft-max layer.
5. The parameterized activation function improvement method based on a recurrent neural network according to claim 1, characterized in that the parameterized sigmoid activation function is given by a formula in which σ(·) is the improved parameterized sigmoid activation function, A and B are parameters, and x denotes the input of the neural network.
6. The parameterized activation function improvement method based on a recurrent neural network according to claim 5, characterized in that, when the data set is the sentiment-analysis subjectivity data set subj, the parameters of the parameterized sigmoid function corresponding to the best sentence-classification accuracy are: A = -0.5, B = 0.
CN201910056795.0A 2019-01-22 2019-01-22 A parameterized activation function improvement method based on recurrent neural networks Pending CN109857867A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910056795.0A CN109857867A (en) 2019-01-22 2019-01-22 A parameterized activation function improvement method based on recurrent neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910056795.0A CN109857867A (en) 2019-01-22 2019-01-22 A parameterized activation function improvement method based on recurrent neural networks

Publications (1)

Publication Number Publication Date
CN109857867A true CN109857867A (en) 2019-06-07

Family

ID=66895373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910056795.0A Pending CN109857867A (en) A parameterized activation function improvement method based on recurrent neural networks

Country Status (1)

Country Link
CN (1) CN109857867A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569358A (en) * 2019-08-20 2019-12-13 上海交通大学 Learning long-term dependencies and hierarchically structured text classification models, methods, and media
CN111209385A (en) * 2020-01-14 2020-05-29 重庆兆光科技股份有限公司 Consultation session unique answer optimizing method based on convex neural network
CN111209385B (en) * 2020-01-14 2024-02-02 重庆兆光科技股份有限公司 Convex neural network-based consultation dialogue unique answer optimizing method
CN113609284A (en) * 2021-08-02 2021-11-05 河南大学 A method and device for automatic generation of text summaries integrating multiple semantics
CN113609284B (en) * 2021-08-02 2024-08-06 河南大学 Automatic text abstract generation method and device integrating multiple semantics

Similar Documents

Publication Publication Date Title
CN108847223B (en) A speech recognition method based on deep residual neural network
CN105825235B An image recognition method based on multi-feature deep learning
CN108256482B (en) Face age estimation method for distributed learning based on convolutional neural network
CN112348191B (en) A Knowledge Base Completion Method Based on Multimodal Representation Learning
CN111507884A (en) Self-adaptive image steganalysis method and system based on deep convolutional neural network
CN109857867A (en) A kind of activation primitive parametrization improved method based on Recognition with Recurrent Neural Network
CN110213244A A network intrusion detection method based on spatio-temporal feature fusion
CN108427921A A face recognition method based on convolutional neural networks
KR102154676B1 (en) Method for training top-down selective attention in artificial neural networks
CN110378208B (en) A Behavior Recognition Method Based on Deep Residual Networks
CN105205448A (en) Character recognition model training method based on deep learning and recognition method thereof
CN110458084B A face age estimation method based on an inverted residual network
CN116416441A (en) Hyperspectral image feature extraction method based on multi-level variational autoencoder
CN108958217A A CAN bus message anomaly detection method based on deep learning
CN107194426A An image recognition method based on spiking neural networks
CN107423727B A complex facial expression recognition method based on neural networks
CN108776796A An action recognition method based on a global spatio-temporal attention model
CN110288192A (en) Quality detecting method, device, equipment and storage medium based on multiple Checking models
CN110443162A A two-stage training method for disguised face recognition
CN109117817A A face recognition method and device
CN113807214A (en) Small target face recognition method based on knowledge distillation of deit affiliate network
CN107480723A A texture recognition method based on a local binary threshold learning network
CN113239949A (en) Data reconstruction method based on 1D packet convolutional neural network
CN114972299A (en) A railway track defect detection method based on deep transfer learning
CN110942106A (en) A pooled convolutional neural network image classification method based on square mean

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190607