CN109857867A - Activation function parameterization improvement method based on a recurrent neural network - Google Patents

Activation function parameterization improvement method based on a recurrent neural network

Info

Publication number
CN109857867A
Authority
CN
China
Prior art keywords
activation function
lstm
layer
network
parameterization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910056795.0A
Other languages
Chinese (zh)
Inventor
于舒娟
李润琦
高冲
杨杰
张昀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications Nantong Institute Ltd
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications Nantong Institute Ltd
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications Nantong Institute Ltd and Nanjing University of Posts and Telecommunications
Priority to CN201910056795.0A
Publication of CN109857867A
Legal status: Pending

Landscapes

  • Character Discrimination (AREA)

Abstract

The invention discloses an activation function parameterization improvement method based on a recurrent neural network, comprising the steps of: step 1, constructing a bidirectional long short-term memory network Bi-LSTM from the long short-term memory network; step 2, densely connecting the hidden layers of the Bi-LSTM network, adding an average pooling layer after the last hidden layer of the network, and connecting a normalized exponential (softmax) function layer after the average pooling layer, thereby establishing the densely connected bidirectional long short-term memory network DC-Bi-LSTM; step 3, training on a data set with the parameterized Sigmoid activation function, recording the sentence-classification accuracy of the densely connected bidirectional long short-term memory network, and obtaining the parameterized activation function corresponding to the best accuracy. By parameterizing the activation function module, the present invention expands the non-saturated region of the S-type activation function while keeping the derivative of the function from becoming too small, preventing the vanishing-gradient phenomenon.

Description

Activation function parameterization improvement method based on a recurrent neural network
Technical field
The present invention relates to the fields of natural language processing and text classification, and in particular to an activation function parameterization improvement method based on a recurrent neural network.
Background art
Deep neural networks are widely used in computer vision; however, stacked recurrent neural networks suffer from vanishing gradients and overfitting. Several novel recurrent neural networks have therefore been proposed, among which the densely connected bidirectional long short-term memory network is a highly effective recurrent neural network.
The activation function module is a basic building block of a neural network. In general, an activation function has the following properties:
(1) Nonlinearity: when the activation function is nonlinear, almost any function can be expressed by a two-layer neural network;
(2) Differentiability: differentiability is required whenever the optimization method is gradient-based;
(3) Monotonicity: when the activation function is monotonic, a single-layer network is guaranteed to be convex.
In the densely connected bidirectional long short-term memory network (DC-Bi-LSTM, where DC stands for densely connected, Bi for bidirectional, and LSTM for long short-term memory), the S-type (Sigmoid) activation function has a bounded output range, so gradient explosion does not occur during training, and its derivative is simple to compute, which greatly reduces algorithmic complexity. At the same time, however, the two-sided saturation of the Sigmoid activation function easily causes gradients to vanish, so the weights cannot be corrected effectively; during back-propagation the activation function easily falls into its saturated region, the gradient vanishes, and training the recurrent neural network becomes more difficult. Adjusting parameters can expand the non-saturated region, but as the non-saturated region widens, the derivative within it shrinks and convergence slows. Reasonable parameters must therefore be set to control both the size of the non-saturated region of the S-type activation function and the value of its derivative at the origin.
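For reference, the standard Sigmoid function and its derivative make this trade-off concrete:

$$\sigma(x) = \frac{1}{1+e^{-x}}, \qquad \sigma'(x) = \sigma(x)\big(1-\sigma(x)\big), \qquad \sigma'(0) = \tfrac{1}{4}$$

For |x| larger than roughly 5 the derivative is nearly zero (the saturated region); stretching the curve horizontally, for example replacing x with x/2, doubles the non-saturated region but halves the slope at the origin to 1/8, which is precisely the trade-off the parameters below must balance.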
Summary of the invention
To remedy these deficiencies of the prior art, the present invention provides an activation function parameterization improvement method based on a recurrent neural network. It addresses the problem that the two-sided saturation of the Sigmoid activation function easily causes gradients to vanish and prevents effective weight correction, and that during back-propagation the activation function easily falls into its saturated region, causing vanishing gradients and increasing the difficulty of training the recurrent neural network.
To achieve the above objectives, the present invention adopts the following technical scheme: an activation function parameterization improvement method based on a recurrent neural network, characterized by comprising the steps of:
Step 1: construct the bidirectional long short-term memory network Bi-LSTM from the long short-term memory network LSTM;
Step 2: densely connect the hidden layers of the Bi-LSTM network, add an average pooling layer after the last hidden layer of the Bi-LSTM network, connect a normalized exponential function layer after the average pooling layer, and establish the densely connected bidirectional long short-term memory network DC-Bi-LSTM;
Step 3: train on a data set with the parameterized Sigmoid activation function, record the sentence-classification accuracy of the densely connected bidirectional long short-term memory network, and obtain the parameterized activation function corresponding to the best accuracy.
The aforementioned activation function parameterization improvement method based on a recurrent neural network is characterized in that the bidirectional long short-term memory network Bi-LSTM is expressed by the following formulas:

$$\overrightarrow{h}_t^{\,l} = \mathrm{lstm}\!\left(\overrightarrow{h}_{t-1}^{\,l},\, x_t^{\,l}\right), \qquad \overleftarrow{h}_t^{\,l} = \mathrm{lstm}\!\left(\overleftarrow{h}_{t+1}^{\,l},\, x_t^{\,l}\right)$$

$$h_t^{\,l} = \left[\overrightarrow{h}_t^{\,l};\, \overleftarrow{h}_t^{\,l}\right], \qquad x_t^{\,1} = e(w_t)$$

where l denotes the layer index and t the time step; $\overrightarrow{h}_t^{\,l}$ denotes the hidden state of the layer-l LSTM that processes the data set's text sequence in order at time t; $\overleftarrow{h}_t^{\,l}$ denotes the hidden state of the layer-l LSTM that processes the text sequence in reverse order at time t; $h_t^{\,l}$ denotes the output of hidden layer l at time t, obtained after processing the text sequence in both the in-order and reverse directions; $x_t^{\,l}$ denotes the input to layer l at time t, with $x_t^{\,1}$ the original input text sequence; e(·) denotes the word-embedding format of the term vectors in the text sequence; $w_t$ denotes the term vector input at time t; and lstm(·) denotes processing by the LSTM network.
The aforementioned activation function parameterization improvement method based on a recurrent neural network is characterized in that connecting the hidden layers of the Bi-LSTM network in step 2 specifically means: the input of the first layer of the Bi-LSTM network is the term-vector sequence from the data set, and the input of the current layer is the concatenation of the previous layer's input and the previous layer's output, as sketched below.
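A minimal NumPy sketch of this dense connection scheme (the function and argument names are illustrative, not from the patent; each Bi-LSTM layer is abstracted as a callable):

```python
import numpy as np

def densely_connect(embeddings, bi_lstm_layers):
    """Chain Bi-LSTM layers densely: the first layer reads the
    term-vector sequence, and every later layer reads the previous
    layer's input concatenated with the previous layer's output.

    embeddings:     (seq_len, embed_dim) term-vector sequence
    bi_lstm_layers: list of callables, each mapping
                    (seq_len, in_dim) -> (seq_len, out_dim)
    """
    x = embeddings                            # input to the first layer
    for bi_lstm in bi_lstm_layers:
        h = bi_lstm(x)                        # output of the current layer
        x = np.concatenate([x, h], axis=-1)   # next layer's input
    return x
```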
The aforementioned activation function parameterization improvement method based on a recurrent neural network is characterized in that the normalized exponential function layer is a softmax layer.
The aforementioned activation function parameterization improvement method based on a recurrent neural network is characterized in that the parameterized Sigmoid activation function is

$$\sigma(x) = \frac{1}{1 + e^{Ax + B}}$$

where σ(·) is the improved parameterized S-type activation function, A and B are parameters, and x denotes the input to the neural network.
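A minimal NumPy sketch of this activation, under the form σ(x) = 1/(1 + e^{Ax+B}) reconstructed above (the original formula image is not reproduced in this text, so the exact expression is an inference from the stated parameters and the (0, 1) output range):

```python
import numpy as np

def param_sigmoid(x, A=-0.5, B=0.0):
    """Parameterized S-type activation; A = -1, B = 0 recovers the
    standard Sigmoid 1 / (1 + exp(-x)). Output stays in (0, 1)."""
    return 1.0 / (1.0 + np.exp(A * x + B))

def param_sigmoid_grad(x, A=-0.5, B=0.0):
    """Derivative -A * s * (1 - s): with A = -0.5 the slope at the
    origin is 0.125 instead of 0.25, but saturation sets in later."""
    s = param_sigmoid(x, A, B)
    return -A * s * (1.0 - s)
```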
The aforementioned activation function parameterization improvement method based on a recurrent neural network is characterized in that, when the data set is the sentiment-analysis subjectivity data set subj, the parameters of the parameterized Sigmoid function corresponding to the best sentence-classification accuracy are A = -0.5 and B = 0.
Advantageous effects of the invention: building on the densely connected bidirectional long short-term memory network, the present invention parameterizes the activation function module so that the non-saturated region of the S-type activation function is expanded while the derivative of the function is kept from becoming too small, preventing the vanishing-gradient phenomenon. Through repeated experiments on the sentiment-analysis subjectivity data set (subj [Pang and Lee, 2004]), a parameter combination was obtained that clearly improves sentence-classification accuracy; when other data sets are used for verification, it still works well.
Description of the drawings
Fig. 1 compares the sentence-classification accuracies obtained in experiments with the parameterized activation function on different data sets.
Specific embodiment
The invention is further described below in conjunction with the accompanying drawing. The following embodiments are only intended to illustrate the technical solution of the present invention clearly and do not limit its scope of protection.
The present invention is based on the DC-Bi-LSTM network: it designs a general form of the Sigmoid activation function, modifies the activation-function expression in light of the internal structure of the LSTM, confirms the functional form by testing different parameter combinations, and thereby improves text-classification accuracy.
An activation function parameterization improvement method based on a recurrent neural network comprises the following steps:
Step 1: construct the bidirectional long short-term memory network (Bi-LSTM) from the long short-term memory network (LSTM); in the present embodiment the Bi-LSTM has 15 layers.
The Bi-LSTM can be expressed by the following formulas:

$$\overrightarrow{h}_t^{\,l} = \mathrm{lstm}\!\left(\overrightarrow{h}_{t-1}^{\,l},\, x_t^{\,l}\right), \qquad \overleftarrow{h}_t^{\,l} = \mathrm{lstm}\!\left(\overleftarrow{h}_{t+1}^{\,l},\, x_t^{\,l}\right)$$

$$h_t^{\,l} = \left[\overrightarrow{h}_t^{\,l};\, \overleftarrow{h}_t^{\,l}\right], \qquad x_t^{\,1} = e(w_t)$$

where l denotes the layer index and t the time step; $\overrightarrow{h}_t^{\,l}$ denotes the hidden state of the layer-l LSTM that processes the data set's text sequence in order at time t; $\overleftarrow{h}_t^{\,l}$ denotes the hidden state of the layer-l LSTM that processes the text sequence in reverse order at time t; $h_t^{\,l}$ denotes the output of hidden layer l at time t, obtained after processing the text sequence in both the in-order and reverse directions; $x_t^{\,l}$ denotes the input to layer l at time t, with $x_t^{\,1}$ the original input text sequence; e(·) denotes the word-embedding format of the term vectors in the text sequence; $w_t$ denotes the term vector input at time t; and lstm(·) denotes processing by the LSTM network.
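The two directional passes of one layer can be sketched as follows (lstm_step_fwd and lstm_step_bwd are hypothetical single-step LSTM cells supplied by the caller; the patent only specifies lstm(·) abstractly):

```python
import numpy as np

def bi_lstm_layer(lstm_step_fwd, lstm_step_bwd, xs, hidden_dim):
    """One Bi-LSTM layer over a sequence xs (list of input vectors).
    Each step function maps (h_prev, c_prev, x_t) -> (h_t, c_t).
    Returns the per-step outputs h_t^l = [forward h ; backward h]."""
    T = len(xs)
    h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
    fwd = []
    for t in range(T):                        # in-order pass
        h, c = lstm_step_fwd(h, c, xs[t])
        fwd.append(h)
    h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
    bwd = [None] * T
    for t in reversed(range(T)):              # reverse-order pass
        h, c = lstm_step_bwd(h, c, xs[t])
        bwd[t] = h
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```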
Step 2: densely connect the hidden layers of the Bi-LSTM network, add an average pooling layer after the last hidden layer of the Bi-LSTM network, connect a normalized exponential function layer after the average pooling layer, and establish the DC-Bi-LSTM network.
The 15 Bi-LSTM layers are chained as follows: the input of the first layer is the term-vector sequence from the data set (for example, the subj data set), the output of the first layer is the sequence h1, the input of the second layer is the concatenation of the first layer's input and the first layer's output, and so on.
After the 15 densely connected Bi-LSTM layers are built, an average pooling layer is added after the last hidden layer of the network; finally, a normalized exponential function layer (a softmax layer) is connected after the average pooling layer. At this point the construction of the densely connected bidirectional LSTM network is complete; a sketch follows.
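A sketch of this construction in PyTorch (the framework, embedding dimension, hidden size, and class count are assumptions for illustration; the patent does not prescribe an implementation):

```python
import torch
import torch.nn as nn

class DCBiLSTM(nn.Module):
    """Densely connected Bi-LSTM: each layer's input is the previous
    layer's input concatenated with its output, followed by average
    pooling over time and a softmax classifier."""
    def __init__(self, embed_dim=100, hidden=13, num_layers=15, classes=2):
        super().__init__()
        self.layers = nn.ModuleList()
        in_dim = embed_dim
        for _ in range(num_layers):
            self.layers.append(nn.LSTM(in_dim, hidden, batch_first=True,
                                       bidirectional=True))
            in_dim += 2 * hidden              # dense connection widens input
        self.classifier = nn.Linear(in_dim, classes)

    def forward(self, x):                     # x: (batch, seq_len, embed_dim)
        for lstm in self.layers:
            h, _ = lstm(x)                    # h: (batch, seq_len, 2*hidden)
            x = torch.cat([x, h], dim=-1)     # input + output -> next layer
        pooled = x.mean(dim=1)                # average pooling over time
        return torch.softmax(self.classifier(pooled), dim=-1)
```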
Step 3: with the parameterized Sigmoid activation function, train on a data set (for example, the subj data set from the Subjectivity Datasets for sentiment analysis), record the sentence-classification accuracy of the densely connected bidirectional LSTM network, and obtain the parameterized activation function corresponding to the best accuracy.
The subj data set is chosen for the experiments; the network is trained repeatedly, the sentence-classification accuracy is recorded, and the activation function corresponding to the best accuracy is obtained.
In view of the basic structure of DC-Bi-LSTM, the outputs of the input gate, forget gate, and output gate in the DC-Bi-LSTM network should lie in [0, 1]; the improved parameterized Sigmoid activation function is therefore

$$\sigma(x) = \frac{1}{1 + e^{Ax + B}}$$

where σ(·) is the improved parameterized S-type activation function, A and B are parameters, and x denotes the input to the neural network. A sketch of one LSTM step with these gate activations follows.
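A minimal NumPy sketch of one LSTM step with the parameterized Sigmoid applied to the input, forget, and output gates (the weight layout and names are illustrative; only the gate activation differs from a standard cell):

```python
import numpy as np

def lstm_step_param(x, h_prev, c_prev, W, U, b, A=-0.5, B=0.0):
    """One LSTM step whose three gates use the parameterized Sigmoid,
    keeping their outputs in (0, 1) as the gate semantics require.
    W: (4H, D), U: (4H, H), b: (4H,) hold the i, f, o, g blocks."""
    def psig(z):                              # parameterized gate activation
        return 1.0 / (1.0 + np.exp(A * z + B))
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = psig(i), psig(f), psig(o)       # gates stay in (0, 1)
    c = f * c_prev + i * np.tanh(g)           # cell-state update, tanh as usual
    h = o * np.tanh(c)                        # hidden state
    return h, c
```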
The parameterized Sigmoid activation function with the highest classification accuracy on the subj data set is then applied to other data sets for verification. Applying the improved activation function to data sets such as mr, sst1, sst2, trec, and cr confirms that the parameterized Sigmoid function effectively improves sentence-classification accuracy.
Table 1: Accuracies obtained with different parameter combinations on the subj data set
From Table 1 it can be seen that choosing A = -0.5 and B = 0 yields the best-performing parameterized Sigmoid function on the subj data set.
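The selection procedure amounts to a small grid search over (A, B); a sketch follows, where train_and_eval is a hypothetical routine that trains the DC-Bi-LSTM with the given parameters on subj and returns test accuracy, and the candidate grids are illustrative (Table 1's values are not reproduced in this text):

```python
def select_activation_params(train_and_eval,
                             A_grid=(-2.0, -1.0, -0.5, -0.25),
                             B_grid=(-0.5, 0.0, 0.5)):
    """Train once per (A, B) combination and keep the most accurate."""
    best_A, best_B, best_acc = None, None, 0.0
    for A in A_grid:
        for B in B_grid:
            acc = train_and_eval(A, B)        # subj classification accuracy
            if acc > best_acc:
                best_A, best_B, best_acc = A, B, acc
    return best_A, best_B, best_acc           # reported best: A=-0.5, B=0
```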
As can be seen from Fig. 1, in tests on the other data sets the parameterized activation function effectively improves the sentence-classification performance of DC-Bi-LSTM relative to the unmodified Sigmoid function.
Building on the densely connected bidirectional long short-term memory network, the present invention parameterizes the activation function module so that the non-saturated region of the S-type activation function is expanded while the derivative of the function is kept from becoming too small, preventing the vanishing-gradient phenomenon. Repeated experiments on the subj data set yield a parameter combination that clearly improves sentence-classification accuracy, and verification on other data sets shows that it still works well.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make further improvements and variations without departing from the technical principles of the present invention, and such improvements and variations shall also be regarded as falling within the scope of protection of the present invention.

Claims (6)

1. An activation function parameterization improvement method based on a recurrent neural network, characterized by comprising the steps of:
Step 1: construct the bidirectional long short-term memory network Bi-LSTM from the long short-term memory network LSTM;
Step 2: densely connect the hidden layers of the Bi-LSTM network, add an average pooling layer after the last hidden layer of the Bi-LSTM network, connect a normalized exponential function layer after the average pooling layer, and establish the densely connected bidirectional long short-term memory network DC-Bi-LSTM;
Step 3: train on a data set with the parameterized Sigmoid activation function, record the sentence-classification accuracy of the densely connected bidirectional long short-term memory network, and obtain the parameterized activation function corresponding to the best accuracy.
2. The activation function parameterization improvement method based on a recurrent neural network according to claim 1, characterized in that the bidirectional long short-term memory network Bi-LSTM is expressed by the following formulas:

$$\overrightarrow{h}_t^{\,l} = \mathrm{lstm}\!\left(\overrightarrow{h}_{t-1}^{\,l},\, x_t^{\,l}\right), \qquad \overleftarrow{h}_t^{\,l} = \mathrm{lstm}\!\left(\overleftarrow{h}_{t+1}^{\,l},\, x_t^{\,l}\right)$$

$$h_t^{\,l} = \left[\overrightarrow{h}_t^{\,l};\, \overleftarrow{h}_t^{\,l}\right], \qquad x_t^{\,1} = e(w_t)$$

where l denotes the layer index and t the time step; $\overrightarrow{h}_t^{\,l}$ denotes the hidden state of the layer-l LSTM that processes the data set's text sequence in order at time t; $\overleftarrow{h}_t^{\,l}$ denotes the hidden state of the layer-l LSTM that processes the text sequence in reverse order at time t; $h_t^{\,l}$ denotes the output of hidden layer l at time t, obtained after processing the text sequence in both the in-order and reverse directions; $x_t^{\,l}$ denotes the input to layer l at time t, with $x_t^{\,1}$ the original input text sequence; e(·) denotes the word-embedding format of the term vectors in the text sequence; $w_t$ denotes the term vector input at time t; and lstm(·) denotes processing by the LSTM network.
3. The activation function parameterization improvement method based on a recurrent neural network according to claim 1, characterized in that connecting the hidden layers of the Bi-LSTM network in step 2 specifically means: the input of the first layer of the Bi-LSTM network is the term-vector sequence from the data set, and the input of the current layer is the concatenation of the previous layer's input and the previous layer's output.
4. The activation function parameterization improvement method based on a recurrent neural network according to claim 1, characterized in that the normalized exponential function layer is a softmax layer.
5. The activation function parameterization improvement method based on a recurrent neural network according to claim 1, characterized in that the parameterized Sigmoid activation function is

$$\sigma(x) = \frac{1}{1 + e^{Ax + B}}$$

where σ(·) is the improved parameterized S-type activation function, A and B are parameters, and x denotes the input to the neural network.
6. The activation function parameterization improvement method based on a recurrent neural network according to claim 5, characterized in that, when the data set is the sentiment-analysis subjectivity data set subj, the parameters of the parameterized Sigmoid function corresponding to the best sentence-classification accuracy are A = -0.5 and B = 0.
CN201910056795.0A 2019-01-22 2019-01-22 Activation function parameterization improvement method based on a recurrent neural network Pending CN109857867A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910056795.0A CN109857867A (en) 2019-01-22 2019-01-22 Activation function parameterization improvement method based on a recurrent neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910056795.0A CN109857867A (en) 2019-01-22 2019-01-22 Activation function parameterization improvement method based on a recurrent neural network

Publications (1)

Publication Number Publication Date
CN109857867A true 2019-06-07

Family

ID=66895373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910056795.0A Pending CN109857867A (en) 2019-01-22 2019-01-22 Activation function parameterization improvement method based on a recurrent neural network

Country Status (1)

Country Link
CN (1) CN109857867A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569358A (en) * 2019-08-20 2019-12-13 上海交通大学 Model, method and medium for learning long-term dependency and hierarchical structure text classification
CN111209385A (en) * 2020-01-14 2020-05-29 重庆兆光科技股份有限公司 Consultation session unique answer optimizing method based on convex neural network
CN111209385B (en) * 2020-01-14 2024-02-02 重庆兆光科技股份有限公司 Convex neural network-based consultation dialogue unique answer optimizing method
CN113609284A (en) * 2021-08-02 2021-11-05 河南大学 Method and device for automatically generating text abstract fused with multivariate semantics
CN113609284B (en) * 2021-08-02 2024-08-06 河南大学 Automatic text abstract generation method and device integrating multiple semantics

Similar Documents

Publication Publication Date Title
CN109857867A Activation function parameterization improvement method based on a recurrent neural network
CN109615582B Face image super-resolution reconstruction method based on attribute description and a generative adversarial network
CN104067314B Human-figure image segmentation method
CN110443162B Two-stage training method for disguised face recognition
CN105825235B An image recognition method based on multi-feature deep learning
CN109377452B Face image restoration method based on a VAE and a generative adversarial network
CN106251303A An image denoising method using a deep fully convolutional encoder-decoder network
CN110459225B Speaker recognition system based on fused CNN features
CN110378208B Behavior recognition method based on a deep residual network
CN108615010A Facial expression recognition method based on fusion of parallel convolutional neural network feature maps
CN110473142B Single-image super-resolution reconstruction method based on deep learning
CN110458084A A face age estimation method based on an inverted residual network
CN110502988A Group localization and anomaly detection method in video
CN108073917A A face recognition method based on convolutional neural networks
CN106991355B Face recognition method using an analytic dictionary learning model based on topology preservation
CN108932527A Method for detecting adversarial samples using cross-trained models
CN109117795B Neural network expression recognition method based on graph structure
CN111178319A Video behavior recognition method based on a compression reward-punishment mechanism
CN106503661A Face gender recognition method based on a fireworks deep belief network
CN113421237B No-reference image quality evaluation method based on deep feature transfer learning
CN115602152B Voice enhancement method based on a multi-stage attention network
CN103226714A Sparse coding method reinforced by large coding coefficients
CN113033310A Expression recognition method based on a visual self-attention network
CN114626042B Face verification attack method and device
CN106778701A A fruit and vegetable image recognition method using convolutional neural networks with Dropout

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190607