CN109857867A - A parameterized activation function improvement method based on a recurrent neural network - Google Patents
- Publication number
- CN109857867A (application CN201910056795.0A)
- Authority
- CN
- China
- Prior art keywords
- activation function
- LSTM
- layer
- network
- parameterization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Character Discrimination (AREA)
Abstract
The invention discloses a parameterized activation function improvement method based on a recurrent neural network, comprising the steps of: step 1, constructing a bidirectional long short-term memory network Bi-LSTM based on the long short-term memory network; step 2, connecting each hidden layer in the Bi-LSTM network, adding an average pooling layer after the last hidden layer, and connecting a normalized exponential function layer after the average pooling layer, thereby establishing the densely connected bidirectional long short-term memory network DC-Bi-LSTM; step 3, training on a data set with a parameterized Sigmoid activation function, recording the sentence-classification accuracy of the densely connected network, and obtaining the parameterized activation function corresponding to the best accuracy. By parameterizing the activation function module, the present invention expands the unsaturated region of the S-shaped activation function while preventing the derivative of the function from becoming too small, thereby avoiding vanishing gradients.
Description
Technical field
The present invention relates to the fields of natural language processing and text classification, and in particular to a parameterized activation function improvement method based on a recurrent neural network.
Background technique
Deep neural networks are widely used in computer vision; however, stacked recurrent neural networks suffer from vanishing gradients and overfitting. Several novel recurrent architectures have therefore been proposed, among which the densely connected bidirectional long short-term memory network is highly effective.
The activation function module is a basic building block of a neural network. In general, an activation function has the following properties:
(1) Nonlinearity: when the activation function is nonlinear, almost any function can be expressed by a two-layer neural network;
(2) Differentiability: gradient-based optimization methods require the activation function to be differentiable;
(3) Monotonicity: when the activation function is monotonic, the loss of a single-layer network is guaranteed to be convex.
The Sigmoid (S-shaped) activation function used in the densely connected bidirectional long short-term memory network (DC (densely connected)-Bi (bidirectional)-LSTM (long short-term memory network)) has a bounded output range, so gradient explosion does not occur during training, and its derivative is simple to compute, which greatly reduces algorithmic complexity. At the same time, however, the two-sided saturation of the Sigmoid function easily causes gradients to vanish, so that weights cannot be effectively corrected; during back-propagation the activation function easily falls into its saturated region, again causing vanishing gradients and increasing the difficulty of training the recurrent neural network. Adjusting the parameters can expand the unsaturated region, but as the unsaturated region grows, the derivative within it shrinks, which slows convergence. Reasonable parameters must therefore be set to control both the size of the unsaturated region of the Sigmoid activation function and the value of its derivative at the origin.
Summary of the invention
To remedy the deficiencies of the prior art, the present invention provides a parameterized activation function improvement method based on a recurrent neural network, solving the problem that the two-sided saturation of the Sigmoid activation function easily causes gradients to vanish, prevents weights from being effectively corrected, and easily drives the activation function into its saturated region during back-propagation, thereby increasing the difficulty of training the recurrent neural network.
To achieve the above objectives, the present invention adopts the following technical scheme. A parameterized activation function improvement method based on a recurrent neural network is characterized by comprising the steps of:
step 1: constructing a bidirectional long short-term memory network Bi-LSTM based on the long short-term memory network LSTM;
step 2: connecting each hidden layer in the Bi-LSTM network, adding an average pooling layer after the last hidden layer of the Bi-LSTM network, and connecting a normalized exponential function layer after the average pooling layer, thereby establishing the densely connected bidirectional long short-term memory network DC-Bi-LSTM;
step 3: training on a data set with a parameterized Sigmoid activation function, recording the sentence-classification accuracy of the densely connected bidirectional long short-term memory network, and obtaining the parameterized activation function corresponding to the best accuracy.
In the aforementioned parameterized activation function improvement method based on a recurrent neural network, the bidirectional long short-term memory network Bi-LSTM is expressed by the following formulas, in which:
l denotes the network layer index, and t denotes the current time step;
the forward hidden state denotes the result of the l-th LSTM layer processing the text sequence of the data set in forward order at time t;
the backward hidden state denotes the result of the l-th LSTM layer processing the text sequence of the data set in backward order at time t;
the output of the l-th hidden layer at time t is obtained by processing the text sequence in both the forward and backward directions simultaneously;
the original input at time t is the embedded text sequence;
E(·) denotes the word-embedding representation of a word in the text sequence; w_t denotes the word vector input at time t;
lstm(·) denotes processing by an LSTM network.
In the aforementioned method, connecting each hidden layer in the Bi-LSTM network in step 2 specifically means: the input of the first layer of the Bi-LSTM network is the word-vector sequence of the data set, and the input of each subsequent layer is the concatenation of the previous layer's input and the previous layer's output.
In the aforementioned method, the normalized exponential function layer is a softmax layer.
In the aforementioned method, the parameterized sigmoid activation function is defined by the following formula:
where σ(·) is the improved parameterized S-shaped activation function, A and B are parameters, and x denotes the input of the neural network.
In the aforementioned method, when the data set is the sentiment-analysis subjectivity data set subj, the parameters of the parameterized sigmoid function corresponding to the best sentence-classification accuracy are: A = -0.5, B = 0.
Advantageous effects of the invention: building on the densely connected bidirectional long short-term memory network, the present invention parameterizes the activation function module so that the unsaturated region of the S-shaped activation function is expanded while the derivative of the function is kept from becoming too small, preventing vanishing gradients. Through repeated experiments on the sentiment-analysis subjectivity data set (subj [Pang and Lee, 2004]), a parameter combination was obtained that markedly improves sentence-classification accuracy, and verification on other data sets shows that it still works well.
Detailed description of the invention
Fig. 1 compares the sentence-classification accuracy obtained in experiments with the parameterized activation function on different data sets.
Specific embodiment
The invention is further described below with reference to the accompanying drawings. The following embodiments are only used to illustrate the technical solution of the present invention clearly and are not intended to limit its protection scope.
Based on the DC-Bi-LSTM network, the present invention designs a general parameterized form of the Sigmoid activation function, modifies the activation function expression in light of the internal structure of the LSTM, determines the final form of the function by testing different parameter combinations, and thereby improves the accuracy of text classification.
A parameterized activation function improvement method based on a recurrent neural network comprises the following steps.
Step 1: construct a bidirectional long short-term memory network (Bi-LSTM) based on the long short-term memory network (LSTM); in this embodiment the Bi-LSTM has 15 layers.
The Bi-LSTM can be expressed by the following formulas, in which:
l denotes the network layer index, and t denotes the current time step;
the forward hidden state denotes the result of the l-th LSTM layer processing the text sequence of the data set in forward order at time t;
the backward hidden state denotes the result of the l-th LSTM layer processing the text sequence of the data set in backward order at time t;
the output of the l-th hidden layer at time t is obtained by processing the text sequence in both the forward and backward directions simultaneously;
the original input at time t is the embedded text sequence;
E(·) denotes the word-embedding representation of a word in the text sequence; w_t denotes the word vector input at time t;
lstm(·) denotes processing by an LSTM network.
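The bidirectional processing described above can be sketched as follows. This is a minimal illustrative sketch in Python/NumPy with random, untrained weights, not the network of the embodiment; the names `lstm_step` and `bi_lstm_layer` and all dimensions are chosen for illustration only. Each time step's output is the concatenation of the forward and backward hidden states.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W):
    """One LSTM step. W maps [x; h] to the four stacked gate pre-activations."""
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g          # cell state update
    h = o * np.tanh(c)         # hidden state output
    return h, c

def bi_lstm_layer(xs, hidden, rng):
    """Run one bidirectional LSTM layer; each output is [forward_h; backward_h]."""
    d = xs[0].shape[0]
    Wf = rng.standard_normal((4 * hidden, d + hidden)) * 0.1  # forward weights
    Wb = rng.standard_normal((4 * hidden, d + hidden)) * 0.1  # backward weights
    h, c = np.zeros(hidden), np.zeros(hidden)
    fwd = []
    for x in xs:                       # process the sequence in forward order
        h, c = lstm_step(x, h, c, Wf)
        fwd.append(h)
    h, c = np.zeros(hidden), np.zeros(hidden)
    bwd = []
    for x in reversed(xs):             # process the sequence in backward order
        h, c = lstm_step(x, h, c, Wb)
        bwd.append(h)
    bwd.reverse()                      # realign backward states with time steps
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
xs = [rng.standard_normal(8) for _ in range(5)]  # toy embedded sentence, 5 steps
hs = bi_lstm_layer(xs, hidden=16, rng=rng)
print(len(hs), hs[0].shape)  # 5 (32,)
```

The concatenated state has twice the per-direction hidden size, matching the description that each hidden-layer output combines the forward and backward passes.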
Step 2: connect each hidden layer in the Bi-LSTM network, add an average pooling layer after the last hidden layer, and connect a normalized exponential function layer after the average pooling layer, thereby establishing the DC-Bi-LSTM network.
The 15 Bi-LSTM layers are connected as follows: the input of the first layer is the word-vector sequence of the data set (for example, the subj data set), and its output is the sequence h1; the input of the second layer is the concatenation of the first layer's input and the first layer's output; and so on.
After the 15 densely connected Bi-LSTM layers are built, an average pooling layer is added after the last hidden layer, and a normalized exponential function layer (a softmax layer) is connected after the average pooling layer. The construction of the densely connected bidirectional LSTM network is then complete.
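The dense connection, average pooling and softmax classification of step 2 can be sketched as follows. A random per-time-step linear map stands in for each Bi-LSTM layer (the real layers are recurrent); the sketch only illustrates how each layer's input is the concatenation of the previous layer's input and output, so the feature dimension grows with depth. All names and dimensions are illustrative assumptions, not the embodiment's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

def layer(xs, out_dim):
    """Stand-in for one Bi-LSTM layer: a random linear map applied per time step."""
    W = rng.standard_normal((out_dim, xs.shape[1])) * 0.1
    return np.tanh(xs @ W.T)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

T, emb, hidden, n_layers, n_classes = 6, 10, 4, 15, 2
inp = rng.standard_normal((T, emb))           # word-vector sequence: first-layer input
for _ in range(n_layers):
    out = layer(inp, hidden)
    # dense connection: next layer's input = [this layer's input; this layer's output]
    inp = np.concatenate([inp, out], axis=1)

pooled = out.mean(axis=0)                     # average pooling over the time axis
W_cls = rng.standard_normal((n_classes, hidden)) * 0.1
probs = softmax(W_cls @ pooled)               # softmax (normalized exponential) layer
print(inp.shape, round(probs.sum(), 6))       # (6, 70) 1.0
```

After 15 layers the concatenated input has dimension emb + 15 × hidden (10 + 60 = 70 here), showing how dense connections accumulate features from every earlier layer.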
Step 3: train with the parameterized sigmoid activation function on a data set (for example the subj data set, the subjectivity data set for sentiment analysis), record the sentence-classification accuracy of the densely connected bidirectional LSTM network, and obtain the parameterized activation function corresponding to the best accuracy.
The subj data set is chosen for the experiments: the network is trained repeatedly, the sentence-classification accuracy is recorded, and the activation function corresponding to the best accuracy is obtained.
Considering the basic structure of DC-Bi-LSTM: since the outputs of the input gate, forget gate and output gate in the DC-Bi-LSTM network should lie in the interval [0, 1], the improved parameterized sigmoid activation function is given by the following formula:
where σ(·) is the improved parameterized S-shaped activation function, A and B are parameters, and x denotes the input of the neural network.
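The exact expression of the parameterized sigmoid is given as a formula image in the original and is not reproduced in this text. One plausible form consistent with the description, sketched below as an assumption, is σ(x) = 1 / (1 + exp(A·x + B)): it has the two stated parameters, reduces to the standard sigmoid for A = -1, B = 0, and with A = -0.5 flattens the curve, widening the unsaturated region at the cost of a smaller derivative at the origin, exactly the trade-off discussed above.

```python
import math

def param_sigmoid(x, A=-0.5, B=0.0):
    """Hypothetical parameterized sigmoid: sigma(x) = 1 / (1 + exp(A*x + B)).
    A = -1, B = 0 recovers the standard sigmoid; A = -0.5 widens the
    unsaturated region. The exact form in the patent is not reproduced here."""
    return 1.0 / (1.0 + math.exp(A * x + B))

def deriv(f, x, eps=1e-6):
    """Central-difference numerical derivative."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

print(param_sigmoid(0.0))                                            # 0.5 at origin
print(round(deriv(lambda x: param_sigmoid(x, -1.0, 0.0), 0.0), 4))   # 0.25 (standard)
print(round(deriv(param_sigmoid, 0.0), 4))                           # 0.125 (flatter)
```

The halved derivative at the origin (0.125 vs 0.25) illustrates why the parameters must be chosen carefully: a wider unsaturated region directly trades off against gradient magnitude, and hence convergence speed.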
The parameterized sigmoid activation function achieving the highest classification accuracy on the subj data set is then applied to other data sets for verification. Applying the improved activation function to the mr, sst1, sst2, trec and cr data sets confirms that the parameterized sigmoid function effectively improves sentence-classification accuracy.
Table 1 lists the accuracy obtained with different parameter combinations on the subj data set.
As can be seen from Table 1, the parameterized sigmoid function with A = -0.5 and B = 0 performs best on the subj data set.
As can be seen from Fig. 1, in experiments on other data sets the parameterized activation function effectively improves the sentence-classification performance of DC-Bi-LSTM relative to the unmodified sigmoid function.
Building on the densely connected bidirectional long short-term memory network, the present invention parameterizes the activation function module so that the unsaturated region of the S-shaped activation function is expanded while the derivative of the function is kept from becoming too small, preventing vanishing gradients. Through repeated experiments on the subj data set, a parameter combination that markedly improves sentence-classification accuracy was obtained, and verification on other data sets shows that it still works well.
The above is only a preferred embodiment of the present invention. It should be noted that, for those of ordinary skill in the art, several improvements and variations can be made without departing from the technical principles of the present invention, and these improvements and variations should also be regarded as falling within the protection scope of the present invention.
Claims (6)
1. A parameterized activation function improvement method based on a recurrent neural network, characterized by comprising the steps of:
step 1: constructing a bidirectional long short-term memory network Bi-LSTM based on the long short-term memory network LSTM;
step 2: connecting each hidden layer in the Bi-LSTM network, adding an average pooling layer after the last hidden layer of the Bi-LSTM network, and connecting a normalized exponential function layer after the average pooling layer, thereby establishing the densely connected bidirectional long short-term memory network DC-Bi-LSTM;
step 3: training on a data set with a parameterized Sigmoid activation function, recording the sentence-classification accuracy of the densely connected bidirectional long short-term memory network, and obtaining the parameterized activation function corresponding to the best accuracy.
2. The parameterized activation function improvement method based on a recurrent neural network according to claim 1, characterized in that the bidirectional long short-term memory network Bi-LSTM is expressed by the following formulas, in which:
l denotes the network layer index, and t denotes the current time step;
the forward hidden state denotes the result of the l-th LSTM layer processing the text sequence of the data set in forward order at time t;
the backward hidden state denotes the result of the l-th LSTM layer processing the text sequence of the data set in backward order at time t;
the output of the l-th hidden layer at time t is obtained by processing the text sequence in both the forward and backward directions simultaneously;
the original input at time t is the embedded text sequence;
E(·) denotes the word-embedding representation of a word in the text sequence; w_t denotes the word vector input at time t;
lstm(·) denotes processing by an LSTM network.
3. The parameterized activation function improvement method based on a recurrent neural network according to claim 1, characterized in that connecting each hidden layer in the Bi-LSTM network in step 2 specifically means: the input of the first layer of the Bi-LSTM network is the word-vector sequence of the data set, and the input of each subsequent layer is the concatenation of the previous layer's input and the previous layer's output.
4. The parameterized activation function improvement method based on a recurrent neural network according to claim 1, characterized in that the normalized exponential function layer is a softmax layer.
5. The parameterized activation function improvement method based on a recurrent neural network according to claim 1, characterized in that the parameterized sigmoid activation function is given by the following formula:
where σ(·) is the improved parameterized S-shaped activation function, A and B are parameters, and x denotes the input of the neural network.
6. The parameterized activation function improvement method based on a recurrent neural network according to claim 5, characterized in that when the data set is the sentiment-analysis subjectivity data set subj, the parameters of the parameterized sigmoid function corresponding to the best sentence-classification accuracy are: A = -0.5, B = 0.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910056795.0A CN109857867A (en) | 2019-01-22 | 2019-01-22 | A parameterized activation function improvement method based on a recurrent neural network
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910056795.0A CN109857867A (en) | 2019-01-22 | 2019-01-22 | A parameterized activation function improvement method based on a recurrent neural network
Publications (1)
Publication Number | Publication Date |
---|---|
CN109857867A true CN109857867A (en) | 2019-06-07 |
Family
ID=66895373
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910056795.0A Pending CN109857867A (en) | 2019-01-22 | 2019-01-22 | A parameterized activation function improvement method based on a recurrent neural network
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109857867A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110569358A (en) * | 2019-08-20 | 2019-12-13 | 上海交通大学 | Model, method and medium for learning long-term dependency and hierarchical structure text classification |
CN111209385A (en) * | 2020-01-14 | 2020-05-29 | 重庆兆光科技股份有限公司 | Consultation session unique answer optimizing method based on convex neural network |
CN111209385B (en) * | 2020-01-14 | 2024-02-02 | 重庆兆光科技股份有限公司 | Convex neural network-based consultation dialogue unique answer optimizing method |
CN113609284A (en) * | 2021-08-02 | 2021-11-05 | 河南大学 | Method and device for automatically generating text abstract fused with multivariate semantics |
CN113609284B (en) * | 2021-08-02 | 2024-08-06 | 河南大学 | Automatic text abstract generation method and device integrating multiple semantics |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190607 |