CN110472238A - Text summarization method based on hierarchical interaction attention - Google Patents
Text summarization method based on hierarchical interaction attention Download PDF Info
- Publication number
- CN110472238A CN110472238A CN201910677195.6A CN201910677195A CN110472238A CN 110472238 A CN110472238 A CN 110472238A CN 201910677195 A CN201910677195 A CN 201910677195A CN 110472238 A CN110472238 A CN 110472238A
- Authority
- CN
- China
- Prior art keywords
- vector
- layer
- lstm
- output
- attention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 31
- 230000003993 interaction Effects 0.000 title claims abstract description 25
- 238000012549 training Methods 0.000 claims description 15
- 238000012360 testing method Methods 0.000 claims description 11
- 238000012512 characterization method Methods 0.000 claims description 10
- 238000011161 development Methods 0.000 claims description 8
- 230000006870 function Effects 0.000 claims description 4
- 230000004927 fusion Effects 0.000 claims description 4
- 230000007787 long-term memory Effects 0.000 claims description 2
- 238000012216 screening Methods 0.000 claims description 2
- 238000004519 manufacturing process Methods 0.000 abstract description 6
- 239000000284 extract Substances 0.000 abstract description 3
- 238000003058 natural language processing Methods 0.000 abstract description 2
- 238000011156 evaluation Methods 0.000 description 7
- 238000013461 design Methods 0.000 description 6
- 238000013527 convolutional neural network Methods 0.000 description 3
- 238000002474 experimental method Methods 0.000 description 3
- 238000000605 extraction Methods 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 1
- 238000013528 artificial neural network Methods 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000008451 emotion Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000003780 insertion Methods 0.000 description 1
- 230000037431 insertion Effects 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 230000015654 memory Effects 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
- 238000000746 purification Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The present invention relates to a text summarization method based on hierarchical interaction attention, belonging to the field of natural language processing. The invention extracts feature information from different levels of the encoder through hierarchical interaction attention to guide summary generation, and uses a variational information bottleneck to compress noise in the data, avoiding the information redundancy introduced by fusing features from different levels. For abstractive text summarization, under an attention-based encoder-decoder framework, multi-layer contextual information from the encoder is extracted through the attention mechanism to guide the decoding process, while a variational information bottleneck is introduced to constrain the information, improving the quality of abstractive summaries. Experimental results show that the method significantly improves the performance of the encoder-decoder framework on abstractive summarization tasks.
Description
Technical field
The present invention relates to a text summarization method based on hierarchical interaction attention, belonging to the field of natural language processing.
Background art
With the development of deep learning, abstractive text summarization has become a popular research topic. Traditional attention-based encoder-decoder models usually consider only the high-level semantic information of the encoder as the semantic representation of the context, ignoring fine-grained details such as word-level structure captured by the lower layers of the network. The present invention proposes a multi-layer feature extraction and fusion method based on a hierarchical interaction attention mechanism that obtains features from different levels of the encoder, while introducing a variational information bottleneck at the decoder to compress and denoise the fused information, thereby generating higher-quality summaries.
Summary of the invention
The present invention provides a text summarization method based on hierarchical interaction attention. It obtains features from different levels of the encoder while introducing a variational information bottleneck at the decoder to compress and denoise the fused information, thereby generating higher-quality summaries: when generating a summary, the model attends not only to the high-level abstract features of the encoder but also extracts low-level detail, improving the quality of the generated summary.
The technical scheme of the invention is a text summarization method based on hierarchical interaction attention, whose specific steps are as follows:
Step 1: Use the English Gigaword dataset from the text summarization field as the training corpus. Preprocess the dataset with a preprocessing script, obtaining 3.8 million training pairs and 189,000 development pairs; each training sample consists of an input text and a summary sentence.
As a preferred embodiment of the invention, the specific steps of Step 1 are as follows: standardize the data, including converting all words of the dataset to lowercase, replacing all numbers with #, and replacing words that occur fewer than 5 times in the corpus with a UNK token; after removal and filtering, a portion of the development set is selected as the test set.
Step 2: Encode the training set with a bidirectional LSTM, with the number of layers set to three. The encoder uses a bidirectional long short-term memory network (Bi-Directional LSTM, BiLSTM). A BiLSTM consists of a forward and a backward LSTM: the forward LSTM reads the input sequence from left to right to obtain the forward encoding vectors, the backward LSTM reads the sequence from right to left to obtain the backward encoding vectors, and finally the forward and backward encoding vectors are concatenated to form the vector representation of the input sequence.
Step 3: The decoder uses a unidirectional LSTM network, reads the sentence to be decoded, and computes the context vector for each layer. The decoder is initialized with the final state vector of the encoder and then generates the summary sequence word by word from the input context representation, where the length of the generated summary must be less than or equal to the length of the input sequence. At each decoding step, the decoder reads the word embedding of the previous target word, the previous hidden state vector, and the context vector of the current step, and produces the hidden state vector of the current step. An attention mechanism is introduced: the context vector of the current step is computed from the previous decoder hidden state and the encoding vectors; the output vector of the current step is then computed from the current context vector and hidden state vector, and from that output vector the output probability over the predefined target vocabulary is computed.
Step 4: For the multi-layer encoder-decoder model, the encoder and decoder each contain multiple LSTM layers. Within each LSTM layer, the hidden state representation between the upper layer and the current layer is computed, so that the context vector of the upper layer is fused into the current layer.
As a preferred embodiment of the invention, the specific steps of Step 4 are as follows:
Step 4.1: Fuse the context vector and hidden state vector of the upper layer as the input of the current layer;
Step 4.2: Feed the input of the current layer into the LSTM to obtain the output of the current-layer network;
Step 4.3: Compute the output vector of the last layer of the multi-layer decoder network, and from it the probability distribution of the target output over the vocabulary.
Step 5: Concatenate the context vectors of each layer, which carry layer-specific feature information, with the output of the current layer to obtain the decoder hidden state of the current layer.
As a preferred embodiment of the invention, the specific steps of Step 5 are as follows:
Step 5.1: At the current layer of the network, concatenate the context vectors obtained from each layer to obtain the cross-layer fused context vector and decoder hidden state, which contain feature information from different levels of the encoder;
Step 5.2: Compute the output vector from the decoder hidden state and the context vector, and from it the output probability over the vocabulary.
Step 6: Because fusing contextual information from different levels introduces redundancy and noise, compress and denoise the data with a variational information bottleneck.
As a preferred embodiment of the invention, the specific steps of Step 6 are as follows:
Step 6.1: Given an input sequence, the encoder-decoder model generates the summary sequence by computing its probability;
Step 6.2: Learn the model parameters by maximizing the log-likelihood of the generated summary;
Step 6.3: Introduce the information bottleneck as an intermediate representation of the encoding, and construct the loss from the intermediate representation to the output sequence as the cross-entropy loss of classification;
Step 6.4: Add a constraint requiring the KL divergence (Kullback-Leibler divergence) between the probability distribution and the standard normal distribution to be as small as possible.
The beneficial effects of the present invention are:
1. The present invention proposes an encoder-decoder model based on a hierarchical interaction attention mechanism that obtains semantic information from different levels through attention, improving the quality of generated summaries.
2. The present invention is the first to apply the variational information bottleneck to the summarization task, compressing and denoising the data, which helps reduce the redundancy and noise introduced by fusing contextual information from different levels.
3. The present invention proposes a hierarchical interaction attention mechanism that extracts features from different levels of the encoder, so that summary generation attends not only to the high-level abstract features of the encoder but also extracts low-level detail, improving summary quality.
Description of the drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the attention-based encoder-decoder framework proposed by the present invention;
Fig. 3 is the inner-layer fusion mechanism proposed by the present invention;
Fig. 4 is the cross-layer fusion mechanism proposed by the present invention.
Specific embodiment
Embodiment 1: As shown in Figs. 1-4, the specific steps of the text summarization method based on hierarchical interaction attention are as follows:
Step 1: Use the English Gigaword dataset as the training corpus. Preprocess the dataset with a preprocessing script, obtaining 3.8 million training pairs and 189,000 development pairs; each training sample consists of an input text and a summary sentence;
Step 2: The encoder encodes the training set with a bidirectional LSTM, with the number of layers set to three;
Step 3: The decoder uses a unidirectional LSTM network, reads the sentence to be decoded, and computes the context vector for each layer;
Step 4: For the multi-layer encoder-decoder model, the encoder and decoder each contain multiple LSTM layers; within each LSTM layer, the hidden state representation between the upper layer and the current layer is computed, so that the context vector of the upper layer is fused into the current layer;
Step 5: Concatenate the context vectors of each layer, which carry layer-specific feature information, with the output of the current layer to obtain the decoder hidden state of the current layer;
Step 6: Because fusing contextual information from different levels introduces redundancy and noise, compress and denoise the data with a variational information bottleneck (Variational Information Bottleneck, VIB).
As a preferred embodiment of the invention, the specific steps of Step 1 are as follows: standardize the data, including converting all words of the dataset to lowercase, replacing all numbers with #, and replacing words that occur fewer than 5 times in the corpus with a UNK token. From the 189,000 development samples, 8,000 are randomly selected as the development set and 2,000 as the test set. Sentences whose original text is shorter than 5 words are removed from the test set, leaving 1,951 samples after filtering. To verify the generalization ability of the model, the present invention also selects DUC2004 as a test set. The DUC2004 dataset contains only 500 texts, each input text having 4 reference summary sentences.
The design of this preferred embodiment is an important component of the invention; it mainly describes the corpus collection process and provides the data support for the invention.
As a preferred embodiment of the invention, the specific steps of Step 2 are as follows:
The encoder of the invention uses a bidirectional long short-term memory network (Bi-Directional LSTM, BiLSTM). Compared with a single LSTM, a BiLSTM consists of a forward and a backward LSTM: the forward LSTM reads the input sequence from left to right to obtain the forward encoding vectors hf_i, and the backward LSTM reads the sequence from right to left to obtain the backward encoding vectors hb_i, as shown in formulas (1), (2):
hf_i = LSTMf(x_i, hf_{i-1}) (1)
hb_i = LSTMb(x_i, hb_{i+1}) (2)
where LSTMf and LSTMb denote the forward and backward LSTM networks respectively. Finally, the forward and backward encoding vectors are concatenated to obtain the vector representation of the input sequence, h_i = [hf_i; hb_i].
The design of this preferred embodiment is an important component of the invention, mainly the encoding process. Modeling a sentence with a single LSTM has the problem that information cannot be encoded from back to front. In fine-grained classification, such as the five-class task of strong praise, weak praise, neutral, weak criticism, and strong criticism, attention must be paid to the interaction between sentiment words, degree words, and negation words; the BiLSTM captures such bidirectional semantic dependencies better.
As a preferred embodiment of the invention, the specific steps of Step 3 are as follows:
Step 3.1: The decoder uses a unidirectional LSTM network, where s denotes the start of the sequence. At time t0, the decoder reads s and the final state vector of the encoder to predict the output probability of y1; it then generates the summary sequence word by word from the input context representation, where the length of the generated summary must be less than or equal to the length of the input sequence;
Step 3.2: At decoding step t, the decoder reads the word embedding vector w_{t-1} of the target word at time t-1, the hidden state vector s_{t-1}, and the context vector c_t to generate the hidden state vector s_t at time t, as shown in formula (3):
s_t = LSTM(w_{t-1}, s_{t-1}, c_t) (3)
Step 3.3: As shown in Fig. 2, the decoder introduces an attention mechanism: the context vector c_t at time t is computed from the decoder hidden state s_{t-1} at time t-1 and the encoding vectors h_i. The detailed process is shown in formulas (4), (5), (6):
e_{t,i} = v^T tanh(W_h h_i + W_s s_{t-1}) (4)
a_{t,i} = exp(e_{t,i}) / Σ_j exp(e_{t,j}) (5)
c_t = Σ_i a_{t,i} h_i (6)
Step 3.4: The output vector p_t at time t is then computed from the context vector c_t and hidden state vector s_t, and from p_t the output probability P_vocab,t over the predefined target vocabulary, as shown in formulas (7), (8):
p_t = tanh(W_m([s_t; c_t]) + b_m) (7)
P_vocab,t = softmax(W_p p_t + b_p) (8)
The design of this preferred embodiment is an important component of the invention, mainly the decoding process. The LSTM avoids the long-range dependency problem: for an LSTM, remembering information over long spans is the default behavior rather than something difficult to learn.
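The attention step of Step 3.3 can be illustrated as follows. This is a hedged sketch: scalar toy states and a plain dot-product score replace the learned scoring function of formula (4); only the softmax normalization of formula (5) and the weighted sum of formula (6) follow the text.

```python
import math

def attention_context(s_prev, encoder_states):
    """One attention step: score each encoder state against the previous
    decoder state, softmax the scores into weights a_{t,i}, and return the
    weighted sum of encoder states as the context vector c_t."""
    scores = [s_prev * h for h in encoder_states]             # stand-in for e_{t,i}
    m = max(scores)
    exps = [math.exp(e - m) for e in scores]                  # numerically stable softmax
    z = sum(exps)
    alphas = [e / z for e in exps]                            # a_{t,i}, sums to 1
    c_t = sum(a * h for a, h in zip(alphas, encoder_states))  # weighted sum of h_i
    return c_t, alphas

c, a = attention_context(1.0, [0.2, 0.8, -0.1])
```

The weights form a distribution over input positions, so the context vector is always a convex combination of the encoder states, with the best-matching state weighted most heavily.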
As a preferred embodiment of the invention, the specific steps of Step 4 are as follows:
Inner-layer fusion mechanism:
The inner-layer fusion mechanism (Inner-Layer Merge) fuses the upper-layer context vector into the encoding of the current layer, realizing the fusion of multi-level encoder information.
Step 4.1: Fuse the context vector and hidden state vector of layer k-1 as the input of layer k, as shown in formulas (9), (10), (11), where c_t^{k-1} is the context vector obtained at layer k-1 and s_t^{k-1} is the hidden state vector of layer k-1; the input vector of layer k is obtained by this computation;
Step 4.2: The input is then fed into the layer-k LSTM to obtain the output of the layer-k network;
Step 4.3: Compute the output vector p_t of the last layer of the multi-layer decoder network, and finally the probability distribution P_vocab of the target output over the vocabulary.
The design of this preferred embodiment is an important component of the invention. The multi-layer feature extraction and fusion method based on the hierarchical interaction attention mechanism obtains features from different levels of the encoder, solving the problem that traditional attention-based encoder-decoder models usually consider only the high-level semantic information of the encoder as the context representation and ignore fine-grained details such as word-level structure captured by the lower layers, thereby generating higher-quality summaries.
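A minimal sketch of the inner-layer fusion idea follows. The exact fusion of formulas (9)-(11) is not reproduced here; vector concatenation of the lower layer's context and hidden vectors into the next layer's input is used as an illustrative stand-in, and the toy per-layer transformation is invented purely so the example runs.

```python
def inner_layer_merge(ctx_prev, hid_prev):
    """Step 4.1 stand-in: fuse the context vector and hidden state vector of
    layer k-1 into the input of layer k by concatenation, [c^{k-1} ; s^{k-1}]."""
    return ctx_prev + hid_prev  # list concatenation

def run_layers(x0, num_layers):
    """Pass a toy context/hidden pair up through num_layers layers, each layer
    consuming the fused input of the layer below (Steps 4.1-4.2)."""
    ctx, hid = x0, [0.0] * len(x0)
    for _ in range(num_layers):
        fused = inner_layer_merge(ctx, hid)
        # Toy 'LSTM layer': halve the fused input for the new context,
        # negate it for the new hidden state (invented transformation).
        ctx = [v * 0.5 for v in fused]
        hid = [-v for v in fused]
    return ctx, hid

ctx, hid = run_layers([1.0, 2.0], 3)
```

In a real implementation the fused vector would be projected back to the layer width rather than growing by concatenation; the doubling here just makes the per-layer fusion visible.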
As a preferred embodiment of the invention, the specific steps of Step 5 are as follows:
Cross-layer fusion mechanism:
The cross-layer fusion mechanism (Cross-Layer Merge) fuses the obtained multi-layer context vectors at the last layer, as shown in Fig. 4.
Step 5.1: At layer r of the network, concatenate the context vectors obtained from each layer to obtain the cross-layer fused context vector c_t and decoder hidden state s_t, which contain feature information from different levels of the encoder;
Step 5.2: Finally, compute the output vector p_t from s_t and c_t, as shown in formulas (12), (13), (14):
p_t = tanh(W_m([s_t; c_t]) + b_m) (14)
Finally, the output probability P_t,vocab of p_t over the vocabulary is computed.
The design of this preferred embodiment is an important component of the invention. The multi-layer feature extraction and fusion method based on the hierarchical interaction attention mechanism obtains features from different levels of the encoder, solving the problem that traditional attention-based encoder-decoder models usually consider only the high-level semantic information of the encoder as the context representation and ignore fine-grained details such as word-level structure captured by the lower layers, thereby generating higher-quality summaries.
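The cross-layer fusion of Step 5.1 can be sketched as follows. Concatenation of toy scalar-list vectors stands in for formulas (12)-(13): every layer contributes its context vector and hidden state, and the last layer stitches them together.

```python
def cross_layer_merge(layer_contexts, layer_hiddens):
    """Step 5.1 stand-in: at the last layer, concatenate the context vectors
    obtained from every layer into one fused context vector c_t, and the
    per-layer hidden states into one decoder hidden state s_t, so that c_t
    and s_t carry feature information from all encoder levels."""
    c_t = [v for ctx in layer_contexts for v in ctx]
    s_t = [v for hid in layer_hiddens for v in hid]
    return c_t, s_t

# Three layers, each producing a 2-dimensional context and hidden state.
contexts = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
hiddens  = [[1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
c_t, s_t = cross_layer_merge(contexts, hiddens)
```

Unlike the inner-layer mechanism, which fuses adjacent layers step by step, here all layers' contexts survive intact into the final representation, which is why the fused vector grows with the number of layers.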
As a preferred embodiment of the invention, the specific steps of Step 6 are as follows:
The variational information bottleneck introduces Z as an intermediate representation of the source input X in the classification task from X to Y, constructing the information bottleneck R_IB(θ) from X → Z → Y, as shown in formulas (15), (16):
R_IB(θ) = I(Z, Y; θ) - βI(Z, X; θ) (15)
where I(Z, Y; θ) denotes the mutual information between Y and Z. The goal is to use mutual information as a measure of information content and learn the distribution of the encoding Z so that as little information as possible flows from X to Y, forcing the model to pass only the most important information through the bottleneck and ignore task-irrelevant information, thereby achieving de-redundancy and denoising.
For the summarization task, given an input sequence x, the encoder-decoder model generates the summary sequence y by computing the probability P_θ(y|x), where θ denotes the model parameters such as the weight matrices W and biases b, as shown in formula (17):
P_θ(y|x) = Π_t P_θ(y_t | y_<t, x) (17)
where y_<t = (y_1, y_2, … y_{t-1}) denotes all words decoded before time t. As shown in formula (18), the model learns the parameters θ by maximizing the log-likelihood of the generated summary:
Loss = -logP_θ(y|x) (18)
Therefore, in the traditional encoder-decoder model, the information bottleneck z = f(x, y_<t) is introduced as the intermediate representation of the encoding, and the loss from the intermediate representation z to the output sequence y is constructed as the cross-entropy loss of classification, as shown in formula (19):
Loss = -Σ_t logP_θ(y_t | z_t) (19)
At the same time, a constraint is added requiring the KL divergence (Kullback-Leibler divergence) between the distribution P_θ(z|x) and the standard normal distribution Q(z) to be as small as possible. After adding the VIB, the training loss function is as shown in formula (20):
Loss = -Σ_t logP_θ(y_t | z_t) + λ KL(P_θ(z_t|x) || Q(z_t)) (20)
where λ is a hyperparameter set to 1e-3.
The design of this preferred embodiment introduces a variational information bottleneck to compress and denoise the data, helping to reduce the redundancy and noise introduced by fusing contextual information from different levels.
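The VIB training loss of formula (20) can be worked through numerically. This is a sketch under stated assumptions: per-word log-probabilities are supplied directly, and the bottleneck posterior is taken to be a diagonal Gaussian, for which KL(N(mu, sigma^2) || N(0, 1)) has the well-known closed form used below.

```python
import math

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over dimensions --
    the constraint in formula (20) pushing the bottleneck posterior toward
    the standard normal prior Q(z)."""
    return sum(0.5 * (m * m + s * s - 1.0 - 2.0 * math.log(s))
               for m, s in zip(mu, sigma))

def vib_loss(target_log_probs, mu, sigma, lam=1e-3):
    """Training loss with the variational information bottleneck: the
    cross-entropy of the generated words (formula (19)) plus lam times the
    KL term, with lam = 1e-3 as stated in the text."""
    nll = -sum(target_log_probs)  # -sum_t log P(y_t | z_t)
    return nll + lam * kl_to_standard_normal(mu, sigma)

# Two decoded words with probabilities 0.5 and 0.25, posterior exactly N(0, I)
# so the KL penalty vanishes and the loss reduces to the cross-entropy.
loss = vib_loss([math.log(0.5), math.log(0.25)], mu=[0.0, 0.0], sigma=[1.0, 1.0])
```

Because the KL term is zero only when the posterior equals the prior, any information the model keeps in z costs λ-weighted nats, which is how the bottleneck forces task-irrelevant detail to be discarded.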
Step 7: To verify the effect of the invention, the experimental datasets, evaluation metrics, detailed parameter settings, and benchmark models for comparison are introduced below, and the experimental results are analyzed and discussed.
The experiments use the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score as the evaluation metric of the model. ROUGE, an automatic summarization evaluation metric proposed by Lin et al., evaluates the quality of a summary using the n-gram co-occurrence information between the generated summary and the reference summary:
ROUGE-N = Σ_{S∈{Gold}} Σ_{n-gram∈S} Count_match(n-gram) / Σ_{S∈{Gold}} Σ_{n-gram∈S} Count(n-gram)
where n-gram denotes a word n-gram, {Gold} denotes the reference summaries, Count_match(n-gram) denotes the number of n-grams co-occurring in the generated summary and the reference summary, and Count(n-gram) denotes the number of n-grams appearing in the reference summary. The present invention computes ROUGE scores with the pyrouge script, and finally selects the Rouge-1 (unigram), Rouge-2 (bigram), and Rouge-L (longest common subsequence) scores as the evaluation metrics of model performance.
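The ROUGE-N recall described above can be computed directly. This is a minimal sketch of the formula, not the pyrouge script the experiments use: candidate n-gram counts are clipped against each reference, and the total reference n-gram count is the denominator.

```python
from collections import Counter

def rouge_n_recall(candidate, references, n=1):
    """ROUGE-N recall: clipped count of candidate n-grams co-occurring in the
    references, divided by the total number of n-grams in the references."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand = ngrams(candidate)
    match = total = 0
    for ref in references:
        r = ngrams(ref)
        match += sum(min(cnt, cand[g]) for g, cnt in r.items())  # Count_match
        total += sum(r.values())                                  # Count
    return match / total if total else 0.0

score = rouge_n_recall("the cat sat".split(),
                       ["the cat sat on the mat".split()])
```

Here the candidate covers 3 of the reference's 6 unigrams ("the" only once, clipped), giving a ROUGE-1 recall of 0.5.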
Both the encoder and decoder use 3 LSTM layers; the encoder is a bidirectional LSTM and the decoder a unidirectional LSTM. The hidden state size of both encoder and decoder is set to 512. To reduce the number of model parameters, the encoder and decoder share the word embedding layer. The word embedding dimension is set to 512; the present invention does not use pre-trained word vectors such as Word2vec, GloVe, or BERT, but randomly initializes the embedding layer. The vocabulary size of the encoder and decoder is set to 50k, with out-of-vocabulary words replaced by UNK. Other parameter settings are as follows: dropout is 0.3, the optimizer is Adam, and the batch size is set to 64. To improve the quality of generated summaries, the invention uses a beam search strategy at inference time, with the beam size set to 12.
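The beam search strategy used at inference can be sketched as follows. This is an illustrative toy, not the system's decoder: the step function, vocabulary, and beam size here are invented (the experiments use beam size 12), but the core loop — expand every unfinished hypothesis, then keep only the highest log-probability prefixes — follows the standard strategy named in the text.

```python
import math

def beam_search(step_fn, start, beam_size, max_len):
    """Keep the beam_size best (prefix, log-prob) hypotheses at every step.
    step_fn(prefix) returns candidate (token, prob) continuations; a prefix
    ending in '</s>' is finished and carried forward unchanged."""
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for prefix, lp in beams:
            if prefix[-1] == "</s>":              # finished hypothesis
                candidates.append((prefix, lp))
                continue
            for tok, p in step_fn(prefix):
                candidates.append((prefix + [tok], lp + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0][0]

# Toy distribution: always prefers "b" over "a", then emits end-of-sequence.
def toy_step(prefix):
    if len(prefix) >= 3:
        return [("</s>", 1.0)]
    return [("a", 0.3), ("b", 0.7)]

best = beam_search(toy_step, "<s>", beam_size=2, max_len=4)
```

With beam size 1 this degenerates to the greedy search the experiments also report; a wider beam trades computation for a better chance of finding the globally most probable summary.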
The present invention chooses the following 6 models as benchmarks; the training and test data of all benchmark models are identical to those of the present invention.
ABS: generates text summaries with a convolutional neural network (CNN) encoder and an NNLM decoder.
ABS+: based on the ABS model, fine-tuned on the DUC2003 dataset, further improving performance on the DUC2004 dataset.
RFs2s: both encoder and decoder use GRUs, and linguistic features such as part-of-speech tags and named-entity labels are fused into the encoder input.
CAs2s: both encoder and decoder are implemented with convolutional neural networks, with optimization strategies such as gated linear units (Gated Linear Unit, GLU) and multi-step attention added to the convolution process.
SEASS: in the traditional attention-based encoder-decoder model, a selective gate network is added on the encoder side to control the flow of information from encoder to decoder, purifying the encoded information.
CGU: similar to SEASS, it optimizes the encoder with self-attention and Inception convolutional networks to construct a global representation of the source input.
Our_s2s: the encoder-decoder model implemented by the present invention.
Table 1 lists the Rouge-1, Rouge-2, and Rouge-L F1 scores of the present model and the benchmark models on the Gigaword test set. Our_s2s is the attention-based encoder-decoder model implemented by the present invention; Inner_s2s and Cross_s2s respectively add the inner-layer fusion mechanism and the cross-layer fusion mechanism on top of Our_s2s; Beam and Greedy indicate whether beam search or greedy search is used at test time.
Table 1: Comparison of experimental results on the Gigaword test set
Model | RG-1(F1) | RG-2(F1) | RG-L(F1) |
ABS(Beam) | 29.55 | 11.2 | 26.42 |
ABS+(Beam) | 29.76 | 11.88 | 26.96 |
RFs2s(Beam) | 32.67 | 15.59 | 30.64 |
CAs2s(Beam) | 33.78 | 15.97 | 31.15 |
SEASS(Greedy) | 35.48 | 16.50 | 32.93 |
SEASS(Beam) | 36.15 | 17.54 | 33.63 |
CGU(Beam) | 36.31 | 18.00 | 33.82 |
Our_s2s(Beam) | 33.62 | 16.35 | 31.34 |
Inner_s2s(Greedy) | 36.05 | 17.18 | 33.47 |
Inner_s2s(Beam) | 36.52 | 17.75 | 33.81 |
Cross_s2s(Greedy) | 36.23 | 17.19 | 33.71 |
Cross_s2s(Beam) | 36.97 | 18.36 | 34.35 |
As can be seen from Table 1, the proposed Inner_s2s and Cross_s2s improve on the benchmark models to a certain degree; in particular, the Cross_s2s model with beam search achieves the best performance on all three metrics RG-1, RG-2, and RG-L. It can also be seen that Cross_s2s outperforms Inner_s2s under both the greedy and beam search strategies.
To further verify the generalization ability of the model, the present invention runs experiments on the DUC2004 dataset; the results are shown in Table 2. The DUC2004 dataset requires generating summaries of fixed length (75 bytes); consistent with prior work [1, 7, 18], the present invention fixes the length of generated summaries at 18 words to meet the minimum-length requirement. The DUC2004 dataset conventionally uses recall rather than F1 as the metric of model performance. Each source sentence in DUC2004 has four human-written summaries as references, so the present invention evaluates against each of the four reference summaries and reports the average of the four results as the evaluation result.
Table 2: Comparison of experimental results on the DUC2004 test set
Model | RG-1(R) | RG-2(R) | RG-L(R) |
ABS | 26.55 | 7.06 | 22.05 |
ABS+ | 28.18 | 8.49 | 23.81 |
RFs2s | 28.35 | 9.46 | 24.59 |
CAs2s | 28.97 | 8.26 | 24.06 |
SEASS | 29.21 | 9.56 | 25.51 |
Inner_s2s | 30.29 | 13.24 | 27.94 |
Cross_s2s | 30.14 | 13.05 | 27.85 |
As can be seen from Table 2, the proposed Inner_s2s and Cross_s2s perform similarly, but both exceed the benchmark models in the recall values of all three metrics RG-1, RG-2, and RG-L. In particular, compared with ABS+, although that model is fine-tuned on the DUC2003 dataset, the proposed Inner_s2s still improves RG-1, RG-2, and RG-L by 2.11, 4.75, and 4.13 respectively. Compared with the current best model SEASS, the RG-2 metric improves by nearly 3 percentage points.
The present invention targets abstractive text summarization. Under the attention-based encoder-decoder framework, it proposes a hierarchical interaction attention mechanism: multi-layer contextual information from the encoder is extracted through the attention mechanism to guide the decoding process, while a variational information bottleneck is introduced to constrain the information, thereby improving the quality of abstractive text summarization.
The embodiments of the present invention have been explained in detail above in conjunction with the attached drawings, but the present invention is not limited to the above embodiments; various changes may be made within the knowledge possessed by a person skilled in the art without departing from the inventive concept.
Claims (7)
1. A text summarization method based on hierarchical interaction attention, characterized in that the specific steps of the text summarization method based on hierarchical interaction attention are as follows:
Step1: use the English Gigaword data set as the training set; the data set is preprocessed with a preprocessing script to obtain the training set and the development set, where each training sample consists of a pair of input text and abstract sentence;
Step2: the encoder encodes the training set using a bidirectional LSTM, with the number of layers set to three;
Step3: the decoder uses a unidirectional LSTM network, reads the sentence to be decoded, and calculates the context vector of each layer;
Step4: for the multi-layer encoding/decoding model, the codec consists of multiple LSTM layers; within each LSTM layer, the hidden state representation between the upper layer and the current layer is calculated, so that the context vector of the upper layer is fused into the current layer;
Step5: splice each layer's context vector, which carries characteristic information, with the output of the current layer to obtain the hidden state of the decoder at the current layer;
Step6: incorporating contextual information from different levels brings redundancy and noise, so a variational information bottleneck is used to compress and denoise the data.
2. The text summarization method based on hierarchical interaction attention according to claim 1, characterized in that the specific steps of step Step1 are as follows: the data are standardized, including converting all words of the data set to lower case, replacing all digits with #, and replacing words that occur fewer than 5 times in the corpus with the UNK mark; a part of the data is selected from the development set and used as the test set after removal and screening.
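The normalization described in claim 2 can be sketched as follows; the sample sentences and the lowered frequency threshold (2 instead of 5, so the toy corpus produces a UNK) are illustrative assumptions, not data from the patent:

```python
import re
from collections import Counter

def preprocess(sentences, min_freq=5):
    """Standardize: lowercase, digits -> '#', rare words (< min_freq) -> 'UNK'."""
    tokenized = [re.sub(r"\d", "#", s.lower()).split() for s in sentences]
    freq = Counter(w for toks in tokenized for w in toks)
    return [[w if freq[w] >= min_freq else "UNK" for w in toks] for toks in tokenized]

corpus = ["The year 2019 was good", "the year 2020 was better"]
print(preprocess(corpus, min_freq=2)[0])  # ['the', 'year', '####', 'was', 'UNK']
```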
3. The text summarization method based on hierarchical interaction attention according to claim 1, characterized in that the specific steps of step Step2 are as follows: the encoder uses a bidirectional long short-term memory network (Bi-Directional LSTM, BiLSTM); the BiLSTM consists of a forward LSTM and a backward LSTM, where the forward LSTM reads the input sequence from left to right to obtain the forward encoding vector and the backward LSTM reads the sequence from right to left to obtain the backward encoding vector; finally, the forward and backward encoding vectors are spliced to obtain the vector characterization of the input sequence.
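The bidirectional encoding of claim 3 can be sketched in numpy; for brevity a simplified tanh recurrence stands in for the LSTM cell, and all sizes, weights and inputs are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, T = 4, 3, 5                       # toy input size, hidden size, sequence length
W_x = 0.1 * rng.normal(size=(d_h, d_in))
W_h = 0.1 * rng.normal(size=(d_h, d_h))

def run_direction(xs):
    """Simplified tanh recurrence standing in for one LSTM direction."""
    h, states = np.zeros(d_h), []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h)
        states.append(h)
    return states

xs = [rng.normal(size=d_in) for _ in range(T)]
forward = run_direction(xs)                  # read left to right
backward = run_direction(xs[::-1])[::-1]     # read right to left, then realign
encoding = [np.concatenate([f, b]) for f, b in zip(forward, backward)]
print(encoding[0].shape)  # (6,) -- forward and backward vectors spliced per position
```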
4. The text summarization method based on hierarchical interaction attention according to claim 1, characterized in that the specific steps of step Step3 are as follows:
Step3.1: the decoder uses a unidirectional LSTM network, initialized with the state vector of the encoder at the last moment; it then generates the abstract sequence word by word according to the vector characterization of the input context, where the length of the generated abstract must be less than or equal to the length of the input sequence;
Step3.2: during decoding, the decoder reads the word embedding vector of the target word at the previous moment, together with the hidden state vector of the previous moment and the context vector of the current moment, to generate the hidden state vector of the current moment;
Step3.3: an attention mechanism is introduced, and the context vector of the current moment is calculated from the hidden state of the decoder at the previous moment and the encoding vectors;
Step3.4: the output vector of the current moment is then calculated from the context vector and the hidden state vector of the current moment, and from the output vector of the current moment the output probability over the preset target vocabulary is calculated.
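Steps Step3.2 to Step3.4 describe one attention decoding step. A minimal numpy sketch of Step3.3, using dot-product scoring as a stand-in for the patent's unspecified attention score function (all sizes and states are toy assumptions):

```python
import numpy as np

def attention_context(dec_state, enc_states):
    """Context vector: softmax over dot-product scores, then a weighted sum."""
    scores = enc_states @ dec_state              # one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # attention distribution (sums to 1)
    return weights @ enc_states                  # weighted sum of encoder states

rng = np.random.default_rng(1)
enc = rng.normal(size=(6, 4))    # 6 source positions, hidden size 4 (toy sizes)
prev_state = rng.normal(size=4)  # decoder hidden state of the previous moment
ctx = attention_context(prev_state, enc)
print(ctx.shape)  # (4,)
```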
5. The text summarization method based on hierarchical interaction attention according to claim 1, characterized in that the specific steps of step Step4 are as follows:
Step4.1: the context vector and the hidden state vector of the upper layer are fused as the input of the current layer;
Step4.2: the input of the current layer is fed into the LSTM to obtain the output of the current layer network;
Step4.3: the output vector of the last layer of the multi-layer decoder network is calculated, and the probability distribution of the target output over the vocabulary is computed.
6. The text summarization method based on hierarchical interaction attention according to claim 1, characterized in that the specific steps of step Step5 are as follows:
Step5.1: at the current layer of the network, the context vectors obtained at each layer are spliced respectively, yielding the cross-layer fused context vector and the decoder hidden state, which contain the characteristic information of the different levels of the encoder;
Step5.2: the output vector is calculated from the decoder hidden state and the context vector, from which the output probability of the output vector over the vocabulary can be calculated.
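Step5.2's mapping from the decoder hidden state to an output probability over the vocabulary can be sketched as a linear projection followed by a softmax; the projection matrix and the sizes here are toy assumptions:

```python
import numpy as np

def output_distribution(state, W_out):
    """Project the hidden state to vocabulary logits, then apply softmax."""
    logits = W_out @ state
    p = np.exp(logits - logits.max())            # shift logits for numerical stability
    return p / p.sum()

rng = np.random.default_rng(2)
vocab_size, d_h = 10, 8                          # toy vocabulary and hidden sizes
W_out = rng.normal(size=(vocab_size, d_h))
p = output_distribution(rng.normal(size=d_h), W_out)
print(round(float(p.sum()), 6))  # 1.0
```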
7. The text summarization method based on hierarchical interaction attention according to claim 1, characterized in that the specific steps of step Step6 are as follows:
Step6.1: given an input sequence, the encoding/decoding model generates the abstract sequence by calculating probabilities;
Step6.2: the model parameters are learned by maximizing the log-likelihood function of the probability of generating the abstract;
Step6.3: an information bottleneck is introduced as the intermediate characterization of the encoding, and the loss from the intermediate characterization to the output sequence is constructed as the cross-entropy loss of classification;
Step6.4: a constraint is added requiring that the KL divergence (Kullback-Leibler divergence) between the probability distribution and the standard normal distribution be as small as possible.
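Step6.4's constraint penalizes the KL divergence between the learned distribution and the standard normal. For a diagonal Gaussian N(mu, diag(sigma^2)), this divergence has the closed form 0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1), sketched below; parametrizing via log sigma^2 is a common implementation convention, not something specified by the patent:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * float(np.sum(mu**2 + np.exp(log_var) - log_var - 1.0))

# The divergence vanishes when the two distributions coincide.
print(kl_to_standard_normal(np.zeros(3), np.zeros(3)))  # 0.0
```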
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910677195.6A CN110472238B (en) | 2019-07-25 | 2019-07-25 | Text summarization method based on hierarchical interaction attention |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910677195.6A CN110472238B (en) | 2019-07-25 | 2019-07-25 | Text summarization method based on hierarchical interaction attention |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110472238A true CN110472238A (en) | 2019-11-19 |
CN110472238B CN110472238B (en) | 2022-11-18 |
Family
ID=68509298
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910677195.6A Active CN110472238B (en) | 2019-07-25 | 2019-07-25 | Text summarization method based on hierarchical interaction attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110472238B (en) |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111061862A (en) * | 2019-12-16 | 2020-04-24 | 湖南大学 | Method for generating abstract based on attention mechanism |
CN111488440A (en) * | 2020-03-30 | 2020-08-04 | 华南理工大学 | Problem generation method based on multi-task combination |
CN111680151A (en) * | 2020-05-06 | 2020-09-18 | 华东师范大学 | Personalized commodity comment abstract generation method based on hierarchical transformer |
CN111723196A (en) * | 2020-05-21 | 2020-09-29 | 西北工业大学 | Single document abstract generation model construction method and device based on multi-task learning |
CN111782810A (en) * | 2020-06-30 | 2020-10-16 | 湖南大学 | Text abstract generation method based on theme enhancement |
CN111931518A (en) * | 2020-10-15 | 2020-11-13 | 北京金山数字娱乐科技有限公司 | Translation model training method and device |
CN111966820A (en) * | 2020-07-21 | 2020-11-20 | 西北工业大学 | Method and system for constructing and extracting generative abstract model |
CN112528598A (en) * | 2020-12-07 | 2021-03-19 | 上海交通大学 | Automatic text abstract evaluation method based on pre-training language model and information theory |
CN112632228A (en) * | 2020-12-30 | 2021-04-09 | 深圳供电局有限公司 | Text mining-based auxiliary bid evaluation method and system |
CN111538829B (en) * | 2020-04-27 | 2021-04-20 | 众能联合数字技术有限公司 | Novel extraction method for webpage text key content of engineering machinery rental scene |
CN112765345A (en) * | 2021-01-22 | 2021-05-07 | 重庆邮电大学 | Text abstract automatic generation method and system fusing pre-training model |
CN112836040A (en) * | 2021-01-31 | 2021-05-25 | 云知声智能科技股份有限公司 | Multi-language abstract generation method and device, electronic equipment and computer readable medium |
CN113139468A (en) * | 2021-04-24 | 2021-07-20 | 西安交通大学 | Video abstract generation method fusing local target features and global features |
CN113434683A (en) * | 2021-06-30 | 2021-09-24 | 平安科技(深圳)有限公司 | Text classification method, device, medium and electronic equipment |
CN114154493A (en) * | 2022-01-28 | 2022-03-08 | 北京芯盾时代科技有限公司 | Short message category identification method and device |
CN118069833A (en) * | 2024-04-17 | 2024-05-24 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Hierarchical abstract generation method, device, equipment and readable storage medium |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180261214A1 (en) * | 2017-02-06 | 2018-09-13 | Facebook, Inc. | Sequence-to-sequence convolutional architecture |
CN108628823A (en) * | 2018-03-14 | 2018-10-09 | 中山大学 | In conjunction with the name entity recognition method of attention mechanism and multitask coordinated training |
CN108647214A (en) * | 2018-03-29 | 2018-10-12 | 中国科学院自动化研究所 | Coding/decoding method based on deep-neural-network translation model |
CN108804677A (en) * | 2018-06-12 | 2018-11-13 | 合肥工业大学 | In conjunction with the deep learning question classification method and system of multi-layer attention mechanism |
CN108897740A (en) * | 2018-05-07 | 2018-11-27 | 内蒙古工业大学 | A kind of illiteracy Chinese machine translation method based on confrontation neural network |
CN108959246A (en) * | 2018-06-12 | 2018-12-07 | 北京慧闻科技发展有限公司 | Answer selection method, device and electronic equipment based on improved attention mechanism |
CN109145105A (en) * | 2018-07-26 | 2019-01-04 | 福州大学 | A kind of text snippet model generation algorithm of fuse information selection and semantic association |
CN109241536A (en) * | 2018-09-21 | 2019-01-18 | 浙江大学 | It is a kind of based on deep learning from the sentence sort method of attention mechanism |
WO2019028269A2 (en) * | 2017-08-02 | 2019-02-07 | Strong Force Iot Portfolio 2016, Llc | Methods and systems for detection in an industrial internet of things data collection environment with large data sets |
WO2019025601A1 (en) * | 2017-08-03 | 2019-02-07 | Koninklijke Philips N.V. | Hierarchical neural networks with granularized attention |
CN109408633A (en) * | 2018-09-17 | 2019-03-01 | 中山大学 | A kind of construction method of the Recognition with Recurrent Neural Network model of multilayer attention mechanism |
CN109858032A (en) * | 2019-02-14 | 2019-06-07 | 程淑玉 | Merge more granularity sentences interaction natural language inference model of Attention mechanism |
CN109918510A (en) * | 2019-03-26 | 2019-06-21 | 中国科学技术大学 | Cross-cutting keyword extracting method |
CN109948166A (en) * | 2019-03-25 | 2019-06-28 | 腾讯科技(深圳)有限公司 | Text interpretation method, device, storage medium and computer equipment |
CN110032638A (en) * | 2019-04-19 | 2019-07-19 | 中山大学 | A kind of production abstract extraction method based on coder-decoder |
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180261214A1 (en) * | 2017-02-06 | 2018-09-13 | Facebook, Inc. | Sequence-to-sequence convolutional architecture |
WO2019028269A2 (en) * | 2017-08-02 | 2019-02-07 | Strong Force Iot Portfolio 2016, Llc | Methods and systems for detection in an industrial internet of things data collection environment with large data sets |
WO2019025601A1 (en) * | 2017-08-03 | 2019-02-07 | Koninklijke Philips N.V. | Hierarchical neural networks with granularized attention |
CN108628823A (en) * | 2018-03-14 | 2018-10-09 | 中山大学 | In conjunction with the name entity recognition method of attention mechanism and multitask coordinated training |
CN108647214A (en) * | 2018-03-29 | 2018-10-12 | 中国科学院自动化研究所 | Coding/decoding method based on deep-neural-network translation model |
CN108897740A (en) * | 2018-05-07 | 2018-11-27 | 内蒙古工业大学 | A kind of illiteracy Chinese machine translation method based on confrontation neural network |
CN108804677A (en) * | 2018-06-12 | 2018-11-13 | 合肥工业大学 | In conjunction with the deep learning question classification method and system of multi-layer attention mechanism |
CN108959246A (en) * | 2018-06-12 | 2018-12-07 | 北京慧闻科技发展有限公司 | Answer selection method, device and electronic equipment based on improved attention mechanism |
CN109145105A (en) * | 2018-07-26 | 2019-01-04 | 福州大学 | A kind of text snippet model generation algorithm of fuse information selection and semantic association |
CN109408633A (en) * | 2018-09-17 | 2019-03-01 | 中山大学 | A kind of construction method of the Recognition with Recurrent Neural Network model of multilayer attention mechanism |
CN109241536A (en) * | 2018-09-21 | 2019-01-18 | 浙江大学 | It is a kind of based on deep learning from the sentence sort method of attention mechanism |
CN109858032A (en) * | 2019-02-14 | 2019-06-07 | 程淑玉 | Merge more granularity sentences interaction natural language inference model of Attention mechanism |
CN109948166A (en) * | 2019-03-25 | 2019-06-28 | 腾讯科技(深圳)有限公司 | Text interpretation method, device, storage medium and computer equipment |
CN109918510A (en) * | 2019-03-26 | 2019-06-21 | 中国科学技术大学 | Cross-cutting keyword extracting method |
CN110032638A (en) * | 2019-04-19 | 2019-07-19 | 中山大学 | A kind of production abstract extraction method based on coder-decoder |
Non-Patent Citations (3)
Title |
---|
SINA AHMADI: "Attention-based Encoder-Decoder Networks for Spelling and Grammatical Error Correction", internet retrieval: ARXIV.ORG/PDF/1810.00660.PDF * |
WANG Qi et al.: "Neural Machine Translation Based on Attention Convolution", Computer Science * |
CHEN Longjie et al.: "Image Caption Generation Algorithm Based on Multi-Attention Scale Feature Fusion", Journal of Computer Applications * |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111061862A (en) * | 2019-12-16 | 2020-04-24 | 湖南大学 | Method for generating abstract based on attention mechanism |
CN111488440A (en) * | 2020-03-30 | 2020-08-04 | 华南理工大学 | Problem generation method based on multi-task combination |
CN111488440B (en) * | 2020-03-30 | 2024-02-13 | 华南理工大学 | Problem generation method based on multi-task combination |
CN111538829B (en) * | 2020-04-27 | 2021-04-20 | 众能联合数字技术有限公司 | Novel extraction method for webpage text key content of engineering machinery rental scene |
CN111680151A (en) * | 2020-05-06 | 2020-09-18 | 华东师范大学 | Personalized commodity comment abstract generation method based on hierarchical transformer |
CN111723196A (en) * | 2020-05-21 | 2020-09-29 | 西北工业大学 | Single document abstract generation model construction method and device based on multi-task learning |
CN111782810A (en) * | 2020-06-30 | 2020-10-16 | 湖南大学 | Text abstract generation method based on theme enhancement |
CN111966820A (en) * | 2020-07-21 | 2020-11-20 | 西北工业大学 | Method and system for constructing and extracting generative abstract model |
CN111931518A (en) * | 2020-10-15 | 2020-11-13 | 北京金山数字娱乐科技有限公司 | Translation model training method and device |
CN112528598B (en) * | 2020-12-07 | 2022-04-05 | 上海交通大学 | Automatic text abstract evaluation method based on pre-training language model and information theory |
CN112528598A (en) * | 2020-12-07 | 2021-03-19 | 上海交通大学 | Automatic text abstract evaluation method based on pre-training language model and information theory |
CN112632228A (en) * | 2020-12-30 | 2021-04-09 | 深圳供电局有限公司 | Text mining-based auxiliary bid evaluation method and system |
CN112765345A (en) * | 2021-01-22 | 2021-05-07 | 重庆邮电大学 | Text abstract automatic generation method and system fusing pre-training model |
CN112836040A (en) * | 2021-01-31 | 2021-05-25 | 云知声智能科技股份有限公司 | Multi-language abstract generation method and device, electronic equipment and computer readable medium |
CN112836040B (en) * | 2021-01-31 | 2022-09-23 | 云知声智能科技股份有限公司 | Method and device for generating multilingual abstract, electronic equipment and computer readable medium |
CN113139468A (en) * | 2021-04-24 | 2021-07-20 | 西安交通大学 | Video abstract generation method fusing local target features and global features |
CN113434683A (en) * | 2021-06-30 | 2021-09-24 | 平安科技(深圳)有限公司 | Text classification method, device, medium and electronic equipment |
CN113434683B (en) * | 2021-06-30 | 2023-08-29 | 平安科技(深圳)有限公司 | Text classification method, device, medium and electronic equipment |
CN114154493A (en) * | 2022-01-28 | 2022-03-08 | 北京芯盾时代科技有限公司 | Short message category identification method and device |
CN114154493B (en) * | 2022-01-28 | 2022-06-28 | 北京芯盾时代科技有限公司 | Short message category identification method and device |
CN118069833A (en) * | 2024-04-17 | 2024-05-24 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Hierarchical abstract generation method, device, equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110472238B (en) | 2022-11-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110472238A (en) | Text snippet method based on level interaction attention | |
CN111241294B (en) | Relationship extraction method of graph convolution network based on dependency analysis and keywords | |
CN109522411A (en) | A kind of writing householder method neural network based | |
CN108519890A (en) | A kind of robustness code abstraction generating method based on from attention mechanism | |
CN110929030A (en) | Text abstract and emotion classification combined training method | |
CN110427616B (en) | Text emotion analysis method based on deep learning | |
Konstas et al. | Inducing document plans for concept-to-text generation | |
CN112183094B (en) | Chinese grammar debugging method and system based on multiple text features | |
CN110717843A (en) | Reusable law strip recommendation framework | |
CN111125333B (en) | Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism | |
CN113435211B (en) | Text implicit emotion analysis method combined with external knowledge | |
CN111078866A (en) | Chinese text abstract generation method based on sequence-to-sequence model | |
CN114818717A (en) | Chinese named entity recognition method and system fusing vocabulary and syntax information | |
CN110083824A (en) | A kind of Laotian segmenting method based on Multi-Model Combination neural network | |
CN115310448A (en) | Chinese named entity recognition method based on combining bert and word vector | |
CN115481219A (en) | Electricity selling company evaluation emotion classification method based on grammar sequence embedded model | |
CN115545033A (en) | Chinese field text named entity recognition method fusing vocabulary category representation | |
CN114117041B (en) | Attribute-level emotion analysis method based on specific attribute word context modeling | |
CN112287687B (en) | Case tendency extraction type summarization method based on case attribute perception | |
Wu et al. | Context-aware style learning and content recovery networks for neural style transfer | |
CN117877460A (en) | Speech synthesis method, device, speech synthesis model training method and device | |
Vaessen et al. | The Effect of Batch Size on Contrastive Self-Supervised Speech Representation Learning | |
CN114238649A (en) | Common sense concept enhanced language model pre-training method | |
CN113468366A (en) | Music automatic labeling method | |
CN113935308A (en) | Method and system for automatically generating text abstract facing field of geoscience |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||