CN110472238A - Text summarization method based on hierarchical interaction attention - Google Patents

Text summarization method based on hierarchical interaction attention Download PDF

Info

Publication number
CN110472238A
CN110472238A CN201910677195.6A CN201910677195A CN110472238A CN 110472238 A CN110472238 A CN 110472238A CN 201910677195 A CN201910677195 A CN 201910677195A CN 110472238 A CN110472238 A CN 110472238A
Authority
CN
China
Prior art keywords
vector
layer
lstm
output
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910677195.6A
Other languages
Chinese (zh)
Other versions
CN110472238B (en)
Inventor
余正涛
周高峰
黄于欣
高盛祥
郭军军
王振晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN201910677195.6A priority Critical patent/CN110472238B/en
Publication of CN110472238A publication Critical patent/CN110472238A/en
Application granted granted Critical
Publication of CN110472238B publication Critical patent/CN110472238B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • G06F16/345Summarisation for human users
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a text summarization method based on hierarchical interaction attention, and belongs to the field of natural language processing. The invention uses hierarchical interaction attention to extract feature information from different levels of the encoder to guide summary generation, and introduces a variational information bottleneck to compress the data and suppress noise, thereby avoiding the information redundancy caused by fusing features from different levels. For abstractive text summarization, under an attention-based encoder-decoder framework, multi-layer contextual information of the encoder is extracted through the attention mechanism to guide the decoding process, while the information is constrained by the variational information bottleneck, so as to improve the quality of the generated summaries. Experimental results show that the method can significantly improve the performance of the encoder-decoder framework on abstractive summarization tasks.

Description

Text summarization method based on hierarchical interaction attention
Technical field
The present invention relates to a text summarization method based on hierarchical interaction attention, and belongs to the field of natural language processing.
Background technique
With the development of deep learning, abstractive text summarization has become a current research hot spot. Traditional attention-based encoder-decoder models usually consider only the high-level semantic information of the encoder as the context representation, and ignore fine-grained details such as the word-level structure captured by the lower layers of the network. The present invention proposes a multi-layer feature extraction and fusion method based on a hierarchical interaction attention mechanism to obtain features from different levels of the encoder, and introduces a variational information bottleneck at the decoder to compress and denoise the fused information, so as to generate higher-quality summaries.
Summary of the invention
The present invention provides a text summarization method based on hierarchical interaction attention. It obtains features from different levels of the encoder and introduces a variational information bottleneck at the decoder to compress and denoise the fused information, so as to generate higher-quality summaries: when generating a summary, the model attends not only to the high-level abstract features of the encoder but also extracts the detailed information of the lower layers to improve summary quality.
The technical scheme of the invention is a text summarization method based on hierarchical interaction attention, whose specific steps are as follows:
Step 1: use the English summarization dataset Gigaword as the training corpus and preprocess it with a preprocessing script, obtaining a training set of 3.8 million samples and a development set of 189,000 samples; each training sample consists of a pair of input text and reference summary sentence;
As a preferred embodiment of the present invention, Step 1 proceeds as follows: the data are normalized, including lowercasing all words in the dataset, replacing all numbers with #, and replacing words that occur fewer than 5 times in the corpus with the UNK token; a portion of the development set is then filtered and used as the test set.
Step 2: encode the input with a bidirectional LSTM, with the number of layers set to three. The encoder uses a bidirectional long short-term memory network (Bi-Directional LSTM, BiLSTM), which consists of a forward and a backward LSTM: the forward LSTM reads the input sequence from left to right to obtain forward encoding vectors, the backward LSTM reads the sequence from right to left to obtain backward encoding vectors, and finally the forward and backward encoding vectors are concatenated to form the vector representation of the input sequence.
Step 3: the decoder uses a unidirectional LSTM network, reads the sentence to be decoded and computes the context vector at each layer. The decoder is initialized with the final state vector of the encoder and then generates the summary sequence word by word according to the input context representation, where the length of the generated summary is at most the length of the input sequence. At each decoding step, the decoder reads the word embedding of the previous target word, the hidden state vector of the previous step and the context vector of the current step to produce the hidden state vector of the current step. An attention mechanism is introduced: the context vector of the current step is computed from the decoder hidden state of the previous step and the encoder vectors; the output vector of the current step is then computed from the current context vector and hidden state vector, and from it the output probability over the predefined target vocabulary is computed.
Step 4: for the multi-layer encoder-decoder model, in which the encoder and decoder contain multiple LSTM layers, compute within each LSTM layer the hidden state representation between the upper layer and the current layer, so that the context vector of the upper layer is fused into the current layer;
As a preferred embodiment of the present invention, Step 4 proceeds as follows:
Step 4.1: merge the context vector and hidden state vector of the upper layer to form the input of the current layer;
Step 4.2: feed the input of the current layer into its LSTM to obtain the output of the current layer;
Step 4.3: compute the output vector of the last layer of the multi-layer decoder network, and from it compute the probability distribution of the target output over the vocabulary.
Step 5: concatenate the context vectors of all layers, which carry the feature information, with the output of the current layer to obtain the decoder hidden state of the current layer;
As a preferred embodiment of the present invention, Step 5 proceeds as follows:
Step 5.1: at the current layer of the network, concatenate the context vectors obtained from each layer to produce the cross-layer fused context vector and decoder hidden state, which contain the feature information of different encoder levels;
Step 5.2: compute the output vector from the decoder hidden state and context vector, and then compute the output probability of this output vector over the vocabulary.
Step 6: since incorporating contextual information from different levels brings in redundancy and noise, use a variational information bottleneck to compress and denoise the data.
As a preferred embodiment of the present invention, Step 6 proceeds as follows:
Step 6.1: given an input sequence, the encoder-decoder model generates the summary sequence by computing its probability;
Step 6.2: learn the model parameters by maximizing the log-likelihood of the generated summary;
Step 6.3: introduce an information bottleneck as an intermediate representation of the encoding, and construct the loss from the intermediate representation to the output sequence as a classification cross-entropy loss;
Step 6.4: add a constraint requiring the Kullback-Leibler (KL) divergence between the encoding distribution and the standard normal distribution to be as small as possible. A high-level sketch tying these steps together is given after this list.
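The following is a structural sketch (not the patent's actual implementation) of how Steps 2-6 compose. The sub-modules are placeholders for the components described in the detailed embodiments below, and their interfaces are assumptions made for illustration only.

```python
import torch.nn as nn

class HierarchicalSummarizer(nn.Module):
    """Skeleton: BiLSTM encoder (Step 2), stacked decoder layers that exchange
    context vectors (Steps 3-4), cross-layer fusion (Step 5), and a variational
    information bottleneck before the vocabulary projection (Step 6)."""
    def __init__(self, encoder, decoder_layers, cross_layer_merge, vib, generator):
        super().__init__()
        self.encoder = encoder                               # Step 2
        self.decoder_layers = nn.ModuleList(decoder_layers)  # Steps 3-4
        self.cross_layer_merge = cross_layer_merge           # Step 5
        self.vib = vib                                       # Step 6
        self.generator = generator                           # projection onto the target vocabulary

    def decode_step(self, w_prev, layer_states, enc_outputs):
        contexts, states = [], []
        ctx, state = None, None
        for k, layer in enumerate(self.decoder_layers):
            # each layer attends over the encoder outputs and receives the context
            # vector / hidden state handed up from the layer below (inner-layer merge)
            state, ctx, layer_states[k] = layer(w_prev, layer_states[k], enc_outputs, ctx, state)
            contexts.append(ctx)
            states.append(state)
        fused = self.cross_layer_merge(contexts, states)     # cross-layer fusion of contexts and states
        z, kl = self.vib(fused)                              # compress and denoise the fused features
        return self.generator(z), layer_states, kl           # vocabulary scores, updated states, KL term
```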
The beneficial effects of the present invention are:
1. The invention proposes an encoder-decoder model based on a hierarchical interaction attention mechanism that obtains semantic information from different levels through attention, improving the quality of the generated summaries.
2. The invention is the first to apply the variational information bottleneck to the summarization task, compressing and denoising the data and thereby reducing the redundancy and noise introduced by incorporating contextual information from different levels.
3. The invention proposes a hierarchical interaction attention mechanism that extracts features from different encoder levels, so that summary generation attends not only to the high-level abstract features of the encoder but also to the detailed information of the lower layers, improving summary quality.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the attention-based encoder-decoder framework diagram proposed by the present invention;
Fig. 3 is the inner-layer merge mechanism diagram proposed by the present invention;
Fig. 4 is the cross-layer merge mechanism diagram proposed by the present invention.
Specific embodiment
Embodiment 1: as shown in Figures 1-4, the specific steps of the text summarization method based on hierarchical interaction attention are as follows:
Step 1: use the English dataset Gigaword as the training corpus and preprocess it with a preprocessing script, obtaining a training set of 3.8 million samples and a development set of 189,000 samples; each training sample consists of a pair of input text and reference summary sentence;
Step 2: the encoder encodes the input with a bidirectional LSTM, with the number of layers set to three;
Step 3: the decoder uses a unidirectional LSTM network, reads the sentence to be decoded and computes the context vector at each layer;
Step 4: for the multi-layer encoder-decoder model, in which the encoder and decoder contain multiple LSTM layers, compute within each LSTM layer the hidden state representation between the upper layer and the current layer, so that the context vector of the upper layer is fused into the current layer;
Step 5: concatenate the context vectors of all layers, which carry the feature information, with the output of the current layer to obtain the decoder hidden state of the current layer;
Step 6: since incorporating contextual information from different levels brings in redundancy and noise, use a variational information bottleneck (Variational Information Bottleneck, VIB) to compress and denoise the data.
As a preferred embodiment of the present invention, Step 1 proceeds as follows: the data are normalized, including lowercasing all words in the dataset, replacing all numbers with #, and replacing words that occur fewer than 5 times in the corpus with the UNK token. From the 189,000-sample development data, 8,000 samples are randomly selected as the development set and 2,000 samples as the test set; sentences whose source text is shorter than 5 words are removed from the test set, leaving 1,951 samples after filtering. To verify the generalization ability of the model, the invention additionally uses DUC2004 as a test set; the DUC2004 dataset contains only 500 texts, and each input text corresponds to 4 reference summary sentences.
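A minimal Python sketch of the normalization rules described above is given below (a sketch under stated assumptions: the function and token names are illustrative, digits are replaced character by character with #, and the actual preprocessing script may differ in detail).

```python
import re
from collections import Counter

def normalize_corpus(sentences, min_freq=5, unk_token="UNK"):
    """Lowercase every word, map each digit to '#', and replace rare words with UNK."""
    tokenized = [[re.sub(r"\d", "#", tok.lower()) for tok in sent.split()] for sent in sentences]
    freq = Counter(tok for sent in tokenized for tok in sent)
    return [[tok if freq[tok] >= min_freq else unk_token for tok in sent] for sent in tokenized]
```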
This preferred embodiment is an important component of the invention: it mainly describes the corpus construction process and provides the data support for the invention.
As a preferred embodiment of the present invention, Step 2 proceeds as follows:
The encoder of the invention uses a bidirectional long short-term memory network (Bi-Directional LSTM, BiLSTM). Compared with a unidirectional LSTM, a BiLSTM contains a forward and a backward LSTM: the forward LSTM reads the input sequence from left to right to obtain the forward encoding vectors, and the backward LSTM reads the sequence from right to left to obtain the backward encoding vectors, as shown in formulas (1) and (2):
h_i^fw = LSTM_fw(x_i, h_{i-1}^fw)    (1)
h_i^bw = LSTM_bw(x_i, h_{i+1}^bw)    (2)
where LSTM_fw and LSTM_bw denote the forward and backward LSTM networks respectively. Finally, the forward and backward encoding vectors are concatenated to obtain the vector representation of the input sequence, h_i = [h_i^fw; h_i^bw].
This preferred embodiment is an important component of the invention and mainly describes the encoding process. Modeling a sentence with a unidirectional LSTM has the problem that information cannot be encoded from back to front. In more fine-grained classification, for example a five-class task distinguishing strong praise, weak praise, neutral, weak criticism and strong criticism, the interactions among sentiment words, degree words and negation words must be captured. A BiLSTM captures such bidirectional semantic dependencies better.
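As an illustration of the three-layer bidirectional encoder of Step 2, here is a minimal PyTorch sketch (an illustration rather than the patent's code; the hyperparameters follow the experimental settings reported later, namely hidden size 512, vocabulary 50k and dropout 0.3, and the class and parameter names are invented for this sketch).

```python
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Three-layer bidirectional LSTM encoder (sketch)."""
    def __init__(self, vocab_size=50000, emb_dim=512, hidden_dim=512, num_layers=3, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # each direction uses hidden_dim // 2 so that the concatenation
        # h_i = [h_i^fw; h_i^bw] has size hidden_dim
        self.lstm = nn.LSTM(emb_dim, hidden_dim // 2, num_layers=num_layers,
                            batch_first=True, bidirectional=True, dropout=dropout)

    def forward(self, src_ids):
        # src_ids: (batch, src_len) token indices
        emb = self.embedding(src_ids)            # (batch, src_len, emb_dim)
        outputs, final_state = self.lstm(emb)    # outputs: (batch, src_len, hidden_dim)
        # note: nn.LSTM only exposes the top layer's output sequence; to let the decoder
        # attend to every level, one would instead stack three single-layer BiLSTMs and
        # keep each layer's output sequence.
        return outputs, final_state
```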
As a preferred embodiment of the present invention, Step 3 proceeds as follows:
Step 3.1: the decoder uses a unidirectional LSTM network, where <s> denotes the start of the sequence.
At step t_0, the decoder reads <s> and the final state vector of the encoder to predict the output probability of y_1; it then generates the summary sequence word by word according to the input context representation, where the length of the generated summary is at most the length of the input sequence.
Step 3.2: at decoding step t, the decoder reads the word embedding vector w_{t-1} of the target word at step t-1, the hidden state vector s_{t-1} and the context vector c_t, and generates the hidden state vector s_t at step t, as shown in formula (3):
s_t = LSTM(w_{t-1}, s_{t-1}, c_t)    (3)
Step 3.3: as shown in Fig. 2, the decoder introduces an attention mechanism: the context vector c_t at step t is computed from the decoder hidden state s_{t-1} at step t-1 and the encoder vectors h_i, with the detailed process given in formulas (4), (5) and (6):
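The bodies of formulas (4)-(6) are not reproduced in this text. A standard additive (Bahdanau-style) attention consistent with the description above would read as follows; this is stated as an assumption, and the exact parameterization in the patent may differ:
e_{t,i} = v^T tanh(W_s s_{t-1} + W_h h_i)          (attention energy)
α_{t,i} = exp(e_{t,i}) / Σ_j exp(e_{t,j})          (attention weight)
c_t = Σ_i α_{t,i} h_i                              (context vector)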
Step 3.4: the output vector p_t at step t is then computed from the context vector c_t and the hidden state vector s_t at step t, and the output probability P_{vocab,t} of p_t over the predefined target vocabulary is computed from it, as shown in formulas (7) and (8):
p_t = tanh(W_m([s_t; c_t]) + b_m)    (7)
P_{vocab,t} = softmax(W_p p_t + b_p)    (8)
This preferred embodiment is an important component of the invention and mainly describes the decoding process. The LSTM avoids the long-range dependency problem: for an LSTM, remembering information over long spans is the default behavior rather than something that is hard to learn.
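A minimal PyTorch sketch of one decoding step with additive attention, corresponding to formulas (3), (7) and (8); the attention parameterization follows the assumed formulas (4)-(6) above, and all names and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionDecoderStep(nn.Module):
    """One decoding step with additive attention (sketch)."""
    def __init__(self, vocab_size=50000, emb_dim=512, hidden_dim=512):
        super().__init__()
        self.cell = nn.LSTMCell(emb_dim + hidden_dim, hidden_dim)  # s_t = LSTM(w_{t-1}, s_{t-1}, c_t)
        self.W_s = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.W_h = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)
        self.W_m = nn.Linear(2 * hidden_dim, hidden_dim)           # p_t = tanh(W_m [s_t; c_t] + b_m)
        self.W_p = nn.Linear(hidden_dim, vocab_size)               # P_vocab,t = softmax(W_p p_t + b_p)

    def forward(self, w_prev, state, enc_outputs):
        # w_prev: (batch, emb_dim); state: (s_{t-1}, cell_{t-1}); enc_outputs: (batch, src_len, hidden_dim)
        s_prev, cell_prev = state
        # attention over the encoder vectors using the previous decoder hidden state
        scores = self.v(torch.tanh(self.W_s(s_prev).unsqueeze(1) + self.W_h(enc_outputs))).squeeze(-1)
        alpha = F.softmax(scores, dim=-1)                             # (batch, src_len)
        c_t = torch.bmm(alpha.unsqueeze(1), enc_outputs).squeeze(1)   # context vector c_t
        s_t, cell_t = self.cell(torch.cat([w_prev, c_t], dim=-1), (s_prev, cell_prev))
        p_t = torch.tanh(self.W_m(torch.cat([s_t, c_t], dim=-1)))
        return F.softmax(self.W_p(p_t), dim=-1), (s_t, cell_t), c_t
```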
As a preferred embodiment of the present invention, Step 4 proceeds as follows:
Inner-layer merge mechanism:
The inner-layer merge mechanism (Inner-Layer Merge) incorporates the context vector of the upper layer into the computation of the current layer, realizing the fusion of multi-level encoder information.
Step 4.1: merge the context vector and hidden state vector of layer k-1 to form the input of layer k, as shown in formulas (9), (10) and (11).
Here c_t^{k-1} is the context vector obtained at layer k-1 and s_t^{k-1} is the hidden state vector of layer k-1; from these the input vector of layer k is computed.
Step 4.2: this input is then fed into the layer-k LSTM to obtain the output of the layer-k network.
Step 4.3: compute the output vector p_t of the last layer of the multi-layer decoder network, and finally compute the probability distribution P_vocab of the target output over the vocabulary.
This preferred embodiment is an important component of the invention. The multi-layer feature extraction and fusion method based on the hierarchical interaction attention mechanism obtains features from different encoder levels, addressing the problem that traditional attention-based encoder-decoder models consider only the high-level semantic information of the encoder as the context representation while ignoring fine-grained details such as the word-level structure captured by the lower layers, so that higher-quality summaries are generated.
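Formulas (9)-(11) are not reproduced in this text, so the following PyTorch sketch of the inner-layer merge uses concatenation followed by a linear projection as one plausible realization; this choice, and all names, are assumptions for illustration rather than the patent's exact formulation.

```python
import torch
import torch.nn as nn

class InnerLayerMerge(nn.Module):
    """Inner-layer merge (sketch): fuse the lower layer's context vector c_t^{k-1} and
    hidden state s_t^{k-1} into the input of layer k, then run the layer-k LSTM cell."""
    def __init__(self, hidden_dim=512):
        super().__init__()
        self.proj = nn.Linear(2 * hidden_dim, hidden_dim)  # merge [c_t^{k-1}; s_t^{k-1}] -> layer-k input
        self.cell = nn.LSTMCell(hidden_dim, hidden_dim)

    def forward(self, ctx_prev_layer, state_prev_layer, layer_k_state):
        # ctx_prev_layer: c_t^{k-1}; state_prev_layer: s_t^{k-1};
        # layer_k_state: (s_{t-1}^k, cell_{t-1}^k) of the current layer
        merged = torch.tanh(self.proj(torch.cat([ctx_prev_layer, state_prev_layer], dim=-1)))
        s_k, cell_k = self.cell(merged, layer_k_state)      # output of the layer-k LSTM
        return s_k, cell_k
```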
As a preferred embodiment of the present invention, Step 5 proceeds as follows:
Cross-layer merge mechanism:
The cross-layer merge mechanism (Cross-Layer Merge) fuses, at the last layer, the multi-layer context vectors obtained from the lower layers, as shown in Fig. 4.
Step 5.1: at the top layer of the network, the context vectors and hidden states obtained from each layer are concatenated separately, producing the cross-layer fused context vector c_t and decoder hidden state s_t, which contain the feature information of different encoder levels.
Step 5.2: finally, the output vector p_t is computed from s_t and c_t, as shown in formulas (12), (13) and (14):
p_t = tanh(W_m([s_t; c_t]) + b_m)    (14)
Finally, the output probability P_{vocab,t} of p_t over the vocabulary is computed.
This preferred embodiment is an important component of the invention. The multi-layer feature extraction and fusion method based on the hierarchical interaction attention mechanism obtains features from different encoder levels, addressing the problem that traditional attention-based encoder-decoder models consider only the high-level semantic information of the encoder as the context representation while ignoring fine-grained details such as the word-level structure captured by the lower layers, so that higher-quality summaries are generated.
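Since formulas (12) and (13) are not reproduced, the PyTorch sketch below assumes that the per-layer context vectors and hidden states are concatenated and linearly projected to form c_t and s_t, after which p_t follows formula (14); the projections and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossLayerMerge(nn.Module):
    """Cross-layer merge (sketch): fuse per-layer context vectors and hidden states
    at the top layer, then compute p_t = tanh(W_m [s_t; c_t] + b_m) as in formula (14)."""
    def __init__(self, hidden_dim=512, num_layers=3, vocab_size=50000):
        super().__init__()
        self.ctx_proj = nn.Linear(num_layers * hidden_dim, hidden_dim)    # [c_t^1; ...; c_t^L] -> c_t
        self.state_proj = nn.Linear(num_layers * hidden_dim, hidden_dim)  # [s_t^1; ...; s_t^L] -> s_t
        self.W_m = nn.Linear(2 * hidden_dim, hidden_dim)
        self.W_p = nn.Linear(hidden_dim, vocab_size)

    def forward(self, layer_contexts, layer_states):
        # layer_contexts, layer_states: lists of (batch, hidden_dim) tensors, one per layer
        c_t = torch.tanh(self.ctx_proj(torch.cat(layer_contexts, dim=-1)))
        s_t = torch.tanh(self.state_proj(torch.cat(layer_states, dim=-1)))
        p_t = torch.tanh(self.W_m(torch.cat([s_t, c_t], dim=-1)))         # formula (14)
        return torch.softmax(self.W_p(p_t), dim=-1)                       # P_{vocab,t}
```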
As a preferred embodiment of the present invention, Step 6 proceeds as follows:
For a classification task from X to Y, the variational information bottleneck introduces Z as an intermediate representation of the source input X and constructs the information bottleneck objective R_IB(θ) for X → Z → Y, as shown in formulas (15) and (16):
R_IB(θ) = I(Z, Y; θ) - β I(Z, X; θ)    (15)
where I(Z, Y; θ) denotes the mutual information between Y and Z. The goal is to use mutual information as the measure of information content and learn the distribution of the encoding Z so that the amount of information flowing from X to Y is as small as possible, forcing the model to let only the most important information pass through the information bottleneck and to ignore task-irrelevant information, thereby achieving de-redundancy and denoising.
For the summarization task, given an input sequence x, the encoder-decoder model generates the summary sequence y by computing the probability P_θ(y | x), where θ denotes the model parameters such as the weight matrices W and biases b, as shown in formula (17):
P_θ(y | x) = ∏_t P_θ(y_t | y_{<t}, x)    (17)
Here y_{<t} = (y_1, y_2, …, y_{t-1}) denotes all the words decoded before step t. As shown in formula (18), the model learns the parameters θ by maximizing the log-likelihood of the generated summary, i.e. by minimizing
Loss = -log P_θ(y | x)    (18)
Therefore, in the traditional encoder-decoder model, we introduce the information bottleneck z = f(x, y_{<t}) as an intermediate representation of the encoding and construct the loss from the intermediate representation z to the output sequence y as a classification cross-entropy loss, as shown in formula (19).
A constraint is added at the same time, requiring the Kullback-Leibler (KL) divergence between the distribution P_θ(z | x) and the standard normal distribution Q(z) to be as small as possible; with the VIB added, the training loss function combines this cross-entropy term with the KL term weighted by λ, as shown in formula (20):
where λ is a hyperparameter, which we set to 1e-3.
This preferred embodiment introduces the variational information bottleneck to compress and denoise the data, helping to reduce the redundancy and noise brought in by incorporating contextual information from different levels.
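Because formulas (16), (19) and (20) are not reproduced in this text, the PyTorch sketch below shows one common realization of a variational information bottleneck: a diagonal-Gaussian posterior with the reparameterization trick, whose KL divergence to a standard normal prior is added to the cross-entropy loss with weight λ = 1e-3. This is an illustrative assumption, not the patent's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBLayer(nn.Module):
    """Variational information bottleneck (sketch): map the fused representation to a
    diagonal Gaussian, sample z by reparameterization, and return the KL term."""
    def __init__(self, hidden_dim=512, z_dim=512):
        super().__init__()
        self.mu = nn.Linear(hidden_dim, z_dim)
        self.logvar = nn.Linear(hidden_dim, z_dim)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)               # reparameterization trick
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)  # KL(q(z|x) || N(0, I))
        return z, kl.mean()

def vib_training_loss(logits, targets, kl, lam=1e-3):
    """Cross-entropy from the bottleneck representation to the target words,
    plus the KL constraint weighted by the hyperparameter lambda."""
    ce = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    return ce + lam * kl
```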
Step 7: to verify the effectiveness of the invention, the experimental datasets, evaluation metrics, detailed parameter settings and the baseline models used for comparison are introduced below, and the experimental results are analyzed and discussed.
The experiments use ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores as the evaluation metric. ROUGE, an automatic summarization evaluation metric proposed by Lin et al., evaluates the quality of a generated summary using the n-gram co-occurrence statistics between the generated summary and the reference summaries.
Here n-gram denotes an n-gram phrase, {Gold} denotes the reference summaries, Count_match(n-gram) denotes the number of n-grams co-occurring in the generated summary and the reference summary, and Count(n-gram) denotes the number of n-grams appearing in the reference summary. The invention computes ROUGE with the pyrouge script and reports the ROUGE-1 (unigram), ROUGE-2 (bigram) and ROUGE-L (longest common subsequence) scores as the evaluation metrics of model performance.
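As an illustration of the recall-oriented n-gram matching that ROUGE-N performs, here is a simplified Python sketch (it ignores stemming, multiple-reference averaging and the other details handled by pyrouge, and the function name is invented for this example):

```python
from collections import Counter

def rouge_n_recall(candidate_tokens, reference_tokens, n=1):
    """Simplified ROUGE-N recall: matched reference n-grams / total reference n-grams."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate_tokens), ngrams(reference_tokens)
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / sum(ref.values())

# example: rouge_n_recall("police arrested the suspect".split(),
#                         "police arrested a suspect downtown".split(), n=1)  # -> 0.6
```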
The encoder and decoder both use 3 LSTM layers: the encoder is a bidirectional LSTM and the decoder is a unidirectional LSTM. The hidden state size of both encoder and decoder is set to 512. To reduce the number of model parameters, the encoder and decoder share the word embedding layer. The word embedding dimension is set to 512; the invention does not use pre-trained word vectors such as Word2vec, GloVe or BERT, but initializes the embedding layer randomly. The vocabulary size of the encoder-decoder is set to 50k, and out-of-vocabulary words are replaced with UNK. Other settings are as follows: dropout is 0.3, the optimizer is Adam, and the batch size is set to 64. To improve the quality of the generated summaries, beam search is used at inference time with the beam size set to 12.
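To make the beam search strategy concrete, here is a minimal, model-agnostic Python sketch (the step_fn interface, the maximum length of 18 taken from the DUC2004 setting below, and the length handling are illustrative assumptions):

```python
def beam_search(step_fn, start_state, bos_id, eos_id, beam_size=12, max_len=18):
    """Generic beam search sketch. step_fn(last_token, state) must return
    (log_probs, new_state), where log_probs is a sequence of log-probabilities
    over the vocabulary."""
    beams = [([bos_id], 0.0, start_state)]          # (tokens, cumulative log-prob, state)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score, state in beams:
            if tokens[-1] == eos_id:
                finished.append((tokens, score))
                continue
            log_probs, new_state = step_fn(tokens[-1], state)
            # keep only the beam_size best continuations of this hypothesis
            best = sorted(range(len(log_probs)), key=lambda i: log_probs[i], reverse=True)[:beam_size]
            for i in best:
                candidates.append((tokens + [i], score + log_probs[i], new_state))
        if not candidates:
            break
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    finished.extend((tokens, score) for tokens, score, _ in beams)
    return max(finished, key=lambda c: c[1])[0]      # best-scoring token sequence
```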
The invention chooses the following six models as baselines; the training data and test data of all baselines are identical to those used by the invention.
ABS: generates the summary using a convolutional neural network (CNN) encoder and an NNLM decoder.
ABS+: based on the ABS model, fine-tuned on the DUC2003 dataset, which further improves performance on the DUC2004 dataset.
RFs2s: both encoder and decoder use GRUs, and linguistic features such as part-of-speech tags and named-entity labels are fused into the encoder input.
CAs2s: both encoder and decoder are implemented with convolutional neural networks, with optimization strategies such as gated linear units (Gated Linear Unit, GLU) and multi-step attention added to the convolution.
SEASS: on top of the traditional attention-based encoder-decoder model, a selective gate network is added on the encoder side to control the flow of information from the encoder to the decoder, purifying the encoded information.
CGU: similar to SEASS, it optimizes the encoder with self-attention and an Inception convolutional network to construct a global information representation of the source input.
Our_s2s: the encoder-decoder model implemented by the invention.
Table 1 compares the ROUGE-1, ROUGE-2 and ROUGE-L F1 scores of the proposed models and the baselines on the Gigaword test set. Our_s2s is the attention-based encoder-decoder model implemented by the invention; Inner_s2s and Cross_s2s denote adding the inner-layer merge mechanism and the cross-layer merge mechanism on top of Our_s2s respectively; Beam and Greedy indicate whether beam search or greedy search is used at test time.
Table 1: comparison of experimental results on the Gigaword test set
Model RG-1(F1) RG-2(F1) RG-L(F1)
ABS(Beam) 29.55 11.2 26.42
ABS+(Beam) 29.76 11.88 26.96
RFs2s(Beam) 32.67 15.59 30.64
CAs2s(Beam) 33.78 15.97 31.15
SEASS(Greedy) 35.48 16.50 32.93
SEASS(Beam) 36.15 17.54 33.63
CGU(Beam) 36.31 18.00 33.82
Our_s2s(Beam) 33.62 16.35 31.34
Inner_s2s(Greedy) 36.05 17.18 33.47
Inner_s2s(Beam) 36.52 17.75 33.81
Cross_s2s(Greedy) 36.23 17.19 33.71
Cross_s2s(Beam) 36.97 18.36 34.35
As can be seen from Table 1, the proposed Inner_s2s and Cross_s2s both improve over the baselines to a certain degree; in particular, the Cross_s2s model with beam search achieves the best performance on all three metrics RG-1, RG-2 and RG-L. It can also be seen that Cross_s2s outperforms Inner_s2s under both the greedy and beam search strategies.
To further verify the generalization ability of the model, the invention is tested on the DUC2004 dataset, with results shown in Table 2. The DUC2004 dataset requires summaries of fixed length (75 bytes); consistent with previous work [1,7,18], the invention fixes the length of the generated summaries at 18 words to meet the minimum-length requirement. DUC2004 conventionally uses recall rather than the F1 score as the measure of model performance. Each source sentence in DUC2004 corresponds to four human-written reference summaries, so the invention evaluates against each of the four references and reports the average of the four results as the final score.
Table 2: comparison of experimental results on the DUC2004 test set
Model RG-1(R) RG-2(R) RG-L(R)
ABS 26.55 7.06 22.05
ABS+ 28.18 8.49 23.81
RFs2s 28.35 9.46 24.59
CAs2s 28.97 8.26 24.06
SEASS 29.21 9.56 25.51
Inner_s2s 30.29 13.24 27.94
Cross_s2s 30.14 13.05 27.85
From Table 2 it can be seen that the proposed Inner_s2s and Cross_s2s perform similarly, and both exceed the baselines in recall on all three metrics RG-1, RG-2 and RG-L. In particular, compared with ABS+, even though that model was tuned on the DUC2003 dataset, the proposed Inner_s2s still improves RG-1, RG-2 and RG-L by 2.11, 4.75 and 4.13 respectively. Compared with the previously best model SEASS, the RG-2 score improves by more than 3 percentage points.
For abstractive text summarization, the invention proposes a hierarchical interaction attention mechanism under the attention-based encoder-decoder framework. Multi-layer contextual information of the encoder is extracted through the attention mechanism to guide the decoding process, while the information is constrained by introducing a variational information bottleneck, thereby improving the quality of abstractive summarization.
The embodiments of the present invention have been explained in detail above with reference to the accompanying drawings, but the present invention is not limited to the above embodiments; various changes can be made within the scope of knowledge of a person skilled in the art without departing from the inventive concept.

Claims (7)

1. A text summarization method based on hierarchical interaction attention, characterized in that the specific steps of the text summarization method based on hierarchical interaction attention are as follows:
Step 1: use the English dataset Gigaword as the training corpus and preprocess it with a preprocessing script to obtain a training set and a development set, each training sample consisting of a pair of input text and reference summary sentence;
Step 2: the encoder encodes the input with a bidirectional LSTM, with the number of layers set to three;
Step 3: the decoder uses a unidirectional LSTM network, reads the sentence to be decoded and computes the context vector at each layer;
Step 4: for the multi-layer encoder-decoder model, in which the encoder and decoder contain multiple LSTM layers, compute within each LSTM layer the hidden state representation between the upper layer and the current layer, so that the context vector of the upper layer is fused into the current layer;
Step 5: concatenate the context vectors of all layers, which carry the feature information, with the output of the current layer to obtain the decoder hidden state of the current layer;
Step 6: since incorporating contextual information from different levels brings in redundancy and noise, use a variational information bottleneck to compress and denoise the data.
2. The text summarization method based on hierarchical interaction attention according to claim 1, characterized in that Step 1 proceeds as follows: the data are normalized, including lowercasing all words in the dataset, replacing all numbers with # and replacing words that occur fewer than 5 times in the corpus with the UNK token; a portion of the development set is filtered and used as the test set.
3. The text summarization method based on hierarchical interaction attention according to claim 1, characterized in that Step 2 proceeds as follows: the encoder uses a bidirectional long short-term memory network (Bi-Directional LSTM, BiLSTM) consisting of a forward and a backward LSTM; the forward LSTM reads the input sequence from left to right to obtain forward encoding vectors, the backward LSTM reads the sequence from right to left to obtain backward encoding vectors, and finally the forward and backward encoding vectors are concatenated to obtain the vector representation of the input sequence.
4. The text summarization method based on hierarchical interaction attention according to claim 1, characterized in that Step 3 proceeds as follows:
Step 3.1: the decoder uses a unidirectional LSTM network, is initialized with the final state vector of the encoder, and then generates the summary sequence word by word according to the input context representation, where the length of the generated summary is at most the length of the input sequence;
Step 3.2: at each decoding step, the decoder reads the word embedding vector of the previous target word, the hidden state vector of the previous step and the context vector of the current step to generate the hidden state vector of the current step;
Step 3.3: introduce an attention mechanism, computing the context vector of the current step from the decoder hidden state of the previous step and the encoder vectors;
Step 3.4: compute the output vector of the current step from the current context vector and hidden state vector, and then compute the output probability of this output vector over the predefined target vocabulary.
5. The text summarization method based on hierarchical interaction attention according to claim 1, characterized in that Step 4 proceeds as follows:
Step 4.1: merge the context vector and hidden state vector of the upper layer to form the input of the current layer;
Step 4.2: feed the input of the current layer into its LSTM to obtain the output of the current layer;
Step 4.3: compute the output vector of the last layer of the multi-layer decoder network, and from it compute the probability distribution of the target output over the vocabulary.
6. The text summarization method based on hierarchical interaction attention according to claim 1, characterized in that Step 5 proceeds as follows:
Step 5.1: at the current layer of the network, concatenate the context vectors obtained from each layer to produce the cross-layer fused context vector and decoder hidden state, which contain the feature information of different encoder levels;
Step 5.2: compute the output vector from the decoder hidden state and context vector, and then compute the output probability of the output vector over the vocabulary.
7. The text summarization method based on hierarchical interaction attention according to claim 1, characterized in that Step 6 proceeds as follows:
Step 6.1: given an input sequence, the encoder-decoder model generates the summary sequence by computing its probability;
Step 6.2: learn the model parameters by maximizing the log-likelihood of the generated summary;
Step 6.3: introduce an information bottleneck as an intermediate representation of the encoding, and construct the loss from the intermediate representation to the output sequence as a classification cross-entropy loss;
Step 6.4: add a constraint requiring the Kullback-Leibler (KL) divergence between the encoding distribution and the standard normal distribution to be as small as possible.
CN201910677195.6A 2019-07-25 2019-07-25 Text summarization method based on hierarchical interaction attention Active CN110472238B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910677195.6A CN110472238B (en) 2019-07-25 2019-07-25 Text summarization method based on hierarchical interaction attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910677195.6A CN110472238B (en) 2019-07-25 2019-07-25 Text summarization method based on hierarchical interaction attention

Publications (2)

Publication Number Publication Date
CN110472238A true CN110472238A (en) 2019-11-19
CN110472238B CN110472238B (en) 2022-11-18

Family

ID=68509298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910677195.6A Active CN110472238B (en) 2019-07-25 2019-07-25 Text summarization method based on hierarchical interaction attention

Country Status (1)

Country Link
CN (1) CN110472238B (en)



Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180261214A1 (en) * 2017-02-06 2018-09-13 Facebook, Inc. Sequence-to-sequence convolutional architecture
WO2019028269A2 (en) * 2017-08-02 2019-02-07 Strong Force Iot Portfolio 2016, Llc Methods and systems for detection in an industrial internet of things data collection environment with large data sets
WO2019025601A1 (en) * 2017-08-03 2019-02-07 Koninklijke Philips N.V. Hierarchical neural networks with granularized attention
CN108628823A (en) * 2018-03-14 2018-10-09 中山大学 In conjunction with the name entity recognition method of attention mechanism and multitask coordinated training
CN108647214A (en) * 2018-03-29 2018-10-12 中国科学院自动化研究所 Coding/decoding method based on deep-neural-network translation model
CN108897740A (en) * 2018-05-07 2018-11-27 内蒙古工业大学 A kind of illiteracy Chinese machine translation method based on confrontation neural network
CN108804677A (en) * 2018-06-12 2018-11-13 合肥工业大学 In conjunction with the deep learning question classification method and system of multi-layer attention mechanism
CN108959246A (en) * 2018-06-12 2018-12-07 北京慧闻科技发展有限公司 Answer selection method, device and electronic equipment based on improved attention mechanism
CN109145105A (en) * 2018-07-26 2019-01-04 福州大学 A kind of text snippet model generation algorithm of fuse information selection and semantic association
CN109408633A (en) * 2018-09-17 2019-03-01 中山大学 A kind of construction method of the Recognition with Recurrent Neural Network model of multilayer attention mechanism
CN109241536A (en) * 2018-09-21 2019-01-18 浙江大学 It is a kind of based on deep learning from the sentence sort method of attention mechanism
CN109858032A (en) * 2019-02-14 2019-06-07 程淑玉 Merge more granularity sentences interaction natural language inference model of Attention mechanism
CN109948166A (en) * 2019-03-25 2019-06-28 腾讯科技(深圳)有限公司 Text interpretation method, device, storage medium and computer equipment
CN109918510A (en) * 2019-03-26 2019-06-21 中国科学技术大学 Cross-cutting keyword extracting method
CN110032638A (en) * 2019-04-19 2019-07-19 中山大学 A kind of production abstract extraction method based on coder-decoder

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SINA AHMADI: "Attention-based Encoder-Decoder Networks for Spelling and Grammatical Error Correction", Retrieved from the Internet: ARXIV.ORG/PDF/1810.00660.PDF *
汪琪等 (Wang Qi et al.): "Neural machine translation based on attentional convolution" (基于注意力卷积的神经机器翻译), 《计算机科学》 (Computer Science) *
陈龙杰等 (Chen Longjie et al.): "Image caption generation algorithm based on multi-attention scale feature fusion" (基于多注意力尺度特征融合的图像描述生成算法), 《计算机应用》 (Journal of Computer Applications) *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111061862A (en) * 2019-12-16 2020-04-24 湖南大学 Method for generating abstract based on attention mechanism
CN111488440A (en) * 2020-03-30 2020-08-04 华南理工大学 Problem generation method based on multi-task combination
CN111488440B (en) * 2020-03-30 2024-02-13 华南理工大学 Problem generation method based on multi-task combination
CN111538829B (en) * 2020-04-27 2021-04-20 众能联合数字技术有限公司 Novel extraction method for webpage text key content of engineering machinery rental scene
CN111680151A (en) * 2020-05-06 2020-09-18 华东师范大学 Personalized commodity comment abstract generation method based on hierarchical transformer
CN111723196A (en) * 2020-05-21 2020-09-29 西北工业大学 Single document abstract generation model construction method and device based on multi-task learning
CN111782810A (en) * 2020-06-30 2020-10-16 湖南大学 Text abstract generation method based on theme enhancement
CN111966820A (en) * 2020-07-21 2020-11-20 西北工业大学 Method and system for constructing and extracting generative abstract model
CN111931518A (en) * 2020-10-15 2020-11-13 北京金山数字娱乐科技有限公司 Translation model training method and device
CN112528598B (en) * 2020-12-07 2022-04-05 上海交通大学 Automatic text abstract evaluation method based on pre-training language model and information theory
CN112528598A (en) * 2020-12-07 2021-03-19 上海交通大学 Automatic text abstract evaluation method based on pre-training language model and information theory
CN112632228A (en) * 2020-12-30 2021-04-09 深圳供电局有限公司 Text mining-based auxiliary bid evaluation method and system
CN112765345A (en) * 2021-01-22 2021-05-07 重庆邮电大学 Text abstract automatic generation method and system fusing pre-training model
CN112836040A (en) * 2021-01-31 2021-05-25 云知声智能科技股份有限公司 Multi-language abstract generation method and device, electronic equipment and computer readable medium
CN112836040B (en) * 2021-01-31 2022-09-23 云知声智能科技股份有限公司 Method and device for generating multilingual abstract, electronic equipment and computer readable medium
CN113139468A (en) * 2021-04-24 2021-07-20 西安交通大学 Video abstract generation method fusing local target features and global features
CN113434683A (en) * 2021-06-30 2021-09-24 平安科技(深圳)有限公司 Text classification method, device, medium and electronic equipment
CN113434683B (en) * 2021-06-30 2023-08-29 平安科技(深圳)有限公司 Text classification method, device, medium and electronic equipment
CN114154493A (en) * 2022-01-28 2022-03-08 北京芯盾时代科技有限公司 Short message category identification method and device
CN114154493B (en) * 2022-01-28 2022-06-28 北京芯盾时代科技有限公司 Short message category identification method and device
CN118069833A (en) * 2024-04-17 2024-05-24 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Hierarchical abstract generation method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
CN110472238B (en) 2022-11-18

Similar Documents

Publication Publication Date Title
CN110472238A Text summarization method based on hierarchical interaction attention
CN111241294B (en) Relationship extraction method of graph convolution network based on dependency analysis and keywords
CN109522411A (en) A kind of writing householder method neural network based
CN108519890A (en) A kind of robustness code abstraction generating method based on from attention mechanism
CN110929030A (en) Text abstract and emotion classification combined training method
CN110427616B (en) Text emotion analysis method based on deep learning
Konstas et al. Inducing document plans for concept-to-text generation
CN112183094B (en) Chinese grammar debugging method and system based on multiple text features
CN110717843A (en) Reusable law strip recommendation framework
CN111125333B (en) Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism
CN113435211B (en) Text implicit emotion analysis method combined with external knowledge
CN111078866A (en) Chinese text abstract generation method based on sequence-to-sequence model
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN110083824A (en) A kind of Laotian segmenting method based on Multi-Model Combination neural network
CN115310448A (en) Chinese named entity recognition method based on combining bert and word vector
CN115481219A (en) Electricity selling company evaluation emotion classification method based on grammar sequence embedded model
CN115545033A (en) Chinese field text named entity recognition method fusing vocabulary category representation
CN114117041B (en) Attribute-level emotion analysis method based on specific attribute word context modeling
CN112287687B (en) Case tendency extraction type summarization method based on case attribute perception
Wu et al. Context-aware style learning and content recovery networks for neural style transfer
CN117877460A (en) Speech synthesis method, device, speech synthesis model training method and device
Vaessen et al. The Effect of Batch Size on Contrastive Self-Supervised Speech Representation Learning
CN114238649A (en) Common sense concept enhanced language model pre-training method
CN113468366A (en) Music automatic labeling method
CN113935308A (en) Method and system for automatically generating text abstract facing field of geoscience

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant