CN117251562A - Text abstract generation method based on fact consistency enhancement - Google Patents
- Publication number
- CN117251562A (application number CN202311278088.9A)
- Authority
- CN
- China
- Prior art keywords
- fact
- word
- attention
- triplet
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to the technical field of natural language processing and discloses a text abstract generation method based on fact consistency enhancement, which solves the problem in the prior art that the differing importance of individual fact triplets, and their differing contributions to the final summary, is ignored, and improves the credibility of the generated text abstract. The invention adopts a Transformer architecture to construct a sequence-to-sequence text abstract generation model and introduces a fact attention module between the feed-forward network module and the cross-attention module of the decoder. The fact attention module calculates the influence of each fact triplet on each generated word based on the attention vector of each fact triplet and the word vectors of the generated words output by the cross-attention module, and updates the word vectors of the generated words according to this influence; the attention vector of each fact triplet is obtained by a self-attention calculation over the encoding vectors of the fact triplets.
Description
Technical Field
The invention relates to the technical field of natural language processing, in particular to a text abstract generation method based on fact consistency enhancement.
Background
With the wide application of the mobile internet and the rapid spread of intelligent terminal devices, people can create and publish content on the network and conveniently obtain public information, which makes daily life and work more convenient and greatly enriches people's lives. However, web text is often lengthy and difficult to understand; compared with short videos it demands long reading time and its emphasis is not obvious, so in today's fast-paced life it is hard to hold readers' interest, which reduces the reading experience to a certain extent.
Generative text summarization models aim to convert a text, or a set of texts, into a short summary containing the key information; this technology addresses the problem of information overload. A generative summarization model usually realizes a sequence-to-sequence task on top of an encoder-decoder framework, and uses the strong mapping capability of Transformer-based pre-trained models and network architectures to generate a short and fluent summary. However, because a generative model outputs words according to probabilities, it may still produce words or phrases that are inconsistent with the meaning of the original text, so the summary contains factual errors and its credibility suffers, which limits the adoption and deployment of generative summarization models and reduces their research and application value.
Therefore, how to mitigate the factual errors that generative summarization models are prone to, and how to effectively enhance the fact consistency of the summary result, is currently one of the hot issues in this field.
Fact consistency enhancement aims to reduce the probability of factual errors in the generated summary. The study of fact consistency in automatic text summarization began in 2018 with Ziqiang Cao et al., who found that about 30% of the facts in summaries produced by mainstream generative models were wrong or unverifiable. Subsequently, Ziqiang Cao et al. proposed the FTSum model, which uses an additional encoder to splice the fact triplets with the sentence encodings of the original document so that the model can notice the fact triplets; however, it ignores the problem that fact triplets of different importance contribute differently to the final summary.
Gunel B et al. proposed constructing a knowledge graph from Wikipedia data and introducing its entity-level knowledge into the encoder-decoder framework to guide the model toward generating correct facts, but they ignore the problem that the traditional encoder-decoder framework easily outputs, during decoding, words that do not agree with the original facts, thereby producing factual errors.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: a text abstract generation method based on fact consistency enhancement is provided, which solves the problem in the prior art that the differing importance of fact triplets, and hence their differing contributions to the final summary, is ignored, and improves the credibility of the generated text abstract.
The technical scheme adopted for solving the technical problems is as follows:
a text abstract generation method based on fact consistency enhancement adopts a Transformer architecture to construct a sequence-to-sequence text abstract generation model, wherein the decoder of the Transformer architecture comprises a self-attention module, a cross-attention module and a feed-forward network module connected in sequence, and a fact attention module is introduced between the feed-forward network module and the cross-attention module of the decoder;
a fact triplet is defined and denoted as F_i = &lt;s_i, r_i, o_i&gt;, where s is the subject word, r is the relation modifier, o is the object word, the subscript i is the sequence number of the fact triplet, and all words in any fact triplet come from the same sentence of the original text; the fact attention module takes as input the attention vectors of all fact triplets in the original text and the third word vector sequence output by the cross-attention module of the decoder, obtains the influence coefficients of all fact triplets on each generated word based on a cross-attention mechanism, and updates the third word vector sequence based on these influence coefficients to obtain a fourth word vector sequence, which serves as the input of the feed-forward network module of the decoder;
The attention vector of the fact triplet is calculated as follows:
A1, processing the original text with a natural language processing tool and extracting the fact triplets F_i to construct a fact triplet set F; then, taking the words contained in each fact triplet F_i as nodes and their connections as edges, mapping the fact triplet set F into a graph, and computing the node vector of every node in the graph with a graph neural network; then, concatenating the node vectors of the nodes contained in each fact triplet to obtain the encoding feature of each fact triplet, h_i = [h_i^s ; h_i^r ; h_i^o], where h_i^s, h_i^r and h_i^o respectively denote the node vectors of the nodes corresponding to s_i, r_i and o_i of fact triplet F_i;

A2, inputting the encoding features of the fact triplets into a recurrent neural network to obtain, for each fact triplet, a vector representation z_i that fuses the semantic information of the preceding and following fact triplets;

A3, using the vector representations z_i of the fact triplets and a self-attention mechanism, obtaining the attention vector of each fact triplet.
Further, in step A2 the recurrent neural network is a Bi-LSTM network; based on the encoding features of the fact triplets, the Bi-LSTM network produces, for each fact triplet, a forward hidden vector h_i^→ and a backward hidden vector h_i^←; the forward hidden vector h_i^→ and the backward hidden vector h_i^← are then concatenated to obtain the vector representation z_i = [h_i^→ ; h_i^←].
Further, in step A1, a node vector of each node in the graph is obtained by calculation using the graph neural network, including:
firstly, initializing node vectors of all nodes in a graph by using a pre-training model;
then, the node vector of each node in the graph is updated using the GCN network according to the following formula:
where ReLU denotes the activation function, H^l and H^{l+1} denote the outputs of the l-th and (l+1)-th layers of the GCN network respectively, A denotes the adjacency matrix of the graph, D denotes its degree matrix, and W^l denotes the weight matrix of the l-th layer of the GCN network.
Further, in step A1, the method further includes:
first, using the decision-tree model of the natural language processing tool and the relationship types it contains, classifying the relationships between the fact triplets F_i; then, based on the classification result, obtaining the relation triplets between the fact triplets F_i and constructing a relation triplet set R;

then, concatenating the encoding features of the fact triplets contained in each relation triplet with the relation vector to obtain the encoding feature of the relation triplet, R_j = [h_1j, cr_j, h_2j], where h_1j and h_2j denote the encoding features of the two fact triplets contained in the j-th relation triplet, and cr_j denotes the relation vector obtained via the pre-trained model from the relation contained in the j-th relation triplet;

then, performing a cross-attention calculation between the encoding features of the relation triplets and the encoding features of the fact triplets, updating the encoding feature of each fact triplet based on the calculation result, and using the updated encoding feature h'_i of each fact triplet as the input of step A2.
Further, the cross-attention calculation between the encoding features of the relation triplets and the encoding features of the fact triplets, and the update of the encoding features of the fact triplets based on the calculation result, specifically comprise:

α_ij = h_i * R_j

where α_ij denotes the relevance of the i-th fact triplet and the j-th relation triplet, β_ij denotes the relevance weight of the i-th fact triplet and the j-th relation triplet obtained by normalizing α_ij over all relation triplets, and R denotes the relation triplet set.
Further, in step A3, the attention vector of each fact triplet is obtained from its vector representation z_i based on a self-attention mechanism; the self-attention mechanism is multi-head self-attention, and its calculation comprises the following steps:

A31, based on the transformation matrices W_k1, W_k2 and W_k3 of each attention head k, converting the vector representation z_i of the fact triplet into a Query value Q_k, a Key value K_k and a Value V_k;

A32, performing the attention calculation of each attention head based on Q_k, K_k and V_k:

head_k = softmax(Q_k K_k^T / √d_k) V_k

where d_k is the dimension of K_k;

A33, aggregating the attention results of all attention heads to obtain the attention vector Z_i:

Z_i = [head_1 ; head_2 ; … ; head_K]

where K is the number of attention heads;

A34, using a linear transformation, changing the dimension of the attention vector Z_i back to that of the vector representation z_i to obtain the attention vector r_i of the fact triplet:

r_i = Z_i * W_0

where W_0 is the linear transformation matrix.
Further, at each time step, the decoding process of the decoder is as follows:
B1, based on the words already generated before the current time step t, obtaining the word embeddings of the generated words with a pre-trained model, and arranging the word embeddings of the generated words in generation order to form a first word vector sequence of the generated words;
B2, inputting the first word vector sequence into the self-attention module of the decoder; first, updating the first word vector sequence based on a self-attention mechanism; then, applying residual connection and normalization to the input first word vector sequence and the updated first word vector sequence to obtain a second word vector sequence of the generated words;
B3, inputting the second word vector sequence into the cross-attention module of the decoder; first, constructing Key and Value values from the context vectors generated by the encoder of the Transformer architecture from the original text, outputting the attention distribution a_t of each word in the original text, constructing a Query value from the second word vector sequence, and updating the second word vector sequence in combination with the attention distribution a_t; then, applying residual connection and normalization to the input second word vector sequence and the updated second word vector sequence to obtain a third word vector sequence of the generated words;
B4, inputting the third word vector sequence into the fact attention module of the decoder; first, based on the attention vectors of the fact triplets and the third word vector sequence, obtaining the influence coefficients of all fact triplets on each generated word with a cross-attention mechanism; then, applying residual connection word by word to the input third word vector sequence and the influence coefficients of all fact triplets on each generated word, followed by normalization, to obtain a fourth word vector sequence of the generated words;
B5, inputting the fourth word vector sequence into the feed-forward network module of the decoder; first, the feed-forward network computes p_t = max(0, ln_t W_1 + b_1) W_2 + b_2; then, residual connection and normalization are applied to the output p_t of the feed-forward network layer and the fourth word vector sequence ln_t to obtain the output of the decoder, where W_1, W_2, b_1 and b_2 are all learnable parameters of the feed-forward network.
Further, based on the attention vectors of the fact triplets and the third word vector sequence, the influence coefficients of all fact triplets on each generated word are obtained with a cross-attention mechanism, where α_im denotes the relevance of the m-th generated word, whose word vector is taken from the third word vector sequence, and the i-th fact triplet; β_im denotes the relevance weight of the m-th generated word and the i-th fact triplet; u_m denotes the influence coefficient of all fact triplets on the m-th generated word; and F denotes the fact triplet set.
Further, the text abstract generation model also comprises a pointer network;
at each time step, the processing procedure of the pointer network includes:
First, a linear layer whose learnable parameters W_vocab and b_vocab correspond to the word table (the word table of the pre-trained model) maps the output of the decoder at the current time step t to the feature space of the word table, giving l_t;
Then, based on l_t, the vocabulary distribution P_vocab of the current time step t and the pointer probability p_gen are calculated, where w_gen and b_gen denote learnable parameters, p_gen denotes the probability that the generated word of the current time step is taken from the word table, and P_vocab denotes the output probability distribution over the words of the word table as the generated word of the current time step;
Thereafter, based on P_vocab and p_gen, the final probability distribution P_t(w) of the current time step t is calculated, where P_t(w) denotes the probability that the word w of the extended vocabulary is the generated word of the current time step t, the extended vocabulary comprising the word table and all words contained in the original text, N denotes the number of words in the original text, n denotes the sequence number of a word in the original text, a_t^n is the attention of the n-th word in the attention distribution a_t over the word vectors of the original text produced by the cross-attention module, and the copy term sums the attention of all occurrences of the word w in the original text.
Further, the text abstract generation model further comprises a beam search algorithm, and the processing procedure of the beam search algorithm comprises the following steps:
first, from the final probability distribution P_t(w) output by the pointer network at the current time step t, selecting the P words with the highest probability as candidate words;
then, making a judgment: if the candidate words include words contained in the original text, selecting the word that is contained in the original text and has the highest output probability as the generated word of the current time step; otherwise, combining the P candidate words with the words generated before the current time step to form P candidate summaries, calculating the combined probability of each candidate summary according to the following formula, and selecting the candidate word corresponding to the candidate summary with the largest combined probability as the generated word of the current time step:

P_Y = argmax(log(∏_t p(y_t | y_1, y_2, …, y_{t-1})))

where Y denotes a candidate summary and p denotes the output probability of the corresponding word.
The beneficial effects of the invention are as follows:
the invention adopts a Transformer architecture to construct a sequence-to-sequence text abstract generation model and improves the decoder of the Transformer architecture by introducing a fact attention module between the feed-forward network module and the cross-attention module of the decoder. In the decoder, the influence of each fact triplet on the generated words is calculated based on the attention vector of each fact triplet and the word vectors of the generated words output by the cross-attention module, and the word vectors of the generated words are updated according to this influence; the attention vector of each fact triplet is obtained by a self-attention calculation over the encoding vectors of the fact triplets. In this way, the problem of ignoring the differing importance of fact triplets and their differing contributions to the final summary is avoided, and the fact consistency of text summary generation is improved.
Further, for long texts there are also complex relationships between the fact triplets, which likewise cause different fact triplets to contribute differently to the final result. The influence of these complex relationships between fact triplets is therefore also incorporated into the calculation of the fact triplet encoding vectors.
Furthermore, the invention introduces a pointer network: the output vector of the decoder at the current time step is mapped through the pointer network to the feature space of the word table to obtain a mapped vector representation; from this representation, the probability of generating the current word from the word table and the output probability distribution of each word in the word table as the current generated word are calculated, and finally the probability that each word in the extended vocabulary, formed by the word table and the words of the original text, is the generated word of the current time step is calculated. This avoids the traditional decoding practice of directly mapping the previous decoding information onto the word table to obtain the distribution of the current generated word, which may introduce into the generated summary words that belong to the word table but not to the original text. By introducing the pointer network, the invention avoids, as far as possible, generating words that belong to the word table but not to the original text, thereby reducing the probability of factual errors and further improving the fact consistency of text summary generation.
Furthermore, the invention introduces a beam search algorithm. Using the final probabilities, output by the pointer network at the current time step, that each word in the extended vocabulary is the generated word of the current time step, the words with the highest probability are selected as candidate words. If the candidate words include words contained in the original text, the word contained in the original text with the highest output probability is selected as the generated word of the current time step; otherwise, the candidate words are each combined with the words generated before the current time step to form several candidate summaries, the combined probability of each candidate summary is calculated, and the candidate word corresponding to the candidate summary with the largest combined probability is selected as the generated word of the current time step. When the candidate words contain no word from the original text, constructing candidate summaries from the candidate words and the already generated words and comparing their combined probabilities finds an approximately globally optimal solution at the phrase level, improves the semantic coherence between the word generated at the current time step and the previously generated words, and thus improves the fact consistency of text summary generation.
Drawings
FIG. 1 is a schematic block diagram of text excerpt generation based on fact consistency enhancement in an embodiment of the present invention.
Detailed Description
The invention aims to provide a text abstract generation method based on fact consistency enhancement, which solves the problem in the prior art that the differing importance of fact triplets, and their differing contributions to the final summary, is ignored, and improves the credibility of the generated text abstract.
The text abstract generation method adopts a Transformer architecture to construct a sequence-to-sequence text abstract generation model. The Transformer architecture is an Encoder-Decoder framework. In the encoding part, the input original text is first segmented to obtain the word sequence of the original text; then the word vectors of the words in the word sequence are obtained through a pre-trained model to form the word vector sequence of the original text; finally, the word vector sequence is input into the encoder of the Transformer architecture, which encodes it into the context vectors of the original text. In the decoding part, the summary text is generated step by step from the encoded context vectors, i.e., the generated words are obtained one by one over time steps, each time step decoding one generated word.
The key point of the invention is to improve the decoder of the Transformer architecture by introducing a fact attention module between the feed-forward network module and the cross-attention module of the existing decoder; that is, the decoder of the invention comprises a self-attention module, a cross-attention module, a fact attention module and a feed-forward network module connected in sequence. The fact attention module calculates the influence of each fact triplet on the generated words based on the attention vector of each fact triplet and the word vectors of the generated words output by the cross-attention module, and updates the word vectors of the generated words according to this influence.
Specifically, in the method of the present invention, at each time step, the decoding process of the decoder is:
First, all words generated before the current time step are passed through a pre-trained model to obtain the first word vector sequence of the generated words. Then the first word vector sequence is updated by the self-attention module and combined with the input first word vector sequence via residual connection and normalization to obtain the second word vector sequence of the generated words. Next, in the cross-attention module, the second word vector sequence is updated using the context vectors obtained by the encoding part, and residual connection and normalization are applied to obtain the third word vector sequence of the generated words. Then, based on the attention vectors of the fact triplets and the third word vector sequence, the fact attention module uses a cross-attention mechanism to obtain the influence coefficients of all fact triplets on each generated word; these are combined word by word with the third word vector sequence via residual connection and normalization to obtain the fourth word vector sequence of the generated words, which is taken as the input of the feed-forward network to obtain the decoding output vector of the current time step.
The decoding process yields the distribution probability of the word at the current time step, which could be mapped directly onto the word table to obtain the generated word; however, this approach may introduce into the generated summary words that belong to the word table but not to the original text and, because the distribution probability is not computed accurately enough, may to some extent produce factual errors. To effectively alleviate this problem, the invention introduces a pointer network: the decoding output vector of the current time step is mapped to the feature space of the word table to obtain a mapped vector representation; from this representation, the probability of generating the current word from the word table and the output probability distribution of each word in the word table as the current generated word are calculated, and then the final probability that each word in the extended vocabulary, formed by the word table and all words of the original text, is the generated word of the current time step is calculated.
Furthermore, the method of the invention introduces beam search: by calculating the combined probability of the current candidate word together with the already generated words, an approximately globally optimal solution at the phrase level is found, the semantic coherence between the word generated at the current time step and the previously generated words is improved, and the generation of words that belong to the word table but not to the original text is avoided, thereby further improving the fact consistency and credibility of text summary generation. The beam search algorithm uses the final probabilities, output by the pointer network at the current time step, that each word in the extended vocabulary is the generated word of the current time step, and selects the words with the highest probability as candidate words. If the candidate words include words contained in the original text, the word contained in the original text with the highest output probability is selected as the generated word of the current time step; otherwise, the candidate words are each combined with the words generated before the current time step to form several candidate summaries, the combined probability of each candidate summary is calculated, and the candidate word corresponding to the candidate summary with the largest combined probability is selected as the generated word of the current time step.
The method of the present invention is further described below with reference to examples.
Examples:
In the text summary generation method based on fact consistency enhancement of this embodiment, as shown in FIG. 1, the text summary generation model is based on the Encoder-Decoder framework of the Transformer architecture. Compared with the prior art, the structure of the encoder remains unchanged, and a fact attention module is introduced into the decoder; the improved decoder comprises a self-attention module, a cross-attention module, a fact attention module and a feed-forward network module.
Three aspects are described in detail below: generation of the fact triplet attention vectors, the encoding and decoding process, and model training.
1. Fact triplet attention vector generation
The attention vectors of the fact triplets are used to introduce the weights of the fact triplets into the subsequent decoding process and to calculate the different contributions of each fact triplet to the generated words. Their calculation comprises the following steps:
a1, extracting fact triplet
In this step, the original text is processed with a natural language processing tool, the fact triplets F_i are extracted from it, and the fact triplet set F is constructed. In this embodiment, the natural language processing tool is the StanfordNLP toolkit.
Specifically, the original text D is first input into the StanfordNLP toolkit and segmented into the word sequence of the original text {x_1, x_2, …, x_n, …, x_N}, where x_n denotes the n-th word in the text, n denotes the sequence number of the word in the original text, and N denotes the number of words in the original text.
Then, through part-of-speech tagging, syntactic constituent analysis, coreference resolution, relation extraction and similar processing, the StanfordNLP toolkit outputs the fact triplets F_i extracted from the original text D, and the fact triplet set F is constructed. Let F = {F_1, F_2, …, F_i, …, F_I}, with fact triplet F_i = &lt;s_i, r_i, o_i&gt;, where s is the subject word, r is the relation modifier, o is the object word, the subscript i is the sequence number of the fact triplet, I denotes the number of fact triplets, and s_i, r_i and o_i of any fact triplet F_i all come from the same sentence. For example, &lt;Kebi, likes, cola&gt; is a fact triplet.
A2, extracting relation triples
In this step, Rhetorical Structure Theory is applied, and the complex relationships between the fact triplets are extracted on the basis of the fact triplets. Further, using the decision-tree model of the natural language processing tool and the relationship types it contains, the relationships between the fact triplets F_i are classified; then, based on the classification result, the relation triplets between the fact triplets F_i are obtained and the relation triplet set R is constructed.
In this embodiment, the natural language processing tool is the StanfordCoreNLP toolkit; since the RST tree model of Rhetorical Structure Theory is already built into the StanfordCoreNLP toolkit, it can be called directly. Specifically, the original text D = {x_1, x_2, …, x_n, …, x_N} and the fact triplet set F = {F_1, F_2, …, F_i, …, F_I} are input into the StanfordCoreNLP toolkit. Using the StanfordCoreNLP toolkit and the relationship types contained in its RST tree model (currently 23 relationships), the relationships between the fact triplets F_i are classified; then, based on the classification result, the relation triplets R_j between the fact triplets F_i are obtained and the relation triplet set R is constructed.
Let R = {R_1, R_2, …, R_j, …, R_J}, with relation triplet R_j = &lt;F_1j, CR_j, F_2j&gt;, where F_1j and F_2j respectively denote the fact triplets contained in the j-th relation triplet, CR_j denotes the relation contained in the j-th relation triplet, the subscript j is the sequence number of the relation triplet, and J denotes the number of relation triplets. For example, the fact triplets &lt;Xiao Ming, belongs to, Sichuan Province&gt; and &lt;Xiao Ming, …, Sichuan&gt; are in a "background" relationship, and &lt;Jack, likes, cola&gt; and &lt;Jack, suffers from, dental caries&gt; are in a "causal" relationship.
A3, vector coding
In this step, the fact triplet set is mapped into a graph structure, the nodes of the graph are encoded with a graph convolutional network, the complex relation information is incorporated, and the encoding vectors of the fact triplets are obtained.
Specifically, the method comprises the following processing steps:
a31 mapping the fact triplet set into a graph
In order to encode with a graph convolutional neural network, the unstructured fact triplet set F must be converted into a structured graph. For each fact triplet F_i = &lt;s_i, r_i, o_i&gt;, a subject node s_i, a relation node r_i and an object node o_i are created; the subject node and the relation node are connected by an edge, and the object node and the relation node are connected by an edge; the edges carry no information other than indicating that the two nodes are connected. That is, the words contained in each fact triplet F_i become nodes, their connections become edges, and the fact triplet set F is mapped into a graph.
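As an illustration only (not part of the claimed embodiment; all names and sizes are assumptions), the following Python sketch builds such a graph from a list of fact triplets and returns the node labels together with the adjacency matrix consumed by the GCN below:

```python
from typing import List, Tuple
import torch

def build_fact_graph(triplets: List[Tuple[str, str, str]]):
    """One node per subject/relation/object occurrence; subject-relation and
    relation-object pairs are joined by unlabeled, undirected edges."""
    node_labels, edges = [], []
    for s, r, o in triplets:
        base = len(node_labels)             # ids of this triplet's nodes
        node_labels.extend([s, r, o])       # s -> base, r -> base+1, o -> base+2
        edges += [(base, base + 1), (base + 1, base + 2)]
    n = len(node_labels)
    adj = torch.zeros(n, n)
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0         # edges carry no further information
    return node_labels, adj

nodes, A = build_fact_graph([("Jack", "likes", "cola"), ("Xiao Ming", "belongs to", "Sichuan Province")])
```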
A32, node initialization
The step uses a pre-trained Bert model to initialize node vectors of nodes in the graph.
A33, node update
In this step, the node vectors of the nodes in the graph are updated with a GCN (graph convolutional network):
the GCN network can collect neighbor node information around any node and convert the characteristic information of the node by utilizing the information, and the process can be repeated for a plurality of times, and the specific calculation process is as follows:
where ReLU denotes the activation function, H^l and H^{l+1} denote the outputs of the l-th and (l+1)-th layers of the GCN network respectively, A denotes the adjacency matrix of the graph, D denotes its degree matrix, and W^l denotes the weight matrix of the l-th layer of the GCN network.
After the aggregation calculation of the GCN network, each node contains information from the surrounding nodes, even from nodes two or three hops away, effectively capturing the association information and, to a certain extent, the factual relationships. In this embodiment, the GCN network has two layers.
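Because the update formula itself is not reproduced above, the following PyTorch sketch assumes the standard GCN propagation rule H^{l+1} = ReLU(D^{-1/2}(A+I)D^{-1/2} H^l W^l) with self-loops added; the two stacked layers follow this embodiment, while the hidden width (that of the BERT-initialized node vectors) and everything else are illustrative assumptions:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        self.weight = nn.Linear(dim_in, dim_out, bias=False)   # W^l

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        a_hat = adj + torch.eye(adj.size(0))                   # add self-loops
        d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)                # D^{-1/2}
        norm_adj = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
        return torch.relu(norm_adj @ self.weight(h))           # aggregate neighbours, then transform

# Two stacked layers, as in this embodiment; h0 are the BERT-initialized node vectors.
gcn1, gcn2 = GCNLayer(768, 768), GCNLayer(768, 768)
```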
A34, constructing fact triplet coding features
In this step, the node vectors of the nodes contained in each fact triplet are concatenated to obtain the encoding feature of each fact triplet, h_i = [h_i^s ; h_i^r ; h_i^o], where h_i^s, h_i^r and h_i^o respectively denote the node vectors of the nodes corresponding to s_i, r_i and o_i of fact triplet F_i.
A35, updating fact triplet coding feature
In order to effectively incorporate the influence of the complex relationships between fact triplets on the information contained in different fact triplets, the encoding features of the relation triplets are constructed first, and then the encoding features of the fact triplets are updated with an attention mechanism based on the encoding features of the relation triplets.
Specifically, first, the encoding features of the fact triplets contained in each relation triplet are concatenated with the relation vector to obtain the encoding feature of the relation triplet, R_j = [h_1j, cr_j, h_2j], where h_1j and h_2j denote the encoding features of the two fact triplets contained in the j-th relation triplet, and cr_j denotes the relation vector obtained via the pre-trained model from the relation contained in the j-th relation triplet.
Then, a cross-attention calculation is performed between the encoding features of the relation triplets and the encoding features of the fact triplets, and the encoding feature of each fact triplet is updated based on the calculation result to obtain the updated encoding feature h'_i; the calculation is as follows:

α_ij = h_i * R_j

where α_ij denotes the relevance of the i-th fact triplet and the j-th relation triplet, β_ij denotes the relevance weight of the i-th fact triplet and the j-th relation triplet obtained by normalizing α_ij over all relation triplets, and R denotes the relation triplet set.
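A minimal sketch of this relation-aware refresh is given below. The relevance α_ij = h_i * R_j and the normalized weights β_ij follow the description; the residual weighted sum used as the final update, and the assumption that fact and relation encodings have been projected to a common dimension, are illustrative choices, since the update formula itself is not reproduced above:

```python
import torch

def update_fact_encodings(h: torch.Tensor, rel: torch.Tensor) -> torch.Tensor:
    """h: (I, d) fact-triplet encodings h_i; rel: (J, d) relation-triplet encodings R_j,
    both assumed to share the dimension d."""
    alpha = h @ rel.T                       # alpha_ij = h_i * R_j  (relevance)
    beta = torch.softmax(alpha, dim=-1)     # beta_ij  (relevance weights over the set R)
    return h + beta @ rel                   # assumed residual aggregation -> h'_i
```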
A4, fusion front and rear semantics
Although the fact triplet encoding features obtained in the previous step implicitly contain some of the relationship structure between the fact triplets, they contain only local structural information, and their ability to capture distant fact triplets over long spans is insufficient. To address this long-distance dependency problem, in this step the encoding features of the fact triplets are input into a recurrent neural network to obtain, for each fact triplet, a vector representation z_i that fuses the semantic information of the preceding and following fact triplets.
Specifically, in this embodiment, the encoding features h'_i of the fact triplets updated in step A35 are input into a Bi-LSTM network to obtain, for each fact triplet, a vector representation z_i that fuses the semantic information of the preceding and following fact triplets. In the Bi-LSTM network, a forward hidden vector h_i^→ is obtained from the forward LSTM and a backward hidden vector h_i^← from the backward LSTM; the forward hidden vector h_i^→ and the backward hidden vector h_i^← are then concatenated to obtain the vector representation z_i = [h_i^→ ; h_i^←] that fuses the semantic information of the preceding and following fact triplets.
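Under assumed dimensions, the Bi-LSTM fusion of this step can be sketched as follows; each output position concatenates the forward and backward hidden vectors, giving z_i:

```python
import torch
import torch.nn as nn

d = 256                                      # illustrative width of the updated encodings h'_i
bilstm = nn.LSTM(input_size=d, hidden_size=d, bidirectional=True, batch_first=True)

h_prime = torch.randn(1, 10, d)              # (batch, number of fact triplets, d) encodings h'_i
z, _ = bilstm(h_prime)                       # z[:, i] = [forward_i ; backward_i], i.e. z_i of size 2*d
```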
a5, self-attention calculation
In this step, the attention vector of each fact triplet is obtained from its vector representation z_i based on a self-attention mechanism. To improve the ability to capture different kinds of information and further improve the effect of the attention calculation, the self-attention mechanism is multi-head self-attention, specifically:

A51, based on the transformation matrices W_k1, W_k2 and W_k3 of each attention head k, the vector representation z_i of the fact triplet is converted into a Query value Q_k, a Key value K_k and a Value V_k;

A52, the attention of each attention head is calculated from Q_k, K_k and V_k:

head_k = softmax(Q_k K_k^T / √d_k) V_k

where d_k is the dimension of K_k;

A53, the attention results of all attention heads are aggregated to obtain the attention vector Z_i:

Z_i = [head_1 ; head_2 ; … ; head_K]

where K is the number of attention heads;

A54, using a linear transformation, the dimension of the attention vector Z_i is changed back to that of the vector representation z_i to obtain the attention vector r_i of the fact triplet:

r_i = Z_i * W_0

where W_0 is the linear transformation matrix.
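The multi-head self-attention of steps A51-A54 can be sketched with PyTorch's built-in module; nn.MultiheadAttention already bundles the per-head Q/K/V projections, the scaled dot-product softmax(Q K^T / √d_k) V and the head aggregation, so only the extra W_0 projection of step A54 is written out separately. All sizes are assumptions:

```python
import torch
import torch.nn as nn

d_z, n_heads = 512, 8
self_attn = nn.MultiheadAttention(embed_dim=d_z, num_heads=n_heads, batch_first=True)
w0 = nn.Linear(d_z, d_z, bias=False)        # W_0, maps Z_i back to the width of z_i

z = torch.randn(1, 10, d_z)                 # vector representations z_i of 10 fact triplets
attn_out, _ = self_attn(z, z, z)            # steps A51-A53: multi-head self-attention
r = w0(attn_out)                            # step A54: attention vectors r_i
```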
2. Encoding process
For the encoding process, the original text is first segmented to obtain its word sequence; then the word vectors of the words in the word sequence are extracted with the pre-trained BERT model to obtain the word vector sequence of the original text; then the encoder encodes the word vector sequence of the original text into the context vectors of the original text. Since this encoding process is identical to the encoding process of the prior-art Transformer architecture, it is not described further here.
3. Decoding process
In the decoding process, the words are generated one by one over time steps, i.e., each time step decodes one generated word.
The decoding at each time step comprises the following steps:
b1, acquiring a first word vector sequence of a generated word before the current time step:
In this step, based on the words already generated before the current time step t, the word embeddings of the generated words are obtained with the pre-trained BERT model and arranged in generation order to form the first word vector sequence of the generated words.
B2, performing self-attention calculation on the first word vector sequence to obtain a second word vector sequence of the generated word:
In this step, the first word vector sequence is input into the self-attention module of the decoder; first, the first word vector sequence is updated based on a self-attention mechanism; then, in order to alleviate the vanishing-gradient problem of deep network training and to speed up network convergence, residual connection is applied between the input first word vector sequence and the updated first word vector sequence, followed by normalization, to obtain the second word vector sequence of the generated words.
B3, updating the second word vector sequence based on a cross attention mechanism by utilizing the context vector of the original text to obtain a third word vector sequence of the generated word:
In this step, the second word vector sequence is input into the cross-attention module of the decoder; first, Key and Value values are constructed from the context vectors generated from the original text (obtained by the encoding part) by the encoder of the Transformer architecture, the attention distribution a_t of each word in the original text is output, a Query value is constructed from the second word vector sequence, and the second word vector sequence is updated in combination with the attention distribution a_t; then, residual connection and normalization are applied to the input second word vector sequence and the updated second word vector sequence to obtain the third word vector sequence of the generated words.
And B4, calculating influence coefficients of all the fact triples on each generated word by using a cross attention mechanism based on the attention vector and the third word vector sequence of each fact triplet to obtain a fourth word vector sequence of the generated word:
In this step, the third word vector sequence is input into the fact attention module of the decoder; first, based on the attention vectors of the fact triplets and the third word vector sequence, the influence coefficients of all fact triplets on each generated word are obtained with a cross-attention mechanism; then, residual connection is applied word by word between the input third word vector sequence and the influence coefficients of all fact triplets on each generated word, followed by normalization, to obtain the fourth word vector sequence of the generated words.
The influence coefficients of the fact triplets on the generated words are calculated with a cross-attention mechanism, specifically as follows.

First, the relevance of each fact triplet to each generated word is calculated, where α_im denotes the relevance of the m-th generated word (whose word vector is taken from the third word vector sequence) and the i-th fact triplet.

Then, the relevance weight of each generated word and each fact triplet is calculated, where β_im denotes the relevance weight of the m-th generated word and the i-th fact triplet, and F denotes the fact triplet set.

Then, the influence coefficient of the fact triplets on each generated word is calculated, where u_m denotes the influence coefficient of all fact triplets on the m-th generated word.
B5, taking the fourth word vector sequence as the input of a feedforward network to obtain a decoding output vector of the current time step:
In this step, the fourth word vector sequence is input into the feed-forward network module of the decoder; first, the feed-forward network computes p_t = max(0, ln_t W_1 + b_1) W_2 + b_2; then, residual connection and normalization are applied to the output p_t of the feed-forward network layer and the fourth word vector sequence ln_t to obtain the output of the decoder, where W_1, W_2, b_1 and b_2 are all learnable parameters of the feed-forward network.
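The following sketch shows how the fact attention of step B4 slots between the cross-attention and feed-forward blocks of a decoder layer; the dimensions, the use of nn.MultiheadAttention for the first two blocks, and the omission of the causal mask of step B2 are all simplifying assumptions:

```python
import torch
import torch.nn as nn

class FactAwareDecoderLayer(nn.Module):
    def __init__(self, d: int = 512, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d))
        self.norm = nn.ModuleList(nn.LayerNorm(d) for _ in range(4))

    def forward(self, words: torch.Tensor, ctx: torch.Tensor, fact_r: torch.Tensor):
        # B2: self-attention over the already generated words, residual + norm
        x = self.norm[0](words + self.self_attn(words, words, words)[0])
        # B3: cross-attention, Query from x, Key/Value from the encoder context vectors
        x = self.norm[1](x + self.cross_attn(x, ctx, ctx)[0])
        # B4: fact attention - influence u_m of all fact triplets on each generated word
        beta = torch.softmax(x @ fact_r.transpose(-1, -2), dim=-1)   # beta_im
        x = self.norm[2](x + beta @ fact_r)                          # fourth word vector sequence
        # B5: feed-forward p_t = max(0, ln_t W_1 + b_1) W_2 + b_2, residual + norm
        return self.norm[3](x + self.ffn(x))
```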
B6, predicting the probability of generating words by taking the words in the word list and the words in the original text as the current time step through a pointer network based on the decoding output vector:
In order to obtain the distribution probability of the current generated word, the traditional decoding process maps the previous decoding information directly onto the word table, which may introduce into the generated summary words that belong to the word table but not to the original text and thus produce factual errors. In order to reduce such factual errors as much as possible, a pointer network is introduced in this step to minimize the generation of words that belong to the word table but not to the original text, thereby reducing the probability of factual errors.
Specifically, first, a linear layer whose learnable parameters W_vocab and b_vocab correspond to the word table (the word table of the pre-trained model) maps the output of the decoder at the current time step t to the feature space of the word table, giving the mapped vector representation l_t;
then, based on the mapped vector representation l_t, the vocabulary distribution P_vocab of the current time step t and the pointer probability p_gen are calculated, where w_gen and b_gen denote learnable parameters, p_gen denotes the probability that the generated word of the current time step is taken from the word table, and P_vocab denotes the output probability distribution over the words of the word table as the generated word of the current time step;
finally, based on P_vocab and p_gen, the final probability distribution P_t(w) of the current time step t is calculated, where P_t(w) denotes the probability that the word w of the extended vocabulary (comprising the word table and all words contained in the original text) is the generated word of the current time step t, N denotes the number of words in the original text, n denotes the sequence number of a word in the original text, a_t^n is the attention of the n-th word in the attention distribution a_t over the word vectors of the original text produced by the cross-attention module, and the copy term sums the attention of all occurrences of the word w in the original text.
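The copy/generate combination of step B6 appears to mirror the standard pointer-generator formulation (See et al., 2017); the sketch below assumes that formulation, since the exact inputs to the gate p_gen and the vocabulary sizes are not reproduced above:

```python
import torch
import torch.nn as nn

def pointer_distribution(dec_out, attn, src_ids, ext_vocab_size, w_vocab, w_gen):
    """dec_out: (B, d) decoder output at step t; attn: (B, N) cross-attention a_t over the
    source words; src_ids: (B, N) long ids of the source words in the extended vocabulary."""
    p_vocab = torch.softmax(w_vocab(dec_out), dim=-1)      # P_vocab over the word table
    p_gen = torch.sigmoid(w_gen(dec_out))                  # (B, 1) pointer probability p_gen
    p_final = torch.zeros(dec_out.size(0), ext_vocab_size) # P_t(w) over the extended vocabulary
    p_final[:, :p_vocab.size(-1)] = p_gen * p_vocab        # generate-from-word-table part
    p_final.scatter_add_(1, src_ids, (1 - p_gen) * attn)   # copy part: sum attention per source word
    return p_final

w_vocab = nn.Linear(512, 30522)   # maps l_t to a BERT-sized word table; sizes assumed
w_gen = nn.Linear(512, 1)
```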
B7, based on the predicted probabilities that the words of the word table and the words of the original text are the generated word of the current time step, obtaining the generated word of the current time step with a beam search algorithm:
In this step, since the words of the word table are out-of-text words with respect to the original text, and out-of-text words are more prone to causing factual errors than words taken from the original text, when an out-of-text word really has to be introduced the beam search algorithm calculates the combined probability of the out-of-text word to be introduced and the already generated words, thereby judging whether the word is reasonable.
Specifically, the processing procedure includes:
first, from the final probability distribution P_t(w) output by the pointer network at the current time step t, the P words with the highest probability are selected as candidate words;
then a judgment is made: if the candidate words include words contained in the original text, the word that is contained in the original text and has the highest output probability is selected as the generated word of the current time step; otherwise, the P candidate words are each combined with the words generated before the current time step to form P candidate summaries, the combined probability of each candidate summary is calculated according to the following formula, and the candidate word corresponding to the candidate summary with the largest combined probability is selected as the generated word of the current time step:

P_Y = argmax(log(∏_t p(y_t | y_1, y_2, …, y_{t-1})))

where Y denotes a candidate summary and p denotes the output probability of the corresponding word.
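A simplified single-step sketch of the selection rule of step B7 is given below; a real beam search keeps several hypotheses and their running log-probabilities, which is collapsed here into one prefix score, and all sizes are illustrative:

```python
import torch

def select_next_word(p_final: torch.Tensor, source_ids: set, prefix_logprob: float, P: int = 4) -> int:
    """p_final: P_t(w) over the extended vocabulary; source_ids: ids of words in the original text."""
    probs, ids = p_final.topk(P)                               # P candidate words
    in_source = [(p.item(), i.item()) for p, i in zip(probs, ids) if i.item() in source_ids]
    if in_source:                                              # prefer a word from the original text
        return max(in_source)[1]
    scores = prefix_logprob + torch.log(probs)                 # log prod_t p(y_t | y_<t) per candidate
    return ids[scores.argmax()].item()

next_id = select_next_word(torch.softmax(torch.randn(50000), dim=-1), {101, 2054}, prefix_logprob=-3.2)
```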
Steps B1 to B7 constitute the generation process of one time step, and each time step outputs one generated word; the generation process is executed in a loop until the preset number of summary words is reached, and the generated summary is thus obtained.
4. Model training
The training process of the text abstract generation model in this embodiment is as follows:
C1, selecting a sample from the training set as the current training sample, the sample comprising an original text and its real (reference) summary; specifically, a public dataset in the text summarization field, such as the CNN/DailyMail dataset, can be used as the training set;
c2, inputting an original text of the current training sample into a text abstract generating model;
obtaining a context vector of the original text by using an encoder of the text abstract generation model;
generating the words of each time step one by one according to steps B1-B7 with the text abstract generation model, the generated words of all time steps finally forming the generated summary of the original text of the training sample, the number of generated words being consistent with the number of words of the real summary;
and C3, performing loss calculation according to the following loss function, and updating a text abstract generation model based on the loss:
where y_m denotes the word vector of the m-th word in the real summary, ŷ_m denotes the word vector of the m-th generated word in the generated summary, and T denotes matrix transposition;
and C4, repeating the steps C1 to C3 until training of the text abstract generation model is completed.
Finally, it should be noted that the above examples are only preferred embodiments and are not intended to limit the invention. Modifications, equivalents and improvements made by those skilled in the art without departing from the spirit of the invention and the scope of the claims are intended to be included within the scope of the invention.
Claims (10)
1. A text abstract generation method based on fact consistency enhancement, which adopts a Transformer architecture to construct a sequence-to-sequence text abstract generation model, the decoder of the Transformer architecture comprising a self-attention module, a cross-attention module and a feed-forward network module connected in sequence, characterized in that:
introducing a fact attention module between a feed forward network module and a cross attention module of the decoder;
a fact triplet is defined and denoted as F_i = <s_i, r_i, o_i>, wherein s is a subject word, r is a relation modifier, o is an object word, the subscript i is the sequence number of the fact triplet, and every word in any fact triplet comes from the same sentence in the original text; the fact attention module takes the attention vectors of all fact triples in the original text and the third word vector sequence output by the cross-attention module of the decoder as inputs, obtains the influence coefficients of all fact triples on each generated word based on a cross-attention mechanism, and updates the third word vector sequence based on these influence coefficients to obtain a fourth word vector sequence, which serves as the input of the feed-forward network module of the decoder;
The attention vector of the fact triplet is calculated as follows:
A1, processing the original text with a natural language processing tool to extract fact triples F_i and construct a fact triplet set F; then, taking the words contained in each fact triplet F_i as nodes and connecting them with edges, mapping the fact triplet set F onto a graph, and computing the node vector of every node in the graph with a graph neural network; then, concatenating the node vectors of the nodes contained in each fact triplet to obtain the coding feature of that fact triplet, h_i = [h_si, h_ri, h_oi], wherein h_si, h_ri and h_oi respectively represent the node vectors of the nodes corresponding to s_i, r_i and o_i contained in fact triplet F_i;
A2, inputting the coding features of the fact triples into a recurrent neural network to obtain a vector representation z_i of each fact triplet that fuses the semantic information of the preceding and following fact triples;
A3, based on the vector representation z_i of each fact triplet, obtaining the attention vector of each fact triplet by means of the self-attention mechanism.
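To make step A1 of claim 1 concrete, the sketch below builds the word graph from already extracted fact triples (the NLP extraction tool itself is not shown): each word becomes a node, edges connect words that co-occur in a triplet, and the coding feature h_i concatenates the node vectors of s_i, r_i and o_i. The function names and the exact edge pattern (s-r, r-o, s-o plus self-loops) are assumptions.

```python
import torch

def build_fact_graph(fact_triples):
    """fact_triples: list of (subject, relation, object) word tuples extracted
    from the original text by an NLP tool (step A1).
    Returns the word-to-node index and the adjacency matrix of the graph."""
    words = sorted({w for trip in fact_triples for w in trip})
    index = {w: i for i, w in enumerate(words)}

    # Nodes are words; edges connect the words inside the same triplet.
    adj = torch.eye(len(words))                       # self-loops (assumed)
    for s, r, o in fact_triples:
        for a, b in [(s, r), (r, o), (s, o)]:
            adj[index[a], index[b]] = adj[index[b], index[a]] = 1.0
    return index, adj

def triplet_coding_features(fact_triples, index, node_vectors):
    """Concatenate the node vectors of s_i, r_i, o_i into the coding feature
    h_i of each fact triplet (end of step A1)."""
    return torch.stack([
        torch.cat([node_vectors[index[s]],
                   node_vectors[index[r]],
                   node_vectors[index[o]]])
        for s, r, o in fact_triples
    ])
```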
2. The text abstract generation method based on fact consistency enhancement according to claim 1, wherein,
in step A2, the recurrent neural network is a Bi-LSTM network; based on the coding features of the fact triples, the forward hidden layer vector and the backward hidden layer vector of each fact triplet are obtained with the Bi-LSTM network, and the forward hidden layer vector and the backward hidden layer vector are then spliced to obtain the vector representation z_i of the fact triplet.
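A sketch of claim 2, assuming a standard Bi-LSTM over the sequence of triplet coding features; PyTorch's bidirectional LSTM already returns the forward and backward hidden vectors concatenated, which matches the splicing described.

```python
import torch
import torch.nn as nn

class TripletSequenceEncoder(nn.Module):
    """Fuses each fact triplet's coding feature with the semantic information
    of the preceding and following triples (step A2)."""
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, coding_features):
        # coding_features: (num_triples, feat_dim) -- the h_i of step A1
        out, _ = self.bilstm(coding_features.unsqueeze(0))
        # out already concatenates the forward and backward hidden vectors,
        # i.e. z_i = [forward hidden ; backward hidden]
        return out.squeeze(0)          # (num_triples, 2 * hidden_dim)
```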
3. The text abstract generation method based on fact consistency enhancement according to claim 1, wherein,
in step A1, the node vector of each node in the graph is calculated with the graph neural network as follows:
firstly, initializing node vectors of all nodes in a graph by using a pre-training model;
then, the node vector of each node in the graph is updated using the GCN network according to the following formula:
wherein ReLU represents the activation function, H^l and H^(l+1) respectively represent the outputs of the l-th and (l+1)-th layers of the GCN network, A represents the adjacency matrix of the graph, D represents the degree matrix of the graph, and W^l represents the weight matrix of the l-th layer of the GCN network.
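A minimal GCN layer in the spirit of claim 3. The symmetrically normalized update H^(l+1) = ReLU(D^(-1/2) A D^(-1/2) H^l W^l) is an assumed variant, since the original formula is not reproduced above; other normalizations (e.g. adding self-loops to A first) are equally plausible.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)   # W^l

    def forward(self, H, A):
        # Symmetric normalization D^(-1/2) A D^(-1/2) (assumed variant).
        deg = A.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.clamp(min=1e-12).pow(-0.5))
        A_norm = d_inv_sqrt @ A @ d_inv_sqrt
        # H^(l+1) = ReLU(A_norm H^l W^l)
        return torch.relu(A_norm @ self.weight(H))

# Node vectors are initialized from a pre-trained model, then updated layer
# by layer:  H1 = layer1(H0, A); H2 = layer2(H1, A); ...
```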
4. The text abstract generation method based on fact consistency enhancement according to claim 1, wherein,
step A1 further comprises:
first, using the decision tree model of the natural language processing tool, the relationships between the fact triples F_i are classified according to the relationship types contained in the model; then, based on the relationship classification results, relationship triples among the fact triples F_i are obtained, and a relationship triplet set R is constructed;
then, the coding features of the two fact triples contained in each relationship triplet are spliced with the relation vector to obtain the coding feature of the relationship triplet, R_j = [h_1j, cr_j, h_2j], wherein h_1j and h_2j respectively represent the coding features of the fact triples contained in the j-th relationship triplet, and cr_j represents the relation vector obtained from the relationship contained in the j-th relationship triplet based on the pre-training model;
then, cross-attention calculation is performed using the coding features of each relationship triplet and the coding features of each fact triplet, the coding features of each fact triplet are updated based on the calculation results, and the updated coding feature h'_i of each fact triplet is used as the input of step A2.
5. The text abstract generation method based on fact consistency enhancement according to claim 4, wherein,
performing the cross-attention calculation using the coding features of each relationship triplet and each fact triplet, and updating the coding features of each fact triplet based on the calculation results, specifically comprises:
α ij =h i *R j
wherein α_ij represents the relevance of the ith fact triplet and the jth relationship triplet, β_ij represents the relevance weight of the ith fact triplet and the jth relationship triplet, and R represents the relationship triplet set.
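A sketch of the cross-attention calculation in claims 4-5, assuming that β_ij is the softmax of α_ij over the relationship triplet set R and that the updated coding feature h'_i adds the β-weighted relationship features back onto h_i; both choices, and the assumption that fact and relationship features share a common dimension (e.g. after a projection), fill in details not reproduced above.

```python
import torch

def update_fact_features(H_fact, R_rel):
    """H_fact: (num_fact, d)  coding features h_i of the fact triples
       R_rel:  (num_rel, d)   coding features R_j of the relationship triples
    Returns h'_i, the fact features updated by relationship information."""
    # alpha_ij = h_i * R_j  (dot-product relevance)
    alpha = H_fact @ R_rel.T                       # (num_fact, num_rel)
    # beta_ij: relevance weights, normalized over the relationship triplet set R
    beta = torch.softmax(alpha, dim=-1)
    # Assumed update: add the attention-weighted relationship features to h_i
    return H_fact + beta @ R_rel                   # h'_i
```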
6. The text abstract generation method based on fact consistency enhancement according to any one of claims 1 to 5, wherein in step A3, the attention vector of each fact triplet is obtained from its vector representation z_i by means of the self-attention mechanism; the self-attention mechanism is multi-head self-attention, and the calculation process comprises:
A31, based on the conversion matrices W_k1, W_k2 and W_k3 of each attention head k, converting the vector representation z_i of the fact triplet into a Query value Q_i^k, a Key value K_i^k and a Value V_i^k;
A32, performing the attention calculation of each attention head based on Q_i^k, K_i^k and V_i^k:
Z_i^k = softmax( Q_i^k (K_i^k)^T / √d_k ) V_i^k
wherein d_k is the dimension of the Key value K_i^k;
A33, aggregating the attention calculation results of all attention heads to obtain the attention vector Z_i:
Z_i = [Z_i^1 ; Z_i^2 ; … ; Z_i^K]
wherein K is the number of attention heads;
A34, using a linear transformation, changing the dimension of the attention vector Z_i to that of the vector representation z_i, thereby obtaining the attention vector r_i of the fact triplet:
r_i = Z_i * W_0
wherein W_0 is a linear transformation matrix.
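A sketch of steps A31-A34, assuming standard scaled dot-product attention per head and concatenation of the K head outputs before the final linear map W_0.

```python
import torch
import torch.nn as nn

class TripletSelfAttention(nn.Module):
    """Multi-head self-attention over the triplet representations z_i,
    producing the attention vector r_i of each fact triplet (step A3)."""
    def __init__(self, dim, num_heads):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads, self.d_k = num_heads, dim // num_heads
        self.W_q = nn.Linear(dim, dim, bias=False)   # W_k1 of all heads
        self.W_k = nn.Linear(dim, dim, bias=False)   # W_k2
        self.W_v = nn.Linear(dim, dim, bias=False)   # W_k3
        self.W_0 = nn.Linear(dim, dim, bias=False)   # output transform

    def forward(self, z):                            # z: (num_triples, dim)
        n = z.size(0)
        # A31: project to Query / Key / Value and split into heads
        q = self.W_q(z).view(n, self.num_heads, self.d_k).transpose(0, 1)
        k = self.W_k(z).view(n, self.num_heads, self.d_k).transpose(0, 1)
        v = self.W_v(z).view(n, self.num_heads, self.d_k).transpose(0, 1)
        # A32: scaled dot-product attention per head
        scores = q @ k.transpose(-2, -1) / self.d_k ** 0.5
        heads = torch.softmax(scores, dim=-1) @ v    # (heads, n, d_k)
        # A33: aggregate (concatenate) the K heads into Z_i
        Z = heads.transpose(0, 1).reshape(n, -1)
        # A34: r_i = Z_i * W_0, back to the dimension of z_i
        return self.W_0(Z)
```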
7. The text abstract generation method based on fact consistency enhancement according to claim 1, wherein, at each time step, the decoding process of the decoder is:
B1, based on the generated words obtained before the current time step t, obtaining the word embedding of each generated word with a pre-training model, and forming the first word vector sequence of the generated words according to the generation order;
B2, inputting the first word vector sequence into a self-attention module of a decoder; firstly, updating a first word vector sequence based on a self-attention mechanism; then, carrying out residual connection and normalization on the input first word vector sequence and the updated first word vector sequence to obtain a second word vector sequence of the generated word;
B3, inputting the second word vector sequence into the cross-attention module of the decoder; first, the Key value and Value are constructed from the context vector generated by the Transformer-architecture encoder for the original text, and the attention distribution a_t over each word of the original text is output; the Query value is constructed from the second word vector sequence, and the second word vector sequence is updated in combination with the attention distribution a_t; then, the input second word vector sequence and the updated second word vector sequence are residual-connected and normalized to obtain the third word vector sequence of the generated words;
B4, inputting the third word vector sequence into the fact attention module of the decoder; first, based on the attention vectors of the fact triples and the third word vector sequence, the influence coefficients of all fact triples on each generated word are obtained with the cross-attention mechanism; then, the input third word vector sequence and the influence coefficients of all fact triples on each generated word are residual-connected word by word and normalized to obtain the fourth word vector sequence of the generated words;
B5, inputting the fourth word vector sequence into the feed-forward network module of the decoder; first, the feed-forward network performs the calculation p_t = max(0, ln_t W_1 + b_1) W_2 + b_2; then, the output p_t of the feed-forward network layer and the fourth word vector sequence ln_t are residual-connected and normalized to obtain the output of the decoder, wherein W_1, W_2, b_1 and b_2 are all learnable parameters of the feed-forward network.
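A sketch of one decoder block following steps B2-B5 of claim 7, with the fact attention module placed between cross attention and the feed-forward network. Standard PyTorch multi-head attention layers stand in for the self-, cross- and fact-attention computations; in particular `fact_attn` is only a stand-in for the calculation of claim 8.

```python
import torch
import torch.nn as nn

class FactAwareDecoderBlock(nn.Module):
    def __init__(self, dim, num_heads, ff_dim):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fact_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, ff_dim),   # W_1, b_1 + ReLU
                                nn.ReLU(),
                                nn.Linear(ff_dim, dim))   # W_2, b_2
        self.norm = nn.ModuleList([nn.LayerNorm(dim) for _ in range(4)])

    def forward(self, x, context, fact_vectors):
        # B2: self-attention over the generated words, residual + norm
        s, _ = self.self_attn(x, x, x)
        x = self.norm[0](x + s)
        # B3: cross attention against the encoder context, residual + norm
        c, _ = self.cross_attn(x, context, context)
        x = self.norm[1](x + c)
        # B4: fact attention against the triplet attention vectors r_i
        f, _ = self.fact_attn(x, fact_vectors, fact_vectors)
        x = self.norm[2](x + f)
        # B5: feed-forward p_t = max(0, ln_t W_1 + b_1) W_2 + b_2, residual + norm
        return self.norm[3](x + self.ff(x))
```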
8. The text abstract generation method based on fact consistency enhancement according to claim 1 or 7, wherein,
based on the attention vector and the third word vector sequence of each fact triplet, the influence coefficients of all the fact triples on each generated word are obtained by using a cross attention mechanism, and the method specifically comprises the following steps:
wherein, in the above, the word vector of the mth generated word is taken from the third word vector sequence; α_im represents the relevance of the mth generated word and the ith fact triplet, β_im represents the relevance weight of the mth generated word and the ith fact triplet, u_m represents the influence coefficient of all fact triples on the mth generated word, and F represents the fact triplet set.
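A sketch of claim 8 under the assumption that α_im is a dot product between the generated word vector and the triplet attention vector r_i, β_im its softmax normalization over the fact triplet set F, and u_m the β-weighted sum of the r_i; these forms are assumed where the original formulas are not reproduced above.

```python
import torch

def fact_influence(word_vecs, fact_vecs):
    """word_vecs: (num_words, d)  entries of the third word vector sequence
       fact_vecs: (num_facts, d)  attention vectors r_i of the fact triples
    Returns u: (num_words, d), the influence coefficient of all fact triples
    on each generated word."""
    alpha = word_vecs @ fact_vecs.T            # alpha_im: relevance
    beta = torch.softmax(alpha, dim=-1)        # beta_im: weights over F
    u = beta @ fact_vecs                       # u_m: weighted sum of r_i
    return u

# In step B4 the fourth word vector sequence is then obtained by residual
# connection and normalization, e.g.  ln = LayerNorm(word_vecs + u)
```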
9. The text abstract generation method based on fact consistency enhancement according to claim 7, wherein,
the text abstract generation model also comprises a pointer network;
At each time step, the processing procedure of the pointer network includes:
first, the output of the decoder at the current time step t is mapped by a linear layer to the feature space corresponding to the word list, obtaining l_t:
wherein W_vocab and b_vocab represent learnable parameters corresponding to the word list, and the word list is the word list of the pre-training model;
then, based on l_t, calculating the vocabulary distribution P_vocab of the current time step t and the pointer probability p_gen:
wherein w_gen and b_gen represent learnable parameters; p_gen represents the probability of generating a word from the word list at the current time step, and P_vocab represents the output probability distribution over the words of the word list as the generated word of the current time step;
thereafter, based on P_vocab and p_gen, calculating the final probability distribution P_t(w) of the current time step t:
P_t(w) = p_gen · P_vocab(w) + (1 − p_gen) · Σ_{n: w_n = w} a_t^n
wherein P_t(w) represents the probability that the word w in the extended vocabulary is the generated word of the current time step t, the extended vocabulary comprising the word list and all words contained in the original text; N represents the number of words in the original text and n the sequence number of a word in the original text; a_t^n is the attention of the nth word in the attention distribution a_t generated by the cross-attention module over the word vectors of the original text; and Σ_{n: w_n = w} a_t^n represents the sum of the attention of the word w over its occurrences in the original text.
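A sketch of the pointer network of claim 9, assuming the standard pointer-generator combination P_t(w) = p_gen·P_vocab(w) + (1 − p_gen)·Σ a_t^n over the source positions where w occurs; how p_gen is computed (here, a linear layer over the decoder output) is an assumption.

```python
import torch
import torch.nn as nn

class PointerNetwork(nn.Module):
    def __init__(self, dec_dim, vocab_size):
        super().__init__()
        self.to_vocab = nn.Linear(dec_dim, vocab_size)   # W_vocab, b_vocab
        self.to_pgen = nn.Linear(dec_dim, 1)             # w_gen, b_gen (assumed input)

    def forward(self, dec_out, attn, src_ids, ext_vocab_size):
        """dec_out: (dec_dim,)  decoder output at time step t
           attn:    (N,)        attention a_t over the N source words
           src_ids: (N,) long   ids of the source words in the extended vocabulary
        Returns P_t(w) over the extended vocabulary."""
        l_t = self.to_vocab(dec_out)                      # map to word-list space
        p_vocab = torch.softmax(l_t, dim=-1)              # vocabulary distribution
        p_gen = torch.sigmoid(self.to_pgen(dec_out))      # pointer probability

        # P_t(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on w
        p_final = torch.zeros(ext_vocab_size)
        p_final[: p_vocab.size(0)] = p_gen * p_vocab
        p_final.index_add_(0, src_ids, (1 - p_gen) * attn)
        return p_final
```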
10. The text abstract generation method based on fact consistency enhancement according to claim 9, wherein,
the text abstract generation model further comprises a beam search algorithm, the processing procedure of which comprises:
first, from the final probability distribution P_t(w) output by the pointer network at the current time step t, the P words with the highest probability are screened out as candidate words;
then, a judgment is made: if the candidate words contain words that appear in the original text, the word that appears in the original text and has the highest output probability is selected as the generated word of the current time step; otherwise, the P candidate words and the words generated before the current time step are used to form P candidate abstracts, the combination probability of each candidate abstract is calculated according to the following formula, and the candidate word corresponding to the candidate abstract with the largest combination probability is selected as the generated word of the current time step:
P_Y = argmax( log( ∏_t p(y_t | y_1, y_2, …, y_{t-1}) ) )
wherein Y represents the candidate abstract and p represents the output probability of the corresponding word.