CN117251562A - Text abstract generation method based on fact consistency enhancement - Google Patents
- Publication number
- CN117251562A (application number CN202311278088.9A)
- Authority
- CN
- China
- Prior art keywords
- fact
- word
- attention
- triplet
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to the technical field of natural language processing and discloses a text abstract generation method based on fact consistency enhancement, which solves the problem in the prior art that the differing importance of individual fact triplets, and their differing contributions to the final summary, is ignored, and improves the credibility of the generated text abstract. The invention adopts a Transformer architecture to construct a sequence-to-sequence text abstract generation model and introduces a fact attention module between the feed-forward network module and the cross-attention module of the decoder. The fact attention module calculates the influence of each fact triplet on each generated word based on the attention vector of each fact triplet and the word vectors of the generated words output by the cross-attention module, and updates the word vectors of the generated words according to this influence; the attention vector of each fact triplet is obtained by a self-attention calculation over the encoding vectors of the fact triplets.
Description
Technical Field
The invention relates to the technical field of natural language processing, in particular to a text abstract generation method based on fact consistency enhancement.
Background
With the wide application of the mobile internet and the rapid spread of intelligent terminal devices, people can create and publish content on the network and conveniently obtain public information, which makes daily life and work more convenient and greatly enriches people's lives. However, web text is often lengthy and difficult to understand; compared with short videos it demands long reading time and its emphasis is not obvious, so in today's fast-paced life it is hard to hold readers' interest, which reduces the reading experience to a certain extent.
Generative text summarization models aim to convert a text, or a set of texts, into a short summary containing the key information; this technology addresses the problem of information overload. A generative summarization model usually realizes a sequence-to-sequence task on top of an encoder-decoder framework, and uses the strong mapping capability of Transformer-based pre-trained models and network architectures to generate a short and fluent summary. However, because a generative model outputs words according to probabilities, it may still produce words or phrases that are inconsistent with the meaning of the original text, so the summary contains factual errors and its credibility suffers, which limits the adoption and deployment of generative summarization models and reduces their research and application value.
Therefore, how to mitigate the factual errors that generative summarization models are prone to, and how to effectively enhance the fact consistency of the summary result, is currently one of the hot issues in this field.
Fact consistency enhancement aims to reduce the probability of factual errors in the generated summary. The study of fact consistency in automatic text summarization began in 2018 with Ziqiang Cao et al., who found that about 30% of the facts in summaries produced by mainstream generative models were wrong or unverifiable. Subsequently, Ziqiang Cao et al. proposed the FTSum model, which uses an additional encoder to splice the fact triplets with the sentence encodings of the original document so that the model can notice the fact triplets; however, it ignores the problem that fact triplets of different importance contribute differently to the final summary.
Gunel B et al. proposed constructing a knowledge graph from Wikipedia data and introducing its entity-level knowledge into the encoder-decoder framework to guide the model toward generating correct facts, but they ignore the problem that the traditional encoder-decoder framework easily outputs, during decoding, words that do not agree with the original facts, thereby producing factual errors.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: a text abstract generation method based on fact consistency enhancement is provided, which solves the problem in the prior art that the differing importance of fact triplets, and hence their differing contributions to the final summary, is ignored, and improves the credibility of the generated text abstract.
The technical scheme adopted for solving the technical problems is as follows:
a text abstract generation method based on fact consistency enhancement adopts a Transformer architecture to construct a sequence-to-sequence text abstract generation model, wherein the decoder of the Transformer architecture comprises a self-attention module, a cross-attention module and a feed-forward network module connected in sequence, and a fact attention module is introduced between the feed-forward network module and the cross-attention module of the decoder;
a fact triplet is defined and denoted as F_i = &lt;s_i, r_i, o_i&gt;, where s is the subject word, r is the relation modifier, o is the object word, the subscript i is the sequence number of the fact triplet, and all words in any fact triplet come from the same sentence of the original text; the fact attention module takes as input the attention vectors of all fact triplets in the original text and the third word vector sequence output by the cross-attention module of the decoder, obtains the influence coefficients of all fact triplets on each generated word based on a cross-attention mechanism, and updates the third word vector sequence based on these influence coefficients to obtain a fourth word vector sequence, which serves as the input of the feed-forward network module of the decoder;
The attention vector of the fact triplet is calculated as follows:
A1, processing the original text with a natural language processing tool and extracting the fact triplets F_i to construct a fact triplet set F; then, taking the words contained in each fact triplet F_i as nodes and their connections as edges, mapping the fact triplet set F into a graph, and computing the node vector of every node in the graph with a graph neural network; then, concatenating the node vectors of the nodes contained in each fact triplet to obtain the encoding feature of each fact triplet, h_i = [h_i^s ; h_i^r ; h_i^o], where h_i^s, h_i^r and h_i^o respectively denote the node vectors of the nodes corresponding to s_i, r_i and o_i of fact triplet F_i;

A2, inputting the encoding features of the fact triplets into a recurrent neural network to obtain, for each fact triplet, a vector representation z_i that fuses the semantic information of the preceding and following fact triplets;

A3, using the vector representations z_i of the fact triplets and a self-attention mechanism, obtaining the attention vector of each fact triplet.
Further, in step A2 the recurrent neural network is a Bi-LSTM network; based on the encoding features of the fact triplets, the Bi-LSTM network produces, for each fact triplet, a forward hidden vector h_i^→ and a backward hidden vector h_i^←; the forward hidden vector h_i^→ and the backward hidden vector h_i^← are then concatenated to obtain the vector representation z_i = [h_i^→ ; h_i^←].
Further, in step A1, a node vector of each node in the graph is obtained by calculation using the graph neural network, including:
firstly, initializing node vectors of all nodes in a graph by using a pre-training model;
then, the node vector of each node in the graph is updated using the GCN network according to the following formula:
where ReLU denotes the activation function, H^l and H^{l+1} denote the outputs of the l-th and (l+1)-th layers of the GCN network respectively, A denotes the adjacency matrix of the graph, D denotes its degree matrix, and W^l denotes the weight matrix of the l-th layer of the GCN network.
Further, in step A1, the method further includes:
first, using the decision-tree model of the natural language processing tool and the relationship types it contains, classifying the relationships between the fact triplets F_i; then, based on the classification result, obtaining the relation triplets between the fact triplets F_i and constructing a relation triplet set R;

then, concatenating the encoding features of the fact triplets contained in each relation triplet with the relation vector to obtain the encoding feature of the relation triplet, R_j = [h_1j, cr_j, h_2j], where h_1j and h_2j denote the encoding features of the two fact triplets contained in the j-th relation triplet, and cr_j denotes the relation vector obtained via the pre-trained model from the relation contained in the j-th relation triplet;

then, performing a cross-attention calculation between the encoding features of the relation triplets and the encoding features of the fact triplets, updating the encoding feature of each fact triplet based on the calculation result, and using the updated encoding feature h'_i of each fact triplet as the input of step A2.
Further, the cross-attention calculation between the encoding features of the relation triplets and the encoding features of the fact triplets, and the update of the encoding features of the fact triplets based on the calculation result, specifically comprise:

α_ij = h_i * R_j

where α_ij denotes the relevance of the i-th fact triplet and the j-th relation triplet, β_ij denotes the relevance weight of the i-th fact triplet and the j-th relation triplet obtained by normalizing α_ij over all relation triplets, and R denotes the relation triplet set.
Further, in step A3, the attention vector of each fact triplet is obtained from its vector representation z_i based on a self-attention mechanism; the self-attention mechanism is multi-head self-attention, and its calculation comprises the following steps:

A31, based on the transformation matrices W_k1, W_k2 and W_k3 of each attention head k, converting the vector representation z_i of the fact triplet into a Query value Q_k, a Key value K_k and a Value V_k;

A32, performing the attention calculation of each attention head based on Q_k, K_k and V_k:

head_k = softmax(Q_k K_k^T / √d_k) V_k

where d_k is the dimension of K_k;

A33, aggregating the attention results of all attention heads to obtain the attention vector Z_i:

Z_i = [head_1 ; head_2 ; … ; head_K]

where K is the number of attention heads;

A34, using a linear transformation, changing the dimension of the attention vector Z_i back to that of the vector representation z_i to obtain the attention vector r_i of the fact triplet:

r_i = Z_i * W_0

where W_0 is the linear transformation matrix.
Further, at each time step, the decoding process of the decoder is as follows:
B1, based on the words already generated before the current time step t, obtaining the word embeddings of the generated words with a pre-trained model, and arranging the word embeddings of the generated words in generation order to form a first word vector sequence of the generated words;
B2, inputting the first word vector sequence into the self-attention module of the decoder; first, updating the first word vector sequence based on a self-attention mechanism; then, applying residual connection and normalization to the input first word vector sequence and the updated first word vector sequence to obtain a second word vector sequence of the generated words;
B3, inputting the second word vector sequence into the cross-attention module of the decoder; first, constructing Key and Value values from the context vectors generated by the encoder of the Transformer architecture from the original text, outputting the attention distribution a_t of each word in the original text, constructing a Query value from the second word vector sequence, and updating the second word vector sequence in combination with the attention distribution a_t; then, applying residual connection and normalization to the input second word vector sequence and the updated second word vector sequence to obtain a third word vector sequence of the generated words;
B4, inputting the third word vector sequence into the fact attention module of the decoder; first, based on the attention vectors of the fact triplets and the third word vector sequence, obtaining the influence coefficients of all fact triplets on each generated word with a cross-attention mechanism; then, applying residual connection word by word to the input third word vector sequence and the influence coefficients of all fact triplets on each generated word, followed by normalization, to obtain a fourth word vector sequence of the generated words;
B5, inputting the fourth word vector sequence into the feed-forward network module of the decoder; first, the feed-forward network computes p_t = max(0, ln_t W_1 + b_1) W_2 + b_2; then, residual connection and normalization are applied to the output p_t of the feed-forward network layer and the fourth word vector sequence ln_t to obtain the output of the decoder, where W_1, W_2, b_1 and b_2 are all learnable parameters of the feed-forward network.
Further, based on the attention vectors of the fact triplets and the third word vector sequence, the influence coefficients of all fact triplets on each generated word are obtained with a cross-attention mechanism, where α_im denotes the relevance of the m-th generated word, whose word vector is taken from the third word vector sequence, and the i-th fact triplet; β_im denotes the relevance weight of the m-th generated word and the i-th fact triplet; u_m denotes the influence coefficient of all fact triplets on the m-th generated word; and F denotes the fact triplet set.
Further, the text abstract generation model also comprises a pointer network;
at each time step, the processing procedure of the pointer network includes:
First, a linear layer whose learnable parameters W_vocab and b_vocab correspond to the word table (the word table of the pre-trained model) maps the output of the decoder at the current time step t to the feature space of the word table, giving l_t;
Then, based on l_t, the vocabulary distribution P_vocab of the current time step t and the pointer probability p_gen are calculated, where w_gen and b_gen denote learnable parameters, p_gen denotes the probability that the generated word of the current time step is taken from the word table, and P_vocab denotes the output probability distribution over the words of the word table as the generated word of the current time step;
Thereafter, based on P_vocab and p_gen, the final probability distribution P_t(w) of the current time step t is calculated, where P_t(w) denotes the probability that the word w of the extended vocabulary is the generated word of the current time step t, the extended vocabulary comprising the word table and all words contained in the original text, N denotes the number of words in the original text, n denotes the sequence number of a word in the original text, a_t^n is the attention of the n-th word in the attention distribution a_t over the word vectors of the original text produced by the cross-attention module, and the copy term sums the attention of all occurrences of the word w in the original text.
Further, the text abstract generation model further comprises a beam search algorithm, and the processing procedure of the beam search algorithm comprises the following steps:
first, from the final probability distribution P_t(w) output by the pointer network at the current time step t, selecting the P words with the highest probability as candidate words;
then, making a judgment: if the candidate words include words contained in the original text, selecting the word that is contained in the original text and has the highest output probability as the generated word of the current time step; otherwise, combining the P candidate words with the words generated before the current time step to form P candidate summaries, calculating the combined probability of each candidate summary according to the following formula, and selecting the candidate word corresponding to the candidate summary with the largest combined probability as the generated word of the current time step:

P_Y = argmax(log(∏_t p(y_t | y_1, y_2, …, y_{t-1})))

where Y denotes a candidate summary and p denotes the output probability of the corresponding word.
The beneficial effects of the invention are as follows:
the invention adopts a Transformer architecture to construct a sequence-to-sequence text abstract generation model and improves the decoder of the Transformer architecture by introducing a fact attention module between the feed-forward network module and the cross-attention module of the decoder. In the decoder, the influence of each fact triplet on the generated words is calculated based on the attention vector of each fact triplet and the word vectors of the generated words output by the cross-attention module, and the word vectors of the generated words are updated according to this influence; the attention vector of each fact triplet is obtained by a self-attention calculation over the encoding vectors of the fact triplets. In this way, the problem of ignoring the differing importance of fact triplets and their differing contributions to the final summary is avoided, and the fact consistency of text summary generation is improved.
Further, for long texts there are also complex relationships between the fact triplets, which likewise cause different fact triplets to contribute differently to the final result. The influence of these complex relationships between fact triplets is therefore also incorporated into the calculation of the fact triplet encoding vectors.
Furthermore, the invention introduces a pointer network: the output vector of the decoder at the current time step is mapped through the pointer network to the feature space of the word table to obtain a mapped vector representation; from this representation, the probability of generating the current word from the word table and the output probability distribution of each word in the word table as the current generated word are calculated, and finally the probability that each word in the extended vocabulary, formed by the word table and the words of the original text, is the generated word of the current time step is calculated. This avoids the traditional decoding practice of directly mapping the previous decoding information onto the word table to obtain the distribution of the current generated word, which may introduce into the generated summary words that belong to the word table but not to the original text. By introducing the pointer network, the invention avoids, as far as possible, generating words that belong to the word table but not to the original text, thereby reducing the probability of factual errors and further improving the fact consistency of text summary generation.
Furthermore, the invention introduces a beam search algorithm. Using the final probabilities, output by the pointer network at the current time step, that each word in the extended vocabulary is the generated word of the current time step, the words with the highest probability are selected as candidate words. If the candidate words include words contained in the original text, the word contained in the original text with the highest output probability is selected as the generated word of the current time step; otherwise, the candidate words are each combined with the words generated before the current time step to form several candidate summaries, the combined probability of each candidate summary is calculated, and the candidate word corresponding to the candidate summary with the largest combined probability is selected as the generated word of the current time step. When the candidate words contain no word from the original text, constructing candidate summaries from the candidate words and the already generated words and comparing their combined probabilities finds an approximately globally optimal solution at the phrase level, improves the semantic coherence between the word generated at the current time step and the previously generated words, and thus improves the fact consistency of text summary generation.
Drawings
FIG. 1 is a schematic block diagram of text excerpt generation based on fact consistency enhancement in an embodiment of the present invention.
Detailed Description
The invention aims to provide a text abstract generation method based on fact consistency enhancement, which solves the problem in the prior art that the differing importance of fact triplets, and their differing contributions to the final summary, is ignored, and improves the credibility of the generated text abstract.
The text abstract generation method adopts a Transformer architecture to construct a sequence-to-sequence text abstract generation model. The Transformer architecture is an Encoder-Decoder framework. In the encoding part, the input original text is first segmented to obtain the word sequence of the original text; then the word vectors of the words in the word sequence are obtained through a pre-trained model to form the word vector sequence of the original text; finally, the word vector sequence is input into the encoder of the Transformer architecture, which encodes it into the context vectors of the original text. In the decoding part, the summary text is generated step by step from the encoded context vectors, i.e., the generated words are obtained one by one over time steps, each time step decoding one generated word.
The key point of the invention is to improve the decoder of the Transformer architecture by introducing a fact attention module between the feed-forward network module and the cross-attention module of the existing decoder; that is, the decoder of the invention comprises a self-attention module, a cross-attention module, a fact attention module and a feed-forward network module connected in sequence. The fact attention module calculates the influence of each fact triplet on the generated words based on the attention vector of each fact triplet and the word vectors of the generated words output by the cross-attention module, and updates the word vectors of the generated words according to this influence.
Specifically, in the method of the present invention, at each time step, the decoding process of the decoder is:
First, all words generated before the current time step are passed through a pre-trained model to obtain the first word vector sequence of the generated words. Then the first word vector sequence is updated by the self-attention module and combined with the input first word vector sequence via residual connection and normalization to obtain the second word vector sequence of the generated words. Next, in the cross-attention module, the second word vector sequence is updated using the context vectors obtained by the encoding part, and residual connection and normalization are applied to obtain the third word vector sequence of the generated words. Then, based on the attention vectors of the fact triplets and the third word vector sequence, the fact attention module uses a cross-attention mechanism to obtain the influence coefficients of all fact triplets on each generated word; these are combined word by word with the third word vector sequence via residual connection and normalization to obtain the fourth word vector sequence of the generated words, which is taken as the input of the feed-forward network to obtain the decoding output vector of the current time step.
The decoding process yields the distribution probability of the word at the current time step, which could be mapped directly onto the word table to obtain the generated word; however, this approach may introduce into the generated summary words that belong to the word table but not to the original text and, because the distribution probability is not computed accurately enough, may to some extent produce factual errors. To effectively alleviate this problem, the invention introduces a pointer network: the decoding output vector of the current time step is mapped to the feature space of the word table to obtain a mapped vector representation; from this representation, the probability of generating the current word from the word table and the output probability distribution of each word in the word table as the current generated word are calculated, and then the final probability that each word in the extended vocabulary, formed by the word table and all words of the original text, is the generated word of the current time step is calculated.
Furthermore, the method of the invention introduces beam search: by calculating the combined probability of the current candidate word together with the already generated words, an approximately globally optimal solution at the phrase level is found, the semantic coherence between the word generated at the current time step and the previously generated words is improved, and the generation of words that belong to the word table but not to the original text is avoided, thereby further improving the fact consistency and credibility of text summary generation. The beam search algorithm uses the final probabilities, output by the pointer network at the current time step, that each word in the extended vocabulary is the generated word of the current time step, and selects the words with the highest probability as candidate words. If the candidate words include words contained in the original text, the word contained in the original text with the highest output probability is selected as the generated word of the current time step; otherwise, the candidate words are each combined with the words generated before the current time step to form several candidate summaries, the combined probability of each candidate summary is calculated, and the candidate word corresponding to the candidate summary with the largest combined probability is selected as the generated word of the current time step.
The method of the present invention is further described below with reference to examples.
Examples:
In the text summary generation method based on fact consistency enhancement of this embodiment, as shown in FIG. 1, the text summary generation model is based on the Encoder-Decoder framework of the Transformer architecture. Compared with the prior art, the structure of the encoder remains unchanged, and a fact attention module is introduced into the decoder; the improved decoder comprises a self-attention module, a cross-attention module, a fact attention module and a feed-forward network module.
Three aspects are described in detail below: generation of the fact triplet attention vectors, the encoding and decoding process, and model training.
1. Fact triplet attention vector generation
The attention vectors of the fact triplets are used to introduce the weights of the fact triplets into the subsequent decoding process and to calculate the different contributions of each fact triplet to the generated words. Their calculation comprises the following steps:
a1, extracting fact triplet
In this step, the original text is processed with a natural language processing tool, the fact triplets F_i are extracted from it, and the fact triplet set F is constructed. In this embodiment, the natural language processing tool is the StanfordNLP toolkit.
Specifically, the original text D is first input into the StanfordNLP toolkit and segmented into the word sequence of the original text {x_1, x_2, …, x_n, …, x_N}, where x_n denotes the n-th word in the text, n denotes the sequence number of the word in the original text, and N denotes the number of words in the original text.
Then, through part-of-speech tagging, syntactic constituent analysis, coreference resolution, relation extraction and similar processing, the StanfordNLP toolkit outputs the fact triplets F_i extracted from the original text D, and the fact triplet set F is constructed. Let F = {F_1, F_2, …, F_i, …, F_I}, with fact triplet F_i = &lt;s_i, r_i, o_i&gt;, where s is the subject word, r is the relation modifier, o is the object word, the subscript i is the sequence number of the fact triplet, I denotes the number of fact triplets, and s_i, r_i and o_i of any fact triplet F_i all come from the same sentence. For example, &lt;Kebi, likes, cola&gt; is a fact triplet.
A2, extracting relation triples
In this step, Rhetorical Structure Theory is applied, and the complex relationships between the fact triplets are extracted on the basis of the fact triplets. Further, using the decision-tree model of the natural language processing tool and the relationship types it contains, the relationships between the fact triplets F_i are classified; then, based on the classification result, the relation triplets between the fact triplets F_i are obtained and the relation triplet set R is constructed.
In this embodiment, the natural language processing tool is the StanfordCoreNLP toolkit; since the RST tree model of Rhetorical Structure Theory is already built into the StanfordCoreNLP toolkit, it can be called directly. Specifically, the original text D = {x_1, x_2, …, x_n, …, x_N} and the fact triplet set F = {F_1, F_2, …, F_i, …, F_I} are input into the StanfordCoreNLP toolkit. Using the StanfordCoreNLP toolkit and the relationship types contained in its RST tree model (currently 23 relationships), the relationships between the fact triplets F_i are classified; then, based on the classification result, the relation triplets R_j between the fact triplets F_i are obtained and the relation triplet set R is constructed.
Let R = {R_1, R_2, …, R_j, …, R_J}, with relation triplet R_j = &lt;F_1j, CR_j, F_2j&gt;, where F_1j and F_2j respectively denote the fact triplets contained in the j-th relation triplet, CR_j denotes the relation contained in the j-th relation triplet, the subscript j is the sequence number of the relation triplet, and J denotes the number of relation triplets. For example, the fact triplets &lt;Xiao Ming, belongs to, Sichuan Province&gt; and &lt;Xiao Ming, …, Sichuan&gt; are in a "background" relationship, and &lt;Jack, likes, cola&gt; and &lt;Jack, suffers from, dental caries&gt; are in a "causal" relationship.
A3, vector coding
In this step, the fact triplet set is mapped into a graph structure, the nodes of the graph are encoded with a graph convolutional network, the complex relation information is incorporated, and the encoding vectors of the fact triplets are obtained.
Specifically, the method comprises the following processing steps:
a31 mapping the fact triplet set into a graph
In order to encode with a graph convolutional neural network, the unstructured fact triplet set F must be converted into a structured graph. For each fact triplet F_i = &lt;s_i, r_i, o_i&gt;, a subject node s_i, a relation node r_i and an object node o_i are created; the subject node and the relation node are connected by an edge, and the object node and the relation node are connected by an edge; the edges carry no information other than indicating that the two nodes are connected. That is, the words contained in each fact triplet F_i become nodes, their connections become edges, and the fact triplet set F is mapped into a graph.
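As an illustration only (not part of the claimed embodiment; all names and sizes are assumptions), the following Python sketch builds such a graph from a list of fact triplets and returns the node labels together with the adjacency matrix consumed by the GCN below:

```python
from typing import List, Tuple
import torch

def build_fact_graph(triplets: List[Tuple[str, str, str]]):
    """One node per subject/relation/object occurrence; subject-relation and
    relation-object pairs are joined by unlabeled, undirected edges."""
    node_labels, edges = [], []
    for s, r, o in triplets:
        base = len(node_labels)             # ids of this triplet's nodes
        node_labels.extend([s, r, o])       # s -> base, r -> base+1, o -> base+2
        edges += [(base, base + 1), (base + 1, base + 2)]
    n = len(node_labels)
    adj = torch.zeros(n, n)
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0         # edges carry no further information
    return node_labels, adj

nodes, A = build_fact_graph([("Jack", "likes", "cola"), ("Xiao Ming", "belongs to", "Sichuan Province")])
```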
A32, node initialization
The step uses a pre-trained Bert model to initialize node vectors of nodes in the graph.
A33, node update
In this step, the node vectors of the nodes in the graph are updated with a GCN (graph convolutional network):
the GCN network can collect neighbor node information around any node and convert the characteristic information of the node by utilizing the information, and the process can be repeated for a plurality of times, and the specific calculation process is as follows:
where ReLU denotes the activation function, H^l and H^{l+1} denote the outputs of the l-th and (l+1)-th layers of the GCN network respectively, A denotes the adjacency matrix of the graph, D denotes its degree matrix, and W^l denotes the weight matrix of the l-th layer of the GCN network.
After the aggregation calculation of the GCN network, each node contains information from the surrounding nodes, even from nodes two or three hops away, effectively capturing the association information and, to a certain extent, the factual relationships. In this embodiment, the GCN network has two layers.
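Because the update formula itself is not reproduced above, the following PyTorch sketch assumes the standard GCN propagation rule H^{l+1} = ReLU(D^{-1/2}(A+I)D^{-1/2} H^l W^l) with self-loops added; the two stacked layers follow this embodiment, while the hidden width (that of the BERT-initialized node vectors) and everything else are illustrative assumptions:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        self.weight = nn.Linear(dim_in, dim_out, bias=False)   # W^l

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        a_hat = adj + torch.eye(adj.size(0))                   # add self-loops
        d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)                # D^{-1/2}
        norm_adj = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
        return torch.relu(norm_adj @ self.weight(h))           # aggregate neighbours, then transform

# Two stacked layers, as in this embodiment; h0 are the BERT-initialized node vectors.
gcn1, gcn2 = GCNLayer(768, 768), GCNLayer(768, 768)
```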
A34, constructing fact triplet coding features
In this step, the node vectors of the nodes contained in each fact triplet are concatenated to obtain the encoding feature of each fact triplet, h_i = [h_i^s ; h_i^r ; h_i^o], where h_i^s, h_i^r and h_i^o respectively denote the node vectors of the nodes corresponding to s_i, r_i and o_i of fact triplet F_i.
A35, updating fact triplet coding feature
In order to effectively incorporate the influence of the complex relationships between fact triplets on the information contained in different fact triplets, the encoding features of the relation triplets are constructed first, and then the encoding features of the fact triplets are updated with an attention mechanism based on the encoding features of the relation triplets.
Specifically, first, the encoding features of the fact triplets contained in each relation triplet are concatenated with the relation vector to obtain the encoding feature of the relation triplet, R_j = [h_1j, cr_j, h_2j], where h_1j and h_2j denote the encoding features of the two fact triplets contained in the j-th relation triplet, and cr_j denotes the relation vector obtained via the pre-trained model from the relation contained in the j-th relation triplet.
Then, a cross-attention calculation is performed between the encoding features of the relation triplets and the encoding features of the fact triplets, and the encoding feature of each fact triplet is updated based on the calculation result to obtain the updated encoding feature h'_i; the calculation is as follows:

α_ij = h_i * R_j

where α_ij denotes the relevance of the i-th fact triplet and the j-th relation triplet, β_ij denotes the relevance weight of the i-th fact triplet and the j-th relation triplet obtained by normalizing α_ij over all relation triplets, and R denotes the relation triplet set.
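A minimal sketch of this relation-aware refresh is given below. The relevance α_ij = h_i * R_j and the normalized weights β_ij follow the description; the residual weighted sum used as the final update, and the assumption that fact and relation encodings have been projected to a common dimension, are illustrative choices, since the update formula itself is not reproduced above:

```python
import torch

def update_fact_encodings(h: torch.Tensor, rel: torch.Tensor) -> torch.Tensor:
    """h: (I, d) fact-triplet encodings h_i; rel: (J, d) relation-triplet encodings R_j,
    both assumed to share the dimension d."""
    alpha = h @ rel.T                       # alpha_ij = h_i * R_j  (relevance)
    beta = torch.softmax(alpha, dim=-1)     # beta_ij  (relevance weights over the set R)
    return h + beta @ rel                   # assumed residual aggregation -> h'_i
```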
A4, fusion front and rear semantics
Although the fact triplet encoding features obtained in the previous step implicitly contain some of the relationship structure between the fact triplets, they contain only local structural information, and their ability to capture distant fact triplets over long spans is insufficient. To address this long-distance dependency problem, in this step the encoding features of the fact triplets are input into a recurrent neural network to obtain, for each fact triplet, a vector representation z_i that fuses the semantic information of the preceding and following fact triplets.
Specifically, in this embodiment, the encoding features h'_i of the fact triplets updated in step A35 are input into a Bi-LSTM network to obtain, for each fact triplet, a vector representation z_i that fuses the semantic information of the preceding and following fact triplets. In the Bi-LSTM network, a forward hidden vector h_i^→ is obtained from the forward LSTM and a backward hidden vector h_i^← from the backward LSTM; the forward hidden vector h_i^→ and the backward hidden vector h_i^← are then concatenated to obtain the vector representation z_i = [h_i^→ ; h_i^←] that fuses the semantic information of the preceding and following fact triplets.
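Under assumed dimensions, the Bi-LSTM fusion of this step can be sketched as follows; each output position concatenates the forward and backward hidden vectors, giving z_i:

```python
import torch
import torch.nn as nn

d = 256                                      # illustrative width of the updated encodings h'_i
bilstm = nn.LSTM(input_size=d, hidden_size=d, bidirectional=True, batch_first=True)

h_prime = torch.randn(1, 10, d)              # (batch, number of fact triplets, d) encodings h'_i
z, _ = bilstm(h_prime)                       # z[:, i] = [forward_i ; backward_i], i.e. z_i of size 2*d
```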
a5, self-attention calculation
In this step, the attention vector of each fact triplet is obtained from its vector representation z_i based on a self-attention mechanism. To improve the ability to capture different kinds of information and further improve the effect of the attention calculation, the self-attention mechanism is multi-head self-attention, specifically:

A51, based on the transformation matrices W_k1, W_k2 and W_k3 of each attention head k, the vector representation z_i of the fact triplet is converted into a Query value Q_k, a Key value K_k and a Value V_k;

A52, the attention of each attention head is calculated from Q_k, K_k and V_k:

head_k = softmax(Q_k K_k^T / √d_k) V_k

where d_k is the dimension of K_k;

A53, the attention results of all attention heads are aggregated to obtain the attention vector Z_i:

Z_i = [head_1 ; head_2 ; … ; head_K]

where K is the number of attention heads;

A54, using a linear transformation, the dimension of the attention vector Z_i is changed back to that of the vector representation z_i to obtain the attention vector r_i of the fact triplet:

r_i = Z_i * W_0

where W_0 is the linear transformation matrix.
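The multi-head self-attention of steps A51-A54 can be sketched with PyTorch's built-in module; nn.MultiheadAttention already bundles the per-head Q/K/V projections, the scaled dot-product softmax(Q K^T / √d_k) V and the head aggregation, so only the extra W_0 projection of step A54 is written out separately. All sizes are assumptions:

```python
import torch
import torch.nn as nn

d_z, n_heads = 512, 8
self_attn = nn.MultiheadAttention(embed_dim=d_z, num_heads=n_heads, batch_first=True)
w0 = nn.Linear(d_z, d_z, bias=False)        # W_0, maps Z_i back to the width of z_i

z = torch.randn(1, 10, d_z)                 # vector representations z_i of 10 fact triplets
attn_out, _ = self_attn(z, z, z)            # steps A51-A53: multi-head self-attention
r = w0(attn_out)                            # step A54: attention vectors r_i
```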
2. Encoding process
For the encoding process, the original text is first segmented to obtain its word sequence; then the word vectors of the words in the word sequence are extracted with the pre-trained BERT model to obtain the word vector sequence of the original text; then the encoder encodes the word vector sequence of the original text into the context vectors of the original text. Since this encoding process is identical to the encoding process of the prior-art Transformer architecture, it is not described further here.
3. Decoding process
In the decoding process, the words are generated one by one over time steps, i.e., each time step decodes one generated word.
The decoding at each time step comprises the following steps:
b1, acquiring a first word vector sequence of a generated word before the current time step:
In this step, based on the words already generated before the current time step t, the word embeddings of the generated words are obtained with the pre-trained BERT model and arranged in generation order to form the first word vector sequence of the generated words.
B2, performing self-attention calculation on the first word vector sequence to obtain a second word vector sequence of the generated word:
In this step, the first word vector sequence is input into the self-attention module of the decoder; first, the first word vector sequence is updated based on a self-attention mechanism; then, in order to alleviate the vanishing-gradient problem of deep network training and to speed up network convergence, residual connection is applied between the input first word vector sequence and the updated first word vector sequence, followed by normalization, to obtain the second word vector sequence of the generated words.
B3, updating the second word vector sequence based on a cross attention mechanism by utilizing the context vector of the original text to obtain a third word vector sequence of the generated word:
In this step, the second word vector sequence is input into the cross-attention module of the decoder; first, Key and Value values are constructed from the context vectors generated from the original text (obtained by the encoding part) by the encoder of the Transformer architecture, the attention distribution a_t of each word in the original text is output, a Query value is constructed from the second word vector sequence, and the second word vector sequence is updated in combination with the attention distribution a_t; then, residual connection and normalization are applied to the input second word vector sequence and the updated second word vector sequence to obtain the third word vector sequence of the generated words.
And B4, calculating influence coefficients of all the fact triples on each generated word by using a cross attention mechanism based on the attention vector and the third word vector sequence of each fact triplet to obtain a fourth word vector sequence of the generated word:
In this step, the third word vector sequence is input into the fact attention module of the decoder; first, based on the attention vectors of the fact triplets and the third word vector sequence, the influence coefficients of all fact triplets on each generated word are obtained with a cross-attention mechanism; then, residual connection is applied word by word between the input third word vector sequence and the influence coefficients of all fact triplets on each generated word, followed by normalization, to obtain the fourth word vector sequence of the generated words.
The influence coefficients of the fact triplets on the generated words are calculated with a cross-attention mechanism, specifically as follows.

First, the relevance of each fact triplet to each generated word is calculated, where α_im denotes the relevance of the m-th generated word (whose word vector is taken from the third word vector sequence) and the i-th fact triplet.

Then, the relevance weight of each generated word and each fact triplet is calculated, where β_im denotes the relevance weight of the m-th generated word and the i-th fact triplet, and F denotes the fact triplet set.

Then, the influence coefficient of the fact triplets on each generated word is calculated, where u_m denotes the influence coefficient of all fact triplets on the m-th generated word.
B5, taking the fourth word vector sequence as the input of a feedforward network to obtain a decoding output vector of the current time step:
In this step, the fourth word vector sequence is input into the feed-forward network module of the decoder; first, the feed-forward network computes p_t = max(0, ln_t W_1 + b_1) W_2 + b_2; then, residual connection and normalization are applied to the output p_t of the feed-forward network layer and the fourth word vector sequence ln_t to obtain the output of the decoder, where W_1, W_2, b_1 and b_2 are all learnable parameters of the feed-forward network.
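The following sketch shows how the fact attention of step B4 slots between the cross-attention and feed-forward blocks of a decoder layer; the dimensions, the use of nn.MultiheadAttention for the first two blocks, and the omission of the causal mask of step B2 are all simplifying assumptions:

```python
import torch
import torch.nn as nn

class FactAwareDecoderLayer(nn.Module):
    def __init__(self, d: int = 512, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d))
        self.norm = nn.ModuleList(nn.LayerNorm(d) for _ in range(4))

    def forward(self, words: torch.Tensor, ctx: torch.Tensor, fact_r: torch.Tensor):
        # B2: self-attention over the already generated words, residual + norm
        x = self.norm[0](words + self.self_attn(words, words, words)[0])
        # B3: cross-attention, Query from x, Key/Value from the encoder context vectors
        x = self.norm[1](x + self.cross_attn(x, ctx, ctx)[0])
        # B4: fact attention - influence u_m of all fact triplets on each generated word
        beta = torch.softmax(x @ fact_r.transpose(-1, -2), dim=-1)   # beta_im
        x = self.norm[2](x + beta @ fact_r)                          # fourth word vector sequence
        # B5: feed-forward p_t = max(0, ln_t W_1 + b_1) W_2 + b_2, residual + norm
        return self.norm[3](x + self.ffn(x))
```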
B6, predicting the probability of generating words by taking the words in the word list and the words in the original text as the current time step through a pointer network based on the decoding output vector:
In order to obtain the distribution probability of the current generated word, the traditional decoding process maps the previous decoding information directly onto the word table, which may introduce into the generated summary words that belong to the word table but not to the original text and thus produce factual errors. In order to reduce such factual errors as much as possible, a pointer network is introduced in this step to minimize the generation of words that belong to the word table but not to the original text, thereby reducing the probability of factual errors.
Specifically, first, a linear layer whose learnable parameters W_vocab and b_vocab correspond to the word table (the word table of the pre-trained model) maps the output of the decoder at the current time step t to the feature space of the word table, giving the mapped vector representation l_t;
then, based on the mapped vector representation l_t, the vocabulary distribution P_vocab of the current time step t and the pointer probability p_gen are calculated, where w_gen and b_gen denote learnable parameters, p_gen denotes the probability that the generated word of the current time step is taken from the word table, and P_vocab denotes the output probability distribution over the words of the word table as the generated word of the current time step;
finally, based on P_vocab and p_gen, the final probability distribution P_t(w) of the current time step t is calculated, where P_t(w) denotes the probability that the word w of the extended vocabulary (comprising the word table and all words contained in the original text) is the generated word of the current time step t, N denotes the number of words in the original text, n denotes the sequence number of a word in the original text, a_t^n is the attention of the n-th word in the attention distribution a_t over the word vectors of the original text produced by the cross-attention module, and the copy term sums the attention of all occurrences of the word w in the original text.
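The copy/generate combination of step B6 appears to mirror the standard pointer-generator formulation (See et al., 2017); the sketch below assumes that formulation, since the exact inputs to the gate p_gen and the vocabulary sizes are not reproduced above:

```python
import torch
import torch.nn as nn

def pointer_distribution(dec_out, attn, src_ids, ext_vocab_size, w_vocab, w_gen):
    """dec_out: (B, d) decoder output at step t; attn: (B, N) cross-attention a_t over the
    source words; src_ids: (B, N) long ids of the source words in the extended vocabulary."""
    p_vocab = torch.softmax(w_vocab(dec_out), dim=-1)      # P_vocab over the word table
    p_gen = torch.sigmoid(w_gen(dec_out))                  # (B, 1) pointer probability p_gen
    p_final = torch.zeros(dec_out.size(0), ext_vocab_size) # P_t(w) over the extended vocabulary
    p_final[:, :p_vocab.size(-1)] = p_gen * p_vocab        # generate-from-word-table part
    p_final.scatter_add_(1, src_ids, (1 - p_gen) * attn)   # copy part: sum attention per source word
    return p_final

w_vocab = nn.Linear(512, 30522)   # maps l_t to a BERT-sized word table; sizes assumed
w_gen = nn.Linear(512, 1)
```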
B7, based on the predicted probabilities that the words of the word table and the words of the original text are the generated word of the current time step, obtaining the generated word of the current time step with a beam search algorithm:
In this step, since the words of the word table are out-of-text words with respect to the original text, and out-of-text words are more prone to causing factual errors than words taken from the original text, when an out-of-text word really has to be introduced the beam search algorithm calculates the combined probability of the out-of-text word to be introduced and the already generated words, thereby judging whether the word is reasonable.
Specifically, the processing procedure includes:
first, from the final probability distribution P_t(w) output by the pointer network at the current time step t, the P words with the highest probability are selected as candidate words;
then a judgment is made: if the candidate words include words contained in the original text, the word that is contained in the original text and has the highest output probability is selected as the generated word of the current time step; otherwise, the P candidate words are each combined with the words generated before the current time step to form P candidate summaries, the combined probability of each candidate summary is calculated according to the following formula, and the candidate word corresponding to the candidate summary with the largest combined probability is selected as the generated word of the current time step:

P_Y = argmax(log(∏_t p(y_t | y_1, y_2, …, y_{t-1})))

where Y denotes a candidate summary and p denotes the output probability of the corresponding word.
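A simplified single-step sketch of the selection rule of step B7 is given below; a real beam search keeps several hypotheses and their running log-probabilities, which is collapsed here into one prefix score, and all sizes are illustrative:

```python
import torch

def select_next_word(p_final: torch.Tensor, source_ids: set, prefix_logprob: float, P: int = 4) -> int:
    """p_final: P_t(w) over the extended vocabulary; source_ids: ids of words in the original text."""
    probs, ids = p_final.topk(P)                               # P candidate words
    in_source = [(p.item(), i.item()) for p, i in zip(probs, ids) if i.item() in source_ids]
    if in_source:                                              # prefer a word from the original text
        return max(in_source)[1]
    scores = prefix_logprob + torch.log(probs)                 # log prod_t p(y_t | y_<t) per candidate
    return ids[scores.argmax()].item()

next_id = select_next_word(torch.softmax(torch.randn(50000), dim=-1), {101, 2054}, prefix_logprob=-3.2)
```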
Steps B1 to B7 constitute the generation process of one time step, and each time step outputs one generated word; the generation process is executed in a loop until the preset number of summary words is reached, and the generated summary is thus obtained.
4. Model training
The training process of the text abstract generation model in this embodiment is as follows:
C1, selecting a sample from the training set as the current training sample, the sample comprising an original text and its real (reference) summary; specifically, a public dataset in the text summarization field, such as the CNN/DailyMail dataset, can be used as the training set;
c2, inputting an original text of the current training sample into a text abstract generating model;
obtaining a context vector of the original text by using an encoder of the text abstract generation model;
generating the words of each time step one by one according to steps B1-B7 with the text abstract generation model, the generated words of all time steps finally forming the generated summary of the original text of the training sample, the number of generated words being consistent with the number of words of the real summary;
and C3, performing loss calculation according to the following loss function, and updating a text abstract generation model based on the loss:
where y_m denotes the word vector of the m-th word in the real summary, ŷ_m denotes the word vector of the m-th generated word in the generated summary, and T denotes matrix transposition;
and C4, repeating the steps C1 to C3 until training of the text abstract generation model is completed.
Finally, it should be noted that the above examples are only preferred embodiments and are not intended to limit the invention. Modifications, equivalents and improvements made by those skilled in the art without departing from the spirit of the invention and the scope of the claims are intended to be included within the scope of the invention.
Claims (10)
1. A text abstract generation method based on fact consistency enhancement, which adopts a Transformer architecture to construct a sequence-to-sequence text abstract generation model, the decoder of the Transformer architecture comprising a self-attention module, a cross-attention module and a feed-forward network module connected in sequence, characterized in that:
introducing a fact attention module between a feed forward network module and a cross attention module of the decoder;
a fact triplet is defined and denoted as F_i = <s_i, r_i, o_i>, wherein s is a subject word, r is a relation modifier, o is an object word, the subscript i is the sequence number of the fact triplet, and every word in any fact triplet comes from the same sentence in the original text; the fact attention module takes the attention vectors of all fact triples in the original text and the third word vector sequence output by the cross-attention module of the decoder as inputs, obtains the influence coefficients of all fact triples on each generated word based on a cross-attention mechanism, and updates the third word vector sequence based on these influence coefficients to obtain a fourth word vector sequence, which serves as the input of the feed-forward network module of the decoder;
The attention vector of the fact triplet is calculated as follows:
A1, processing the original text with a natural language processing tool to extract fact triples F_i and construct a fact triplet set F; then, taking the words contained in each fact triplet F_i as nodes and connecting them with edges, mapping the fact triplet set F onto a graph, and computing the node vector of every node in the graph with a graph neural network; then, concatenating the node vectors of the nodes contained in each fact triplet to obtain the coding feature of that fact triplet, h_i = [h_si, h_ri, h_oi], wherein h_si, h_ri and h_oi respectively represent the node vectors of the nodes corresponding to s_i, r_i and o_i contained in fact triplet F_i;
A2, inputting the coding features of the fact triples into a recurrent neural network to obtain a vector representation z_i of each fact triplet that fuses the semantic information of the preceding and following fact triples;
A3, based on the vector representation z_i of each fact triplet, obtaining the attention vector of each fact triplet by means of the self-attention mechanism.
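To make step A1 of claim 1 concrete, the sketch below builds the word graph from already extracted fact triples (the NLP extraction tool itself is not shown): each word becomes a node, edges connect words that co-occur in a triplet, and the coding feature h_i concatenates the node vectors of s_i, r_i and o_i. The function names and the exact edge pattern (s-r, r-o, s-o plus self-loops) are assumptions.

```python
import torch

def build_fact_graph(fact_triples):
    """fact_triples: list of (subject, relation, object) word tuples extracted
    from the original text by an NLP tool (step A1).
    Returns the word-to-node index and the adjacency matrix of the graph."""
    words = sorted({w for trip in fact_triples for w in trip})
    index = {w: i for i, w in enumerate(words)}

    # Nodes are words; edges connect the words inside the same triplet.
    adj = torch.eye(len(words))                       # self-loops (assumed)
    for s, r, o in fact_triples:
        for a, b in [(s, r), (r, o), (s, o)]:
            adj[index[a], index[b]] = adj[index[b], index[a]] = 1.0
    return index, adj

def triplet_coding_features(fact_triples, index, node_vectors):
    """Concatenate the node vectors of s_i, r_i, o_i into the coding feature
    h_i of each fact triplet (end of step A1)."""
    return torch.stack([
        torch.cat([node_vectors[index[s]],
                   node_vectors[index[r]],
                   node_vectors[index[o]]])
        for s, r, o in fact_triples
    ])
```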
2. The text abstract generation method based on fact consistency enhancement according to claim 1, wherein,
in step A2, the recurrent neural network is a Bi-LSTM network; based on the coding features of the fact triples, the forward hidden layer vector and the backward hidden layer vector of each fact triplet are obtained with the Bi-LSTM network, and the forward hidden layer vector and the backward hidden layer vector are then spliced to obtain the vector representation z_i of the fact triplet.
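A sketch of claim 2, assuming a standard Bi-LSTM over the sequence of triplet coding features; PyTorch's bidirectional LSTM already returns the forward and backward hidden vectors concatenated, which matches the splicing described.

```python
import torch
import torch.nn as nn

class TripletSequenceEncoder(nn.Module):
    """Fuses each fact triplet's coding feature with the semantic information
    of the preceding and following triples (step A2)."""
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, coding_features):
        # coding_features: (num_triples, feat_dim) -- the h_i of step A1
        out, _ = self.bilstm(coding_features.unsqueeze(0))
        # out already concatenates the forward and backward hidden vectors,
        # i.e. z_i = [forward hidden ; backward hidden]
        return out.squeeze(0)          # (num_triples, 2 * hidden_dim)
```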
3. The text abstract generation method based on fact consistency enhancement according to claim 1, wherein,
in step A1, the node vector of each node in the graph is calculated with the graph neural network as follows:
firstly, initializing node vectors of all nodes in a graph by using a pre-training model;
then, the node vector of each node in the graph is updated using the GCN network according to the following formula:
wherein ReLU represents the activation function, H^l and H^(l+1) respectively represent the outputs of the l-th and (l+1)-th layers of the GCN network, A represents the adjacency matrix of the graph, D represents the degree matrix of the graph, and W^l represents the weight matrix of the l-th layer of the GCN network.
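A minimal GCN layer in the spirit of claim 3. The symmetrically normalized update H^(l+1) = ReLU(D^(-1/2) A D^(-1/2) H^l W^l) is an assumed variant, since the original formula is not reproduced above; other normalizations (e.g. adding self-loops to A first) are equally plausible.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)   # W^l

    def forward(self, H, A):
        # Symmetric normalization D^(-1/2) A D^(-1/2) (assumed variant).
        deg = A.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.clamp(min=1e-12).pow(-0.5))
        A_norm = d_inv_sqrt @ A @ d_inv_sqrt
        # H^(l+1) = ReLU(A_norm H^l W^l)
        return torch.relu(A_norm @ self.weight(H))

# Node vectors are initialized from a pre-trained model, then updated layer
# by layer:  H1 = layer1(H0, A); H2 = layer2(H1, A); ...
```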
4. The text abstract generation method based on fact consistency enhancement according to claim 1, wherein,
step A1 further comprises:
first, using the decision tree model of the natural language processing tool, the relationships between the fact triples F_i are classified according to the relationship types contained in the model; then, based on the relationship classification results, relationship triples among the fact triples F_i are obtained, and a relationship triplet set R is constructed;
then, the coding features of the two fact triples contained in each relationship triplet are spliced with the relation vector to obtain the coding feature of the relationship triplet, R_j = [h_1j, cr_j, h_2j], wherein h_1j and h_2j respectively represent the coding features of the fact triples contained in the j-th relationship triplet, and cr_j represents the relation vector obtained from the relationship contained in the j-th relationship triplet based on the pre-training model;
then, cross-attention calculation is performed using the coding features of each relationship triplet and the coding features of each fact triplet, the coding features of each fact triplet are updated based on the calculation results, and the updated coding feature h'_i of each fact triplet is used as the input of step A2.
5. The text abstract generation method based on fact consistency enhancement according to claim 4, wherein,
performing the cross-attention calculation using the coding features of each relationship triplet and each fact triplet, and updating the coding features of each fact triplet based on the calculation results, specifically comprises:
α ij =h i *R j
wherein α_ij represents the relevance of the ith fact triplet and the jth relationship triplet, β_ij represents the relevance weight of the ith fact triplet and the jth relationship triplet, and R represents the relationship triplet set.
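A sketch of the cross-attention calculation in claims 4-5, assuming that β_ij is the softmax of α_ij over the relationship triplet set R and that the updated coding feature h'_i adds the β-weighted relationship features back onto h_i; both choices, and the assumption that fact and relationship features share a common dimension (e.g. after a projection), fill in details not reproduced above.

```python
import torch

def update_fact_features(H_fact, R_rel):
    """H_fact: (num_fact, d)  coding features h_i of the fact triples
       R_rel:  (num_rel, d)   coding features R_j of the relationship triples
    Returns h'_i, the fact features updated by relationship information."""
    # alpha_ij = h_i * R_j  (dot-product relevance)
    alpha = H_fact @ R_rel.T                       # (num_fact, num_rel)
    # beta_ij: relevance weights, normalized over the relationship triplet set R
    beta = torch.softmax(alpha, dim=-1)
    # Assumed update: add the attention-weighted relationship features to h_i
    return H_fact + beta @ R_rel                   # h'_i
```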
6. The text abstract generation method based on fact consistency enhancement according to any one of claims 1 to 5, wherein in step A3, the attention vector of each fact triplet is obtained from its vector representation z_i by means of the self-attention mechanism; the self-attention mechanism is multi-head self-attention, and the calculation process comprises:
A31, based on the conversion matrices W_k1, W_k2 and W_k3 of each attention head k, converting the vector representation z_i of the fact triplet into a Query value Q_i^k, a Key value K_i^k and a Value V_i^k;
A32, performing the attention calculation of each attention head based on Q_i^k, K_i^k and V_i^k:
Z_i^k = softmax( Q_i^k (K_i^k)^T / √d_k ) V_i^k
wherein d_k is the dimension of the Key value K_i^k;
A33, aggregating the attention calculation results of all attention heads to obtain the attention vector Z_i:
Z_i = [Z_i^1 ; Z_i^2 ; … ; Z_i^K]
wherein K is the number of attention heads;
A34, using a linear transformation, changing the dimension of the attention vector Z_i to that of the vector representation z_i, thereby obtaining the attention vector r_i of the fact triplet:
r_i = Z_i * W_0
wherein W_0 is a linear transformation matrix.
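A sketch of steps A31-A34, assuming standard scaled dot-product attention per head and concatenation of the K head outputs before the final linear map W_0.

```python
import torch
import torch.nn as nn

class TripletSelfAttention(nn.Module):
    """Multi-head self-attention over the triplet representations z_i,
    producing the attention vector r_i of each fact triplet (step A3)."""
    def __init__(self, dim, num_heads):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads, self.d_k = num_heads, dim // num_heads
        self.W_q = nn.Linear(dim, dim, bias=False)   # W_k1 of all heads
        self.W_k = nn.Linear(dim, dim, bias=False)   # W_k2
        self.W_v = nn.Linear(dim, dim, bias=False)   # W_k3
        self.W_0 = nn.Linear(dim, dim, bias=False)   # output transform

    def forward(self, z):                            # z: (num_triples, dim)
        n = z.size(0)
        # A31: project to Query / Key / Value and split into heads
        q = self.W_q(z).view(n, self.num_heads, self.d_k).transpose(0, 1)
        k = self.W_k(z).view(n, self.num_heads, self.d_k).transpose(0, 1)
        v = self.W_v(z).view(n, self.num_heads, self.d_k).transpose(0, 1)
        # A32: scaled dot-product attention per head
        scores = q @ k.transpose(-2, -1) / self.d_k ** 0.5
        heads = torch.softmax(scores, dim=-1) @ v    # (heads, n, d_k)
        # A33: aggregate (concatenate) the K heads into Z_i
        Z = heads.transpose(0, 1).reshape(n, -1)
        # A34: r_i = Z_i * W_0, back to the dimension of z_i
        return self.W_0(Z)
```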
7. The text abstract generation method based on fact consistency enhancement according to claim 1, wherein, at each time step, the decoding process of the decoder is:
B1, based on the generated words obtained before the current time step t, obtaining the word embedding of each generated word with a pre-training model, and forming the first word vector sequence of the generated words according to the generation order;
B2, inputting the first word vector sequence into a self-attention module of a decoder; firstly, updating a first word vector sequence based on a self-attention mechanism; then, carrying out residual connection and normalization on the input first word vector sequence and the updated first word vector sequence to obtain a second word vector sequence of the generated word;
B3, inputting the second word vector sequence into the cross-attention module of the decoder; first, the Key value and Value are constructed from the context vector generated by the Transformer-architecture encoder for the original text, and the attention distribution a_t over each word of the original text is output; the Query value is constructed from the second word vector sequence, and the second word vector sequence is updated in combination with the attention distribution a_t; then, the input second word vector sequence and the updated second word vector sequence are residual-connected and normalized to obtain the third word vector sequence of the generated words;
B4, inputting the third word vector sequence into the fact attention module of the decoder; first, based on the attention vectors of the fact triples and the third word vector sequence, the influence coefficients of all fact triples on each generated word are obtained with the cross-attention mechanism; then, the input third word vector sequence and the influence coefficients of all fact triples on each generated word are residual-connected word by word and normalized to obtain the fourth word vector sequence of the generated words;
B5, inputting the fourth word vector sequence into the feed-forward network module of the decoder; first, the feed-forward network performs the calculation p_t = max(0, ln_t W_1 + b_1) W_2 + b_2; then, the output p_t of the feed-forward network layer and the fourth word vector sequence ln_t are residual-connected and normalized to obtain the output of the decoder, wherein W_1, W_2, b_1 and b_2 are all learnable parameters of the feed-forward network.
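A sketch of one decoder block following steps B2-B5 of claim 7, with the fact attention module placed between cross attention and the feed-forward network. Standard PyTorch multi-head attention layers stand in for the self-, cross- and fact-attention computations; in particular `fact_attn` is only a stand-in for the calculation of claim 8.

```python
import torch
import torch.nn as nn

class FactAwareDecoderBlock(nn.Module):
    def __init__(self, dim, num_heads, ff_dim):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fact_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, ff_dim),   # W_1, b_1 + ReLU
                                nn.ReLU(),
                                nn.Linear(ff_dim, dim))   # W_2, b_2
        self.norm = nn.ModuleList([nn.LayerNorm(dim) for _ in range(4)])

    def forward(self, x, context, fact_vectors):
        # B2: self-attention over the generated words, residual + norm
        s, _ = self.self_attn(x, x, x)
        x = self.norm[0](x + s)
        # B3: cross attention against the encoder context, residual + norm
        c, _ = self.cross_attn(x, context, context)
        x = self.norm[1](x + c)
        # B4: fact attention against the triplet attention vectors r_i
        f, _ = self.fact_attn(x, fact_vectors, fact_vectors)
        x = self.norm[2](x + f)
        # B5: feed-forward p_t = max(0, ln_t W_1 + b_1) W_2 + b_2, residual + norm
        return self.norm[3](x + self.ff(x))
```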
8. The text abstract generation method based on fact consistency enhancement according to claim 1 or 7, wherein,
based on the attention vector and the third word vector sequence of each fact triplet, the influence coefficients of all the fact triples on each generated word are obtained by using a cross attention mechanism, and the method specifically comprises the following steps:
wherein, in the above, the word vector of the mth generated word is taken from the third word vector sequence; α_im represents the relevance of the mth generated word and the ith fact triplet, β_im represents the relevance weight of the mth generated word and the ith fact triplet, u_m represents the influence coefficient of all fact triples on the mth generated word, and F represents the fact triplet set.
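A sketch of claim 8 under the assumption that α_im is a dot product between the generated word vector and the triplet attention vector r_i, β_im its softmax normalization over the fact triplet set F, and u_m the β-weighted sum of the r_i; these forms are assumed where the original formulas are not reproduced above.

```python
import torch

def fact_influence(word_vecs, fact_vecs):
    """word_vecs: (num_words, d)  entries of the third word vector sequence
       fact_vecs: (num_facts, d)  attention vectors r_i of the fact triples
    Returns u: (num_words, d), the influence coefficient of all fact triples
    on each generated word."""
    alpha = word_vecs @ fact_vecs.T            # alpha_im: relevance
    beta = torch.softmax(alpha, dim=-1)        # beta_im: weights over F
    u = beta @ fact_vecs                       # u_m: weighted sum of r_i
    return u

# In step B4 the fourth word vector sequence is then obtained by residual
# connection and normalization, e.g.  ln = LayerNorm(word_vecs + u)
```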
9. The text abstract generation method based on fact consistency enhancement according to claim 7, wherein,
the text abstract generation model also comprises a pointer network;
At each time step, the processing procedure of the pointer network includes:
first, the output of the decoder at the current time step t is mapped by a linear layer to the feature space corresponding to the word list, obtaining l_t:
wherein W_vocab and b_vocab represent learnable parameters corresponding to the word list, and the word list is the word list of the pre-training model;
then, based on l_t, calculating the vocabulary distribution P_vocab of the current time step t and the pointer probability p_gen:
wherein w_gen and b_gen represent learnable parameters; p_gen represents the probability of generating a word from the word list at the current time step, and P_vocab represents the output probability distribution over the words of the word list as the generated word of the current time step;
thereafter, based on P_vocab and p_gen, calculating the final probability distribution P_t(w) of the current time step t:
P_t(w) = p_gen · P_vocab(w) + (1 − p_gen) · Σ_{n: w_n = w} a_t^n
wherein P_t(w) represents the probability that the word w in the extended vocabulary is the generated word of the current time step t, the extended vocabulary comprising the word list and all words contained in the original text; N represents the number of words in the original text and n the sequence number of a word in the original text; a_t^n is the attention of the nth word in the attention distribution a_t generated by the cross-attention module over the word vectors of the original text; and Σ_{n: w_n = w} a_t^n represents the sum of the attention of the word w over its occurrences in the original text.
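A sketch of the pointer network of claim 9, assuming the standard pointer-generator combination P_t(w) = p_gen·P_vocab(w) + (1 − p_gen)·Σ a_t^n over the source positions where w occurs; how p_gen is computed (here, a linear layer over the decoder output) is an assumption.

```python
import torch
import torch.nn as nn

class PointerNetwork(nn.Module):
    def __init__(self, dec_dim, vocab_size):
        super().__init__()
        self.to_vocab = nn.Linear(dec_dim, vocab_size)   # W_vocab, b_vocab
        self.to_pgen = nn.Linear(dec_dim, 1)             # w_gen, b_gen (assumed input)

    def forward(self, dec_out, attn, src_ids, ext_vocab_size):
        """dec_out: (dec_dim,)  decoder output at time step t
           attn:    (N,)        attention a_t over the N source words
           src_ids: (N,) long   ids of the source words in the extended vocabulary
        Returns P_t(w) over the extended vocabulary."""
        l_t = self.to_vocab(dec_out)                      # map to word-list space
        p_vocab = torch.softmax(l_t, dim=-1)              # vocabulary distribution
        p_gen = torch.sigmoid(self.to_pgen(dec_out))      # pointer probability

        # P_t(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on w
        p_final = torch.zeros(ext_vocab_size)
        p_final[: p_vocab.size(0)] = p_gen * p_vocab
        p_final.index_add_(0, src_ids, (1 - p_gen) * attn)
        return p_final
```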
10. The text abstract generation method based on fact consistency enhancement according to claim 9, wherein,
the text abstract generation model further comprises a beam search algorithm, the processing procedure of which comprises:
first, from the final probability distribution P_t(w) output by the pointer network at the current time step t, the P words with the highest probability are screened out as candidate words;
then, a judgment is made: if the candidate words contain words that appear in the original text, the word that appears in the original text and has the highest output probability is selected as the generated word of the current time step; otherwise, the P candidate words and the words generated before the current time step are used to form P candidate abstracts, the combination probability of each candidate abstract is calculated according to the following formula, and the candidate word corresponding to the candidate abstract with the largest combination probability is selected as the generated word of the current time step:
P_Y = argmax( log( ∏_t p(y_t | y_1, y_2, …, y_{t-1}) ) )
wherein Y represents the candidate abstract and p represents the output probability of the corresponding word.