CN114218928A - Abstract text summarization method based on graph knowledge and theme perception - Google Patents
Abstract text summarization method based on graph knowledge and theme perception
- Publication number
- CN114218928A (application CN202111654105.5A)
- Authority
- CN
- China
- Prior art keywords
- topic
- sentence
- model
- graph
- document
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an abstractive text summarization method based on graph knowledge and topic awareness. The invention provides a model built on BERT, a neural topic model, and a graph neural network, called GTASum. At the input of the document encoder, BERT produces hidden word vectors for the document; at the input of the topic encoder, a neural topic model produces the document's word-topic distribution vector. The two vectors are fed into a graph neural network and trained to obtain contextual content fused with topic knowledge, from which a Transformer-based decoder generates the text; meanwhile, a conditional layer normalization (LN) layer allows the neural topic model and the decoder to be trained jointly and selects features effectively. The results show that the method has good robustness and adaptability.
Description
Technical Field
The invention belongs to the technical field of natural language processing and relates to a text summary generation method, in particular to an abstractive text summarization method based on graph knowledge and topic awareness: a text summary generation method built on a pre-trained language model, a neural topic model, and a graph neural network.
Background
With the development of computer performance and large-scale language models, natural language processing (NLP) has advanced significantly. Summarization is one of the core NLP tasks; its aim is to let people quickly grasp the important information in a text. Text summarization is widely used in areas such as news, finance, meetings, and medicine. There are currently two main approaches to the summarization task: extractive and abstractive. Extractive methods copy important information from the original text and assemble it into a summary. Summaries generated this way usually retain the salient information of the source text and are grammatically correct, but they inevitably tend to contain a large amount of redundancy. Abstractive methods form a summary based on an understanding of the source text. They try to understand the content and can generate words not present in the original text, which is closer to the essence of summarization and has the potential to produce high-quality summaries. In sum, to better generate news summaries that help readers grasp daily news quickly and efficiently, this work focuses on abstractive summarization.
In abstractive summarization, sequence-to-sequence has become the dominant framework across architectures. Early abstractive summarizers were mainly RNN-based encoder-decoders. Because of the long-range dependency problem, an RNN has lost much of the information by the time the last time step is reached; Bahdanau et al. therefore applied the attention mechanism to NLP. Since then, text summarization has advanced greatly and applications have emerged continually. The most notable among them is the Transformer architecture, which has achieved striking performance in many areas. Current state-of-the-art abstractive models, including BART, PEGASUS, and ProphetNet, all adopt Transformer-based architectures. With the help of the attention mechanism, Transformer-based models capture grammatical and contextual information between tokens well. However, they still fall short on higher-level semantic understanding.
To address this problem, researchers have tried a variety of improvements, one of which is exploiting topic awareness. Topic models such as LDA, PFA, NVDM, and NTM can all provide additional information for document understanding. For text summarization, we believe that incorporating topic-model properties into the summarization model can improve its performance. Furthermore, in recent years graph neural networks (GNNs) have been widely used for cross-sentence relation modeling in summarization. Some studies have created document graphs based on linguistic analysis, but this approach relies on external tools and may lead to semantically fragmented outputs. Wang, Liu, et al. constructed word-and-sentence document graphs, but that approach has difficulty capturing semantic-level relations. How to efficiently build documents into graphs suitable for summarization therefore remains a hard problem.
Disclosure of Invention
The invention aims to provide an abstractive text summarization method based on graph knowledge and topic awareness that addresses the defects of the prior art.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
step (1): given an original input document D, a [ CLS ] is inserted into the beginning and end of each sentence in the original input document D]And [ SEP ]]Then, the processed input document D is put into a pre-training language model BERT, and the feature representation H of the sentence is learnedB;
Step (2): inputting original input document D into neural topic model NTM, learning topic representation H of documentT;
And (3): representing the characteristics of the sentence HBAnd a topic representation H of the documentTInputting the information into a destination attention network GAT and initializing; generating sentence characteristics h' with topic information after the attention network GAT coding; the GAT coding process is to construct a heterogeneous document graph with topics and sentences, and continuously update the feature representation HBAnd a topic representation HTA constructed node representation;
and (4): sending the sentence characteristics h' with the topic information into a Transformer-based decoder for decoding; generating a text abstract after normalization;
and (5): training the GTASum model on the CNN/DailyMail data set and the XSum data set, selecting the optimal GTASum model, inputting any text into the trained GTASum model, and outputting corresponding abstract contents;
the GTASum model is composed of a pre-training language model BERT, a neural topic model NTM, a graph attention network GAT and a decoder.
The invention has the following beneficial effects:
the invention provides a topic model and a graph neural network that help the pre-trained language model represent documents better. During training, the document topic information and the document embedding information are sent into the graph neural network to be fused and updated. Through this operation, the pre-trained language model can better refer to topic information for feature selection when facing downstream tasks. The results show that this approach has good robustness and adaptability. Testing follows the standard performance indicators of the text summarization field, with ROUGE-1/ROUGE-2/ROUGE-L as evaluation metrics. Tests on the news summarization datasets CNN/DailyMail and XSum yielded results at a leading level in the field.
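The ROUGE metrics used for evaluation above can be illustrated with a minimal sketch. This is a simplified unigram ROUGE-1 (whitespace tokenization only, no stemming or bootstrapping as in the official toolkit); the function name is an illustrative assumption:

```python
from collections import Counter

def rouge_1(candidate: str, reference: str) -> dict:
    """Minimal ROUGE-1: clipped unigram overlap between a candidate
    summary and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"p": precision, "r": recall, "f1": f1}

scores = rouge_1("the cat sat on the mat", "the cat lay on the mat")
# 5 of 6 candidate unigrams match, so p = r = f1 = 5/6
```

ROUGE-2 follows the same pattern over bigrams, and ROUGE-L uses the longest common subsequence instead of n-gram counts.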
Drawings
FIG. 1 is the overall flow framework of the model;
FIG. 2 is the framework of the neural topic model part;
FIG. 3 is a usage example of the model;
Detailed Description
The invention is further illustrated by the following figures and examples.
The invention provides an abstractive text summarization method based on graph knowledge and topic awareness. First, we encode the input document with the pre-trained language model BERT to learn contextual sentence representations, while using the neural topic model (NTM) to discover latent topics. Then, we build a heterogeneous document graph composed of sentence and topic nodes and update the representations using a modified graph attention network (GAT). Third, we obtain the sentence-node representations and compute the latent semantics. Finally, the latent semantics are fed into a Transformer-based decoder to generate the final result. We performed extensive experiments on two real-world datasets, CNN/DailyMail and XSum.
The model built on BERT, the neural topic model, and the graph neural network is called GTASum. At the input of the document encoder, BERT produces hidden word vectors for the document; at the input of the topic encoder, the neural topic model produces the document's word-topic distribution vector. The two vectors are fed into a graph neural network and trained to obtain contextual content fused with topic knowledge, from which a Transformer-based decoder generates the text; meanwhile, a conditional layer normalization (LN) layer allows the neural topic model and the decoder to be trained jointly and selects features effectively. The results show that the model captures the key information for abstractive summarization well and has good robustness and adaptability.
As shown in fig. 1 and 2, an abstract text summarization method based on graph knowledge and topic perception includes the following steps:
Step (1): given an original input document D, insert [CLS] and [SEP] at the beginning and end of each sentence in D; then feed the processed input document D into the pre-trained language model BERT and learn the sentence feature representation H_B;
Step (2): feed the original input document D into the neural topic model NTM and learn the document topic representation H_T;
Step (3): feed the sentence feature representation H_B and the document topic representation H_T into the graph attention network GAT and initialize it; after GAT encoding, generate sentence features h' carrying topic information; the GAT encoding constructs a heterogeneous document graph of topics and sentences and continually updates the node representations built from H_B and H_T;
Step (4): feed the topic-aware sentence features h' into a Transformer-based decoder for decoding; after normalization, generate the text summary;
Step (5): train the GTASum model on the CNN/DailyMail and XSum datasets, select the optimal GTASum model, input any text into the trained GTASum model, and output the corresponding summary content;
the GTASum model consists of the pre-trained language model BERT, the neural topic model NTM, the graph attention network GAT, and a decoder;
further, the step (1) is specifically implemented as follows:
1-1 Insert the special tokens [CLS] and [SEP] at the beginning and end of each sentence in the original input document D, giving the sentence set W = {w_1, w_2, …, w_n}, where w_i denotes the i-th sentence; [CLS] is placed at the start and [SEP] at the end of each sentence;
1-2 Feed the sentence set W into the pre-trained language model BERT to generate the hidden-state representation H_B of the sequence, taken as the feature representation of the corresponding sentences, as shown in Equation (1):
H_B = {h_1, h_2, …, h_i, …, h_n} = BERT({w_1, w_2, …, w_i, …, w_n})   (1)
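The token insertion of step 1-1 can be sketched as follows (the BERT encoding of Equation (1) requires a pre-trained checkpoint and is not reproduced here; `mark_sentences` is an illustrative helper, not a name from the patent):

```python
def mark_sentences(sentences):
    """Wrap each sentence with [CLS]/[SEP] as in step 1-1; BERT's hidden
    state at each [CLS] position then serves as that sentence's feature h_i."""
    return ["[CLS] " + s.strip() + " [SEP]" for s in sentences]

doc = ["The market rallied today.", "Analysts credited strong earnings."]
marked = mark_sentences(doc)
# marked[0] == "[CLS] The market rallied today. [SEP]"
```

The marked sentences would then be tokenized and passed through BERT, collecting one hidden vector per [CLS] position to form H_B.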
further, the step (2) is specifically implemented as follows:
2-1 Feed the original input document D into the neural topic model NTM for encoding; during encoding, generate the mean μ = f_μ(x) and the log-variance log σ = f_σ(x), where f_μ and f_σ are both linear transformation functions;
2-2 The decoding process comprises three steps:
First: a Gaussian distribution describes the topic distribution, i.e. z ~ N(μ, σ) and θ = softmax(z), where z is the latent topic variable, θ ∈ R^K is the normalized z, and K is the topic dimension;
Second: learn the occurrence probabilities of the predicted words p_w ∈ R^V via p_w = softmax(W_φ θ), where W_φ ∈ R^{V×K} is a topic-word distribution matrix as in LDA-style topic models;
Third: extract each word from the predicted words p_w to construct the bag of words x_bow;
2-3 Extract the NTM intermediate parameter W_φ and construct the topic representation H_T using Equation (2);
where H_T is a topic representation with predefined dimension d_t and f_φ is a linear transformation function;
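Steps 2-1 through 2-3 can be sketched numerically. All weights below are random stand-ins for the learned linear maps f_μ, f_σ and the topic-word matrix W_φ; the dimensions are illustrative, not from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
V, K = 2000, 10                      # vocabulary size, number of topics (illustrative)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Random stand-ins for learned parameters (small scale keeps exp() stable)
W_mu = rng.normal(scale=0.01, size=(K, V))     # plays the role of f_mu
W_sig = rng.normal(scale=0.01, size=(K, V))    # plays the role of f_sigma
W_phi = rng.normal(scale=0.01, size=(V, K))    # topic-word distribution matrix

x_bow = rng.integers(0, 3, size=V).astype(float)   # bag-of-words input x

# Encoding (2-1): mean and log-variance
mu = W_mu @ x_bow                    # mu = f_mu(x)
log_sigma = W_sig @ x_bow            # log sigma = f_sigma(x)

# Decoding (2-2): reparameterized latent topic variable and distributions
z = mu + np.exp(log_sigma) * rng.normal(size=K)   # z ~ N(mu, sigma)
theta = softmax(z)                   # document-topic distribution, theta in R^K
p_w = softmax(W_phi @ theta)         # predicted word probabilities, p_w in R^V
```

In step 2-3, a learned linear map over the columns of `W_phi` would then yield the K topic vectors {t_1, …, t_K} forming H_T.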
further, the step (3) is specifically implemented as follows:
3-1 Construct an undirected graph G = {V, E}, where V = V_S ∪ V_T is the node set and E is the edge set; V_S = {S_1, S_2, …, S_N} denotes the N sentence nodes, corresponding to the sentence features {h_1, h_2, …, h_i, …, h_n}; V_T = {T_1, T_2, …, T_K} denotes the K topic nodes, corresponding to the document topic representation {t_1, t_2, …, t_j, …, t_k}; E = {e_11, …, e_NK}, where e_ij denotes the weight between the i-th sentence node and the j-th topic node;
3-2 Initialize the nodes with the sentence features H_B and the document topic representation H_T and perform graph encoding, obtaining the weighted representation S'_i of each sentence node through the graph attention network:
S'_i = σ(Σ_j α_ij W_c T_j)   (5)
where W_b and W_c are trainable parameters; LeakyReLU is the activation function; S_i is the i-th sentence node and T_j is the j-th topic node; S'_i is the i-th sentence node, carrying topic information through the topic-node weighting; FFN is a feed-forward neural network;
3-3 Since sentence nodes and topic nodes have different representations, a heterogeneous document graph must be constructed, so Equation (3) is rewritten as Equation (6):
Equation (6) removes the trainable parameter W_b and uses nonlinear transformation functions f_S and f_T to map sentences and topics into a shared latent space, then recomputes e_ij;
3-4 Concatenate the n sentence nodes carrying topic information to generate the topic-aware sentence features h' = {S'_1, S'_2, …, S'_i, …, S'_n};
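Steps 3-1 through 3-4 can be sketched with single-head graph attention. The exact scoring of Equations (3)/(6) appears only as images in the original, so the bilinear score below is a generic GAT-style assumption; only the Equation (5) aggregation S'_i = σ(Σ_j α_ij W_c T_j) is taken from the text, with σ instantiated as tanh:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, d = 4, 3, 8                        # sentences, topics, feature dim (illustrative)
S = rng.normal(size=(N, d))              # sentence node features (from H_B)
T = rng.normal(size=(K, d))              # topic node features (from H_T)
Wc = rng.normal(size=(d, d), scale=0.3)  # trainable projection (assumed role of W_c)

def leaky_relu(x, a=0.2):
    return np.where(x > 0, x, a * x)

# Attention scores e_ij between sentence i and topic j (generic GAT-style
# bilinear scoring; the patent's Equations (3)/(6) define their own form)
scores = leaky_relu(S @ Wc @ T.T)            # shape (N, K)
alpha = np.exp(scores)
alpha /= alpha.sum(axis=1, keepdims=True)    # softmax over topics, per sentence

# Equation (5): S'_i = sigma(sum_j alpha_ij * Wc T_j)
S_prime = np.tanh(alpha @ (T @ Wc.T))        # topic-aware sentence features h'
```

Each row of `S_prime` is a sentence representation weighted toward the topics it attends to; concatenating the rows gives h' for the decoder.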
Further, the step (4) is specifically implemented as follows:
4-1 Feed the sentence features h' into a Transformer-based decoder for decoding, then predict through a multi-layer attention mechanism, as shown in Fig. 3.
Further, the step (5) is specifically implemented as follows:
5-1 Perform end-to-end training with epoch = 10, learning rate = 0.00001, and batch size = 16, adjusting the learning rate gradually with the Adam optimizer; jointly train the neural topic model and the decoder to reduce the loss, with the following loss functions:
L_NTM = D_KL(p(z) || q(z|x)) − E_{q(z|x)}[p(x|z)]   (7)
where the first term is the KL-divergence loss and the second the reconstruction loss; q(z|x) and p(x|z) denote the encoder and decoder networks of the NTM, respectively;
L_Trans = −Σ log p(y|x; θ)   (8)
where x denotes the input document, y the reference summary, and θ the model parameters;
L = L_Trans + λ L_NTM   (9)
where λ is a balance parameter in the range [0, 1];
5-2 Input the test document into the trained GTASum model to obtain the summary content.
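The joint objective of Equations (7)-(9) can be sketched with toy values. A closed-form Gaussian KL against N(0, I) and a bag-of-words cross-entropy stand in for the two NTM terms; all numbers below are illustrative, not trained quantities:

```python
import numpy as np

rng = np.random.default_rng(2)
K, V = 10, 50

# NTM terms (Equation 7): Gaussian KL (closed form against N(0, I))
# plus bag-of-words reconstruction cross-entropy
mu = rng.normal(scale=0.1, size=K)
log_sigma = rng.normal(scale=0.1, size=K)
kl = 0.5 * np.sum(np.exp(2 * log_sigma) + mu**2 - 1 - 2 * log_sigma)

x_bow = rng.integers(0, 3, size=V).astype(float)
p_w = np.full(V, 1.0 / V)               # toy predicted word distribution
recon = -np.sum(x_bow * np.log(p_w + 1e-12))
L_ntm = kl + recon

# Decoder NLL (Equation 8): -sum log p(y_t | x) over reference tokens
token_probs = np.array([0.4, 0.3, 0.5])  # toy per-token probabilities
L_trans = -np.sum(np.log(token_probs))

lam = 0.5                                # balance parameter in [0, 1]
L = L_trans + lam * L_ntm                # Equation (9)
```

Both losses decrease together under the joint Adam updates described in step 5-1; λ controls how strongly the topic model shapes training.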
Claims (6)
1. An abstract text summarization method based on graph knowledge and topic perception is characterized by comprising the following steps:
step (1): given an original input document D, insert [CLS] and [SEP] at the beginning and end of each sentence in D; then feed the processed input document D into the pre-trained language model BERT and learn the sentence feature representation H_B;
step (2): feed the original input document D into the neural topic model NTM and learn the document topic representation H_T;
step (3): feed the sentence feature representation H_B and the document topic representation H_T into the graph attention network GAT and initialize it; after GAT encoding, generate sentence features h' carrying topic information; the GAT encoding constructs a heterogeneous document graph of topics and sentences and continually updates the node representations built from H_B and H_T;
step (4): feed the topic-aware sentence features h' into a Transformer-based decoder for decoding; after normalization, generate the text summary;
step (5): train the GTASum model on the CNN/DailyMail and XSum datasets, select the optimal GTASum model, input any text into the trained GTASum model, and output the corresponding summary content;
the GTASum model consists of the pre-trained language model BERT, the neural topic model NTM, the graph attention network GAT, and a decoder.
2. The abstract text summarization method based on graph knowledge and topic perception according to claim 1, wherein the step (1) is implemented as follows:
1-1 Insert the special tokens [CLS] and [SEP] at the beginning and end of each sentence in the original input document D, giving the sentence set W = {w_1, w_2, …, w_n}, where w_i denotes the i-th sentence; [CLS] is placed at the start and [SEP] at the end of each sentence;
1-2 Feed the sentence set W into the pre-trained language model BERT to generate the hidden-state representation H_B of the sequence, taken as the feature representation of the corresponding sentences, as shown in Equation (1):
H_B = {h_1, h_2, …, h_i, …, h_n} = BERT({w_1, w_2, …, w_i, …, w_n})   (1).
3. the abstract text summarization method based on graph knowledge and topic perception according to claim 2, wherein the step (2) is implemented as follows:
2-1 Feed the original input document D into the neural topic model NTM for encoding; during encoding, generate the mean μ = f_μ(x) and the log-variance log σ = f_σ(x), where f_μ and f_σ are both linear transformation functions;
2-2 The decoding process comprises three steps:
First: a Gaussian distribution describes the topic distribution, i.e. z ~ N(μ, σ) and θ = softmax(z), where z is the latent topic variable, θ ∈ R^K is the normalized z, and K is the topic dimension;
Second: learn the occurrence probabilities of the predicted words p_w ∈ R^V via p_w = softmax(W_φ θ), where W_φ ∈ R^{V×K} is a topic-word distribution matrix as in LDA-style topic models;
Third: extract each word from the predicted words p_w to construct the bag of words x_bow;
2-3 Extract the NTM intermediate parameter W_φ and construct the topic representation H_T using Equation (2).
4. The abstract text summarization method based on graph knowledge and topic perception according to claim 3, wherein the step (3) is implemented as follows:
3-1 Construct an undirected graph G = {V, E}, where V = V_S ∪ V_T is the node set and E is the edge set; V_S = {S_1, S_2, …, S_N} denotes the N sentence nodes, corresponding to the sentence features {h_1, h_2, …, h_i, …, h_n}; V_T = {T_1, T_2, …, T_K} denotes the K topic nodes, corresponding to the document topic representation {t_1, t_2, …, t_j, …, t_k}; E = {e_11, …, e_NK}, where e_ij denotes the weight between the i-th sentence node and the j-th topic node;
3-2 Initialize the nodes with the sentence features H_B and the document topic representation H_T and perform graph encoding, obtaining the weighted representation S'_i of each sentence node through the graph attention network:
S'_i = σ(Σ_j α_ij W_c T_j)   (5)
where W_b and W_c are trainable parameters; LeakyReLU is the activation function; S_i is the i-th sentence node and T_j is the j-th topic node; S'_i is the i-th sentence node, carrying topic information through the topic-node weighting; FFN is a feed-forward neural network;
3-3 Since sentence nodes and topic nodes have different representations, a heterogeneous document graph must be constructed, so Equation (3) is rewritten as Equation (6): Equation (6) removes the trainable parameter W_b and uses nonlinear transformation functions f_S and f_T to map sentences and topics into a shared latent space, then recomputes e_ij;
3-4 Concatenate the n sentence nodes carrying topic information to generate the topic-aware sentence features h' = {S'_1, S'_2, …, S'_i, …, S'_n}.
5. The abstract text summarization method based on graph knowledge and topic perception according to claim 3, wherein the step (4) is implemented as follows:
4-1 Feed the sentence features h' into a Transformer-based decoder; then predict through a multi-layer attention mechanism.
6. The abstract text summarization method based on graph knowledge and topic perception according to claim 5, wherein the step (5) is implemented as follows:
5-1 Perform end-to-end training with epoch = 10, learning rate = 0.00001, and batch size = 16, adjusting the learning rate gradually with the Adam optimizer; jointly train the neural topic model and the decoder to reduce the loss, with the following loss functions:
L_NTM = D_KL(p(z) || q(z|x)) − E_{q(z|x)}[p(x|z)]   (7)
where the first term is the KL-divergence loss and the second the reconstruction loss; q(z|x) and p(x|z) denote the encoder and decoder networks of the NTM, respectively;
L_Trans = −Σ log p(y|x; θ)   (8)
where x denotes the input document, y the reference summary, and θ the model parameters;
L = L_Trans + λ L_NTM   (9)
where λ is a balance parameter in the range [0, 1];
5-2 Input the test document into the trained GTASum model to obtain the summary content.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111654105.5A CN114218928A (en) | 2021-12-30 | 2021-12-30 | Abstract text summarization method based on graph knowledge and theme perception |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111654105.5A CN114218928A (en) | 2021-12-30 | 2021-12-30 | Abstract text summarization method based on graph knowledge and theme perception |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114218928A true CN114218928A (en) | 2022-03-22 |
Family
ID=80707059
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111654105.5A Withdrawn CN114218928A (en) | 2021-12-30 | 2021-12-30 | Abstract text summarization method based on graph knowledge and theme perception |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114218928A (en) |
- 2021-12-30: CN application CN202111654105.5A filed (patent CN114218928A), not active: withdrawn
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115033683A (en) * | 2022-06-17 | 2022-09-09 | 平安科技(深圳)有限公司 | Abstract generation method, device, equipment and storage medium |
CN115033683B (en) * | 2022-06-17 | 2024-05-07 | 平安科技(深圳)有限公司 | Digest generation method, digest generation device, digest generation equipment and storage medium |
CN115496061A (en) * | 2022-09-30 | 2022-12-20 | 内蒙古财经大学 | Neural network title generation model |
CN117763140A (en) * | 2024-02-22 | 2024-03-26 | 神州医疗科技股份有限公司 | Accurate medical information conclusion generation method based on computing feature network |
CN117763140B (en) * | 2024-02-22 | 2024-05-28 | 神州医疗科技股份有限公司 | Accurate medical information conclusion generation method based on computing feature network |
CN117875273A (en) * | 2024-03-13 | 2024-04-12 | 中南大学 | News abstract automatic generation method, device and medium based on large language model |
CN117875273B (en) * | 2024-03-13 | 2024-05-28 | 中南大学 | News abstract automatic generation method, device and medium based on large language model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Tan et al. | Neural machine translation: A review of methods, resources, and tools | |
CN108733792B (en) | Entity relation extraction method | |
CN110678881B (en) | Natural language processing using context-specific word vectors | |
CN113158665B (en) | Method for improving dialog text generation based on text abstract generation and bidirectional corpus generation | |
CN114218928A (en) | Abstract text summarization method based on graph knowledge and theme perception | |
CN109543017B (en) | Legal question keyword generation method and system | |
CN109684452A (en) | A kind of neural network problem generation method based on answer Yu answer location information | |
CN111125333B (en) | Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism | |
WO2023137911A1 (en) | Intention classification method and apparatus based on small-sample corpus, and computer device | |
Xu et al. | Multi-task learning for abstractive text summarization with key information guide network | |
Gambhir et al. | Deep learning-based extractive text summarization with word-level attention mechanism | |
Yan et al. | Leveraging contextual sentences for text classification by using a neural attention model | |
CN114881042A (en) | Chinese emotion analysis method based on graph convolution network fusion syntax dependence and part of speech | |
CN112818698A (en) | Fine-grained user comment sentiment analysis method based on dual-channel model | |
Mathur et al. | A scaled‐down neural conversational model for chatbots | |
Xiao et al. | FusionSum: Abstractive summarization with sentence fusion and cooperative reinforcement learning | |
Madhyastha et al. | Learning task-specific bilexical embeddings | |
CN113961706A (en) | Accurate text representation method based on neural network self-attention mechanism | |
CN115600582B (en) | Controllable text generation method based on pre-training language model | |
Zhao et al. | Guiding the training of distributed text representation with supervised weighting scheme for sentiment analysis | |
Hsiao et al. | [Retracted] Construction of an Artificial Intelligence Writing Model for English Based on Fusion Neural Network Model | |
CN112287641B (en) | Synonym sentence generating method, system, terminal and storage medium | |
CN114548117A (en) | Cause-and-effect relation extraction method based on BERT semantic enhancement | |
CN113449517A (en) | Entity relationship extraction method based on BERT (belief propagation) gating multi-window attention network model | |
Cui et al. | Aspect level sentiment classification based on double attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20220322 |