CN110825869A - Text summary generation method of a variational generative decoder based on a copy mechanism - Google Patents

Text summary generation method of a variational generative decoder based on a copy mechanism

Info

Publication number
CN110825869A
Authority
CN
China
Prior art keywords: hidden layer, variable, layer, text, state
Prior art date
Legal status: Pending
Application number
CN201910872440.9A
Other languages
Chinese (zh)
Inventor
黄晓
滕蔚
林嘉良
保延翔
Current Assignee
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201910872440.9A
Publication of CN110825869A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • G06F16/345Summarisation for human users

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention provides a text summary generation method using a variational generative decoder based on a copy mechanism. Following the idea of the variational auto-encoder (VAE), the method generates hidden variables that contain latent sentence-structure information and makes full use of the latent features of a sentence, so that the generated summary is better structured and more readable. A copy mechanism is introduced in the copy part: the attention-weighted context semantic vector serves as the copied content, which alleviates the word-repetition problem common to conventional sequence-to-sequence frameworks. The generation part and the copy part act together, making the generated summary more accurate.

Description

Text summary generation method of a variational generative decoder based on a copy mechanism
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a text summary generation method of a variational generative decoder based on a copy mechanism.
Background
The goal of automatic summarization is to generate a concise summary from a given piece of text. There are two main approaches: extractive and abstractive. Extractive summarization selects sentences or phrases from the original text to form the summary; abstractive summarization generates new sentences that do not appear in the original text as the summary.
Conventional neural abstractive summarization methods are generally attention-based sequence-to-sequence models. Such models can extract shallow semantic information from the original text and generate simple summaries, but the generated summaries often contain repeated words, and the latent structural information of the summary is usually ignored during generation, which greatly limits summary quality.
Disclosure of Invention
The invention provides a text summary generation method of a variational generative decoder based on a copy mechanism, which can generate a text summary with a clear sentence structure and accurate content.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
a text abstract generating method of a variation generation decoder based on a copying mechanism comprises the following steps:
s1: mapping the input text content into a semantic sequence by encoding;
s2: decoding the semantic sequence obtained in the step S1;
s3: optimizing the decoded output of step S2 results in a text summary of the input text.
Further, the specific process of step S1 is:

The input text content $X = \{x_1, x_2, \ldots, x_T\}$ is mapped into the corresponding word vectors, which are fed as input into the bidirectional gated recurrent unit (GRU) of the encoder to obtain the hidden layer states of the input text in the encoding stage. The bidirectional GRU encodes in both directions: the forward pass runs from $x_1$ to $x_T$, and the backward pass runs from $x_T$ to $x_1$. The resulting hidden layer states are:

$\overrightarrow{h_i} = \overrightarrow{\mathrm{GRU}}(x_i, \overrightarrow{h}_{i-1})$

$\overleftarrow{h_i} = \overleftarrow{\mathrm{GRU}}(x_i, \overleftarrow{h}_{i+1})$

The hidden layer states of the two directions are concatenated to obtain the hidden layer state of the input text in the encoding stage:

$h_i = [\overrightarrow{h_i}; \overleftarrow{h_i}]$

Further, the decoding process in step S2 includes latent sentence-structure modeling based on the VAE and copy-information modeling based on the copy mechanism. The latent sentence-structure modeling based on the VAE is as follows:

The hidden layer states of the decoding stage are calculated by two layers of GRUs. The first-layer hidden layer state $h_t^{d_1}$ is calculated as follows:

$h_t^{d_1} = \mathrm{GRU}_1(y_{t-1}, h_{t-1}^{d_1})$

where $y_{t-1}$ is the output at the previous time step and $h_{t-1}^{d_1}$ is the first-layer hidden layer state at the previous time step.

The first-layer hidden layer state at the current time step $h_t^{d_1}$ and all hidden layer states of the encoding stage $h_i$ are used to calculate the attention weight distribution. The context semantic vector $c_t$ is the weighted sum of the hidden layer states of the input text content, and the attention weight $a_{ti}$ is calculated as follows:

$e_{ti} = v_a^{\top}\tanh(W_a h_t^{d_1} + U_a h_i + b_a)$

$a_{ti} = \dfrac{\exp(e_{ti})}{\sum_{k=1}^{T}\exp(e_{tk})}$

$c_t = \sum_{i=1}^{T} a_{ti} h_i$

where the attention weight measures the relevance between the output at time $t$ and the content of the text, $W_a$ and $U_a$ are weight matrices, and $b_a$ is a bias.

The second-layer hidden layer state $h_t^{d_2}$ introduces the attention mechanism into the decoding stage, so $c_t$ is added to its calculation:

$h_t^{d_2} = \mathrm{GRU}_2([y_{t-1}; c_t], h_{t-1}^{d_2})$

where $y_{t-1}$ is the output at the previous time step and $h_{t-1}^{d_2}$ is the second-layer hidden layer state at the previous time step.
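To make the two-layer decoding step concrete, the following is a minimal PyTorch sketch of one decoding step with additive attention. The class name, dimensions, and the concatenation of the previous output embedding with $c_t$ as the second-layer input are illustrative assumptions rather than details fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerAttnDecoderStep(nn.Module):
    """One decoding step: the first-layer GRU state drives additive attention over the
    encoder states, and the second-layer GRU consumes the previous output embedding
    together with the context vector c_t."""
    def __init__(self, emb_dim=256, dec_dim=256, enc_dim=512, attn_dim=256):
        super().__init__()
        self.gru1 = nn.GRUCell(emb_dim, dec_dim)
        self.gru2 = nn.GRUCell(emb_dim + enc_dim, dec_dim)
        self.W_a = nn.Linear(dec_dim, attn_dim, bias=False)
        self.U_a = nn.Linear(enc_dim, attn_dim, bias=True)   # its bias plays the role of b_a
        self.v_a = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, y_prev_emb, h1_prev, h2_prev, enc_states):
        # enc_states: (batch, T, enc_dim) encoder hidden states h_i
        h1 = self.gru1(y_prev_emb, h1_prev)                        # first-layer state h_t^{d1}
        scores = self.v_a(torch.tanh(self.W_a(h1).unsqueeze(1) + self.U_a(enc_states)))
        a_t = F.softmax(scores.squeeze(-1), dim=-1)                # attention weights a_{ti}
        c_t = torch.bmm(a_t.unsqueeze(1), enc_states).squeeze(1)   # context vector c_t
        h2 = self.gru2(torch.cat([y_prev_emb, c_t], dim=-1), h2_prev)  # second-layer state h_t^{d2}
        return h1, h2, c_t, a_t
```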
In sentence modeling, to account for the sequential nature of generation, the output variables $y_{<t}$ and the latent structure variables $z_{<t}$ before time $t$ are mapped to a posterior distribution $q_\phi(z_t \mid y_{<t}, z_{<t})$, which is used to approximate the true posterior distribution $p_\theta(z_t \mid y_{<t}, z_{<t})$; both distributions are assumed to be Gaussian, and $z$ is sampled from $q_\phi(z_t \mid y_{<t}, z_{<t})$. Since the sampling operation is not differentiable, the reparameterization trick is introduced so that back-propagation during training can proceed, and a new sample is obtained through a change of parameters. The reparameterization formula is:

$z = \mu + \sigma\varepsilon$

where $\varepsilon \sim N(0, I)$ is a noise variable, and the Gaussian parameters $\mu$ and $\sigma$ are the variational mean and standard deviation, respectively.

The sampled latent variable $z$ contains the latent sentence-structure information of the target summary. The latent structure variable $z$ and the second-layer hidden layer state $h_t^{d_2}$ are jointly mapped to a new hidden layer state that serves as the generation-part variable:

$h_t^{gen} = \tanh(W_g z_t + U_g h_t^{d_2} + b_g)$

where $\tanh(\cdot)$ is the activation function, $W_g$ and $U_g$ are weight matrices, and $b_g$ is a bias.
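The latent-variable step can be sketched as follows. Here the recognition network conditions on the current decoder state as a stand-in for $(y_{<t}, z_{<t})$, and the layer names and dimensions are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

class LatentGenerationPart(nn.Module):
    """Samples z with the reparameterization trick and maps [z, h_t^{d2}] to the
    generation-part hidden state h_t^{gen}."""
    def __init__(self, dec_dim=256, latent_dim=64):
        super().__init__()
        # recognition network producing the variational mean and log-variance
        self.to_mu = nn.Linear(dec_dim, latent_dim)
        self.to_logvar = nn.Linear(dec_dim, latent_dim)
        self.W_g = nn.Linear(latent_dim, dec_dim, bias=False)
        self.U_g = nn.Linear(dec_dim, dec_dim, bias=True)  # its bias plays the role of b_g

    def forward(self, h2):
        mu = self.to_mu(h2)
        logvar = self.to_logvar(h2)
        sigma = torch.exp(0.5 * logvar)
        eps = torch.randn_like(sigma)                     # epsilon ~ N(0, I)
        z = mu + sigma * eps                              # reparameterization: z = mu + sigma * eps
        h_gen = torch.tanh(self.W_g(z) + self.U_g(h2))    # generation-part state h_t^{gen}
        return z, h_gen, mu, logvar
```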
Further, the copy-information modeling based on the copy mechanism in step S2 is as follows:

To address the repetition problem in summary generation, a copy mechanism is introduced in the decoding stage, and the context semantic vector $c_t$ is used as the copy vector. A control gate $g_{switch}$ is calculated from the copy vector $c_t$, the second-layer hidden layer state $h_t^{d_2}$ and the latent structure variable $z_t$:

$g_{switch} = \mathrm{sigmoid}(W_s c_t + U_s h_t^{d_2} + V_s z_t + b_{switch})$

where $\mathrm{sigmoid}(\cdot)$ is the activation function, $W_s$, $U_s$ and $V_s$ are weight matrices, and $b_{switch}$ is a bias.

$g_{switch}$ assigns different weights to the generation-part variable and the copy-part variable: $g_{switch}$ is the weight of the generation part, and $(1 - g_{switch})$ is the weight of the copy part. Using $c_t$ as the copy-part variable ensures that every final hidden layer state contains a different degree of copied information, which reinforces the attention mechanism once more and reduces the problem of repeated words.

The final hidden layer variable is calculated as follows:

$h_t = g_{switch}\, h_t^{gen} + (1 - g_{switch})\, c_t$

The final hidden layer variable $h_t$ is passed through a softmax(·) function to obtain the probability distribution of the target word $y_t$, as shown in the following formula:

$P(y_t \mid y_{<t}, X) = \mathrm{softmax}(W_o h_t + b_o)$

where $W_o$ is a weight matrix and $b_o$ is a bias.
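A minimal sketch of the copy gate and output layer is given below. Using a single linear layer over the concatenated inputs is equivalent to the separate weight matrices above, and the projection of $c_t$ to the decoder dimension is an implementation assumption (the patent uses $c_t$ directly, which presumes matching dimensions).

```python
import torch
import torch.nn as nn

class CopySwitchOutput(nn.Module):
    """Mixes the generation-part state and the copy vector c_t through a learned gate
    g_switch, then projects the final hidden state to a vocabulary distribution."""
    def __init__(self, dec_dim=256, enc_dim=512, latent_dim=64, vocab_size=4000):
        super().__init__()
        self.gate = nn.Linear(enc_dim + dec_dim + latent_dim, dec_dim)
        self.copy_proj = nn.Linear(enc_dim, dec_dim, bias=False)  # maps c_t to the decoder size
        self.out = nn.Linear(dec_dim, vocab_size)

    def forward(self, c_t, h_gen, h2, z):
        # control gate g_switch in (0, 1)
        g = torch.sigmoid(self.gate(torch.cat([c_t, h2, z], dim=-1)))
        # final hidden state: weighted mix of the generation part and the copy part
        h = g * h_gen + (1.0 - g) * self.copy_proj(c_t)
        # log-probabilities over the target vocabulary for y_t
        return torch.log_softmax(self.out(h), dim=-1)
```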
Further, the specific process of step S3 is:

The network is optimized with a loss function consisting of two parts: the negative log-likelihood of the generated summary and the variational lower bound; the log term in the variational lower bound is combined with the log-likelihood of the generated summary. The loss function is:

$\mathcal{L}(\theta,\phi) = -\sum_{n=1}^{N}\sum_{t=1}^{T}\Big[\log p_\theta\big(y_t^{(n)} \mid y_{<t}^{(n)}, X^{(n)}\big) - D_{KL}\big(q_\phi(z_t \mid y_{<t}, z_{<t}) \,\|\, p_\theta(z_t \mid y_{<t}, z_{<t})\big)\Big]$

where $\{X^{(n)}\}$ denotes the training text content, $\{y^{(n)}\}$ denotes the generated target summary, $D_{KL}[\cdot]$ is the KL divergence, $N$ is the number of samples, and $T$ is the length of the output sequence.
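The loss can be sketched as follows; the standard-normal prior used in the closed-form KL term is a simplifying assumption, since the patent's prior $p_\theta(z_t \mid y_{<t}, z_{<t})$ is itself learned.

```python
import torch
import torch.nn.functional as F

def summarization_loss(log_probs, targets, mu, logvar):
    """Negative log-likelihood of the target summary plus the KL term of the variational lower bound.

    log_probs: (batch, T, vocab) log-probabilities produced by the decoder
    targets:   (batch, T) target token ids
    mu, logvar: variational mean and log-variance of q_phi(z_t | y_<t, z_<t)
    """
    nll = F.nll_loss(log_probs.reshape(-1, log_probs.size(-1)),
                     targets.reshape(-1), reduction="sum")
    # closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions and time steps
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return nll + kl
```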
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention provides a text abstract generating method of a variation generation decoder based on a copying mechanism. The method is based on an attention sequence-to-sequence framework and combines a Variational Auto-Encoder (VAE) and a replication mechanism to extract and fully utilize effective information contained in sentences. The encoder of the method is the same as the traditional encoder, and a bidirectional Gated cycle Unit (GRU) is used as a basic sequence modeling Unit. Different from the traditional decoder, the decoder of the method has three parts, wherein the first part is a GRU decoding part, two layers of GRUs are adopted, the GRU of the first layer is used for calculating the attention mechanism, the GRU of the second layer is used for introducing the attention mechanism, and the state of the hidden layer of the decoder is obtained through calculation. And the second part is a VAE part and is used for modeling latent variables, extracting the structure information of the latent sentences by utilizing the VAE and generating the hidden variables containing the target abstract latent structure information. And mapping the hidden variable and the hidden layer state of the decoder into a new variable as the variable of the generation part. The third part is a copy part, and a semantic vector containing the context of the attention mechanism is used as a variable of the copy part. And in the final output part of the model, the variables of the generating part and the variables of the copying part act together to decode and output, and a text abstract with clear sentence structure and accurate content is generated.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a data preprocessing diagram;
FIG. 3 is a diagram of the word segmentation scheme;
FIG. 4 is a schematic diagram of the VAE;
FIG. 5 is a diagram of the latent structure analysis of a summary;
FIG. 6 is an example of a generated summary.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
As shown in FIG. 1, the text summary generation method of a variational generative decoder based on a copy mechanism takes a text sequence X = {x_1, x_2, ..., x_T} as input, and the goal is to generate a summary Y = {y_1, y_2, ..., y_T} of the corresponding text sequence. As shown in FIG. 1, the whole framework is divided into two parts, an encoder and a decoder. The encoder encodes the input sequence with a bidirectional GRU. The decoder consists of three parts. The first part is the GRU decoding part, which uses two GRU layers: the first-layer GRU computes the attention mechanism, and the second-layer GRU incorporates the attention mechanism and yields the decoder hidden layer state. The second part is the VAE part, which models latent variables: the VAE extracts latent sentence-structure information and generates hidden variables containing the latent structure information of the target summary. The hidden variable and the decoder hidden layer state are mapped to a new variable that serves as the generation-part variable. The third part is the copy part, which uses the attention-based context semantic vector as the copy-part variable. At the final output of the model, the generation-part variable and the copy-part variable act together to decode and output the target summary. The Large-scale Chinese Short Text Summarization (LCSTS) dataset, built from Sina Weibo, is used as an example.
First, the data are preprocessed as shown in FIG. 2, and segmentation is performed in the manner of FIG. 3 to construct the dictionaries. The text content and the target summary are segmented separately, and a dictionary is built for each. To avoid errors caused by incorrect word segmentation, the text is segmented at the character level to build the dictionary. The entries of the dictionary are then mapped to randomly initialized word vectors. A dictionary of the text content and a dictionary of the target summary are thus obtained.
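A character-level dictionary as described above could be built as in the following sketch; the special tokens and the minimum-count threshold are assumptions for illustration.

```python
from collections import Counter

def build_char_vocab(texts, min_count=1, specials=("<pad>", "<unk>", "<s>", "</s>")):
    """Builds a character-level dictionary mapping each character to an integer id."""
    counts = Counter(ch for text in texts for ch in text)
    vocab = {tok: i for i, tok in enumerate(specials)}
    for ch, c in counts.most_common():
        if c >= min_count and ch not in vocab:
            vocab[ch] = len(vocab)
    return vocab

def to_ids(text, vocab):
    """Converts a text string into a list of ids, falling back to <unk>."""
    return [vocab.get(ch, vocab["<unk>"]) for ch in text]
```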
Then, the input text content $X = \{x_1, x_2, \ldots, x_T\}$ is converted into id form; the ids are used to index the dictionary and look up the word vector corresponding to each character, and the word vectors are fed as input into the bidirectional GRU for encoding to obtain the hidden layer states of the encoding stage. The bidirectional GRU encodes in both directions: the forward pass runs from $x_1$ to $x_T$, and the backward pass runs from $x_T$ to $x_1$. The resulting hidden layer states are:

$\overrightarrow{h_i} = \overrightarrow{\mathrm{GRU}}(x_i, \overrightarrow{h}_{i-1})$

$\overleftarrow{h_i} = \overleftarrow{\mathrm{GRU}}(x_i, \overleftarrow{h}_{i+1})$

The hidden layer states of the two directions are concatenated to obtain the hidden layer state of the encoding stage:

$h_i = [\overrightarrow{h_i}; \overleftarrow{h_i}]$
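The encoding step can be sketched in PyTorch as follows; the embedding and hidden dimensions, batching convention, and class name are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class BiGRUEncoder(nn.Module):
    """Bidirectional GRU encoder: maps token ids to per-position hidden states h_i,
    each the concatenation of the forward and backward GRU states."""
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, x):
        # x: (batch, T) token ids
        emb = self.embedding(x)            # (batch, T, emb_dim)
        outputs, _ = self.gru(emb)         # (batch, T, 2 * hidden_dim)
        return outputs
```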
the hidden layer state of the decoding stage is next calculated. The hidden layer state of the decoding stage is computed by the two layers GRU. First layer hidden layer state
Figure BDA0002203238370000064
The calculation method of (c) is as follows:
wherein, yt-1Is the output of the last moment in time,is the first layer hidden layer state at the last moment.
Then using the first layer hidden layer state at the current moment
Figure BDA0002203238370000067
And all hidden layer states of the encoding stage
Figure BDA0002203238370000068
To calculate the attention weight distribution. The attention weight is calculated as follows:
Figure BDA0002203238370000071
wherein the content of the first and second substances,
Figure BDA0002203238370000072
and
Figure BDA0002203238370000073
is a weight matrix; baIs an offset.
Context semantic vector ctWeighted sum of hidden layer states representing input text content:
Figure BDA0002203238370000074
the second layer hides the layer state, taking into account the attention mechanism introduced in the decoding stage, ctAdding intoIn the calculation of (2):
Figure BDA0002203238370000076
wherein, yt-1Is the output of the last moment in time,
Figure BDA0002203238370000077
is the second layer hidden layer state at the previous time.
Next, the latent structure of the sentence is modeled, and the latent structure in the sentence is extracted as shown in FIG. 5. A posterior distribution $q_\phi(z_t \mid y_{<t}, z_{<t})$ is used to approximate the true posterior distribution $p_\theta(z_t \mid y_{<t}, z_{<t})$; both are assumed to be Gaussian, and $z$ is sampled from $q_\phi(z_t \mid y_{<t}, z_{<t})$. Since the sampling operation is not differentiable, the reparameterization trick is introduced so that back-propagation during training can proceed, and a new sample is obtained through a change of parameters, as shown in FIG. 5. The reparameterization formula is:

$z = \mu + \sigma\varepsilon$

where $\varepsilon \sim N(0, I)$ is a noise variable, and the Gaussian parameters $\mu$ and $\sigma$ are the variational mean and standard deviation, respectively.

The sampled latent variable $z$ contains the latent sentence-structure information of the target summary. The latent structure variable $z$ and the second-layer hidden layer state $h_t^{d_2}$ are jointly mapped to a new hidden layer state that serves as the generation-part variable:

$h_t^{gen} = \tanh(W_g z_t + U_g h_t^{d_2} + b_g)$

where $\tanh(\cdot)$ is the activation function, $W_g$ and $U_g$ are weight matrices, and $b_g$ is a bias.
This is followed by modeling of the copy variables. A copy mechanism is introduced at the decoding stage, and the context semantic vector $c_t$ is used as the copy vector. A control gate $g_{switch}$ is calculated from the copy vector $c_t$, the second-layer hidden layer state $h_t^{d_2}$ and the latent structure variable $z_t$:

$g_{switch} = \mathrm{sigmoid}(W_s c_t + U_s h_t^{d_2} + V_s z_t + b_{switch})$

where $\mathrm{sigmoid}(\cdot)$ is the activation function, $W_s$, $U_s$ and $V_s$ are weight matrices, and $b_{switch}$ is a bias.

$g_{switch}$ assigns different weights to the generation-part variable and the copy-part variable: $g_{switch}$ is the weight of the generation part, and $(1 - g_{switch})$ is the weight of the copy part. Using $c_t$ as the copy-part variable ensures that every final hidden layer state contains a different degree of copied information, which reinforces the attention mechanism once more and alleviates the problem of repeated words.

The final hidden layer variable is therefore calculated as follows:

$h_t = g_{switch}\, h_t^{gen} + (1 - g_{switch})\, c_t$

Finally, the final hidden layer variable $h_t$ is passed through a softmax(·) function to obtain the probability distribution of the target word $y_t$, as shown in the following formula:

$P(y_t \mid y_{<t}, X) = \mathrm{softmax}(W_o h_t + b_o)$

where $W_o$ is a weight matrix and $b_o$ is a bias.
The network is optimized with a loss function consisting of two parts: the negative log-likelihood of the generated summary and the variational lower bound. For convenience of representation, the log term in the variational lower bound is combined with the log-likelihood of the generated summary. The loss function is:

$\mathcal{L}(\theta,\phi) = -\sum_{n=1}^{N}\sum_{t=1}^{T}\Big[\log p_\theta\big(y_t^{(n)} \mid y_{<t}^{(n)}, X^{(n)}\big) - D_{KL}\big(q_\phi(z_t \mid y_{<t}, z_{<t}) \,\|\, p_\theta(z_t \mid y_{<t}, z_{<t})\big)\Big]$

where $\{X^{(n)}\}$ denotes the training text content, $\{y^{(n)}\}$ denotes the generated target summary, $D_{KL}[\cdot]$ is the KL divergence, $N$ is the number of samples, and $T$ is the length of the output sequence. FIG. 6 illustrates the summary generation result of the embodiment.
The method of this embodiment is based on an attention sequence-to-sequence framework and combines the VAE with a copy mechanism to extract and fully exploit the effective information contained in sentences. The encoder is the same as a conventional encoder and uses a bidirectional GRU as the basic sequence-modeling unit. Unlike a conventional decoder, the decoder of this method has three parts. The first part is the GRU decoding part, which uses two GRU layers: the first-layer GRU computes the attention mechanism, and the second-layer GRU incorporates the attention mechanism and yields the decoder hidden layer state. The second part is the VAE part, which models latent variables: the VAE extracts latent sentence-structure information and generates hidden variables containing the latent structure information of the target summary. The hidden variable and the decoder hidden layer state are mapped to a new variable that serves as the generation-part variable. The third part is the copy part, which uses the attention-based context semantic vector as the copy-part variable. At the final output of the model, the generation-part variable and the copy-part variable act together to decode and produce a text summary with a clear sentence structure and accurate content.
The traditional sequence-to-sequence model does not introduce a generative model and does not fully extract the sentence information of the summary, so the latent sentence-structure information cannot be used and the quality of the generated summary is limited. In contrast, the present method adds a latent sentence-structure modeling part to the attention-based sequence-to-sequence framework and introduces the VAE as a generative model to model the latent structure variables, improving the structure and readability of the generated summary.
Meanwhile, the invention introduces a copy mechanism in the decoding part and divides the output of the method into a generation part and a copy part. The hidden variable generated by the VAE and the decoder hidden layer state are combined and mapped into a new hidden variable as the generation part, the attention-based context semantic vector is used as the copy part, and the two parts are combined to decode and output. This not only ensures that the deep semantic information of the sentences is used, but also alleviates the repetition problem common to sequence-to-sequence frameworks, so that summary generation proceeds in an orderly fashion with high quality.
In addition, the GRU decoding part of the decoder adopts two GRU layers: the first-layer GRU computes the attention mechanism, and the second-layer GRU incorporates the attention mechanism to obtain the decoder hidden layer state, combining the variational idea with the copy idea and thereby constructing a variational generative decoder based on the copy mechanism. The method has been verified on the LCSTS dataset and shown to outperform common baseline models.
The same or similar reference numerals correspond to the same or similar parts;
the positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (5)

1. A text summary generation method of a variational generative decoder based on a copy mechanism, characterized by comprising the following steps:
s1: mapping the input text content into a semantic sequence by encoding;
s2: decoding the semantic sequence obtained in the step S1;
s3: optimizing the decoded output of step S2 results in a text summary of the input text.
2. The text summary generation method of a variational generative decoder based on a copy mechanism according to claim 1, wherein the specific process of step S1 is:

the input text content $X = \{x_1, x_2, \ldots, x_T\}$ is mapped into the corresponding word vectors, which are fed as input into the bidirectional gated recurrent unit (GRU) of the encoder to obtain the hidden layer states of the input text in the encoding stage, wherein the bidirectional GRU encodes in both the forward and backward directions, the forward pass running from $x_1$ to $x_T$ and the backward pass running from $x_T$ to $x_1$, and the resulting hidden layer states are:

$\overrightarrow{h_i} = \overrightarrow{\mathrm{GRU}}(x_i, \overrightarrow{h}_{i-1})$

$\overleftarrow{h_i} = \overleftarrow{\mathrm{GRU}}(x_i, \overleftarrow{h}_{i+1})$

the hidden layer states of the two directions are concatenated to obtain the hidden layer state of the input text in the encoding stage:

$h_i = [\overrightarrow{h_i}; \overleftarrow{h_i}]$
3. The text summary generation method of a variational generative decoder based on a copy mechanism according to claim 2, wherein the decoding process in step S2 includes latent sentence-structure modeling based on the VAE and copy-information modeling based on the copy mechanism, and the latent sentence-structure modeling based on the VAE is:

the hidden layer states of the decoding stage are calculated by two layers of GRUs, and the first-layer hidden layer state $h_t^{d_1}$ is calculated as follows:

$h_t^{d_1} = \mathrm{GRU}_1(y_{t-1}, h_{t-1}^{d_1})$

where $y_{t-1}$ is the output at the previous time step and $h_{t-1}^{d_1}$ is the first-layer hidden layer state at the previous time step;

the first-layer hidden layer state at the current time step $h_t^{d_1}$ and all hidden layer states of the encoding stage $h_i$ are used to calculate the attention weight distribution, the context semantic vector $c_t$ is the weighted sum of the hidden layer states of the input text content, and the attention weight $a_{ti}$ is calculated as follows:

$e_{ti} = v_a^{\top}\tanh(W_a h_t^{d_1} + U_a h_i + b_a)$

$a_{ti} = \dfrac{\exp(e_{ti})}{\sum_{k=1}^{T}\exp(e_{tk})}$

$c_t = \sum_{i=1}^{T} a_{ti} h_i$

where the attention weight measures the relevance between the output at time t and the content of the text, $h_i$ is the hidden layer state of the encoding stage at time i, $W_a$ and $U_a$ are weight matrices, and $b_a$ is a bias;

the second-layer hidden layer state $h_t^{d_2}$ introduces the attention mechanism into the decoding stage, so $c_t$ is added to its calculation:

$h_t^{d_2} = \mathrm{GRU}_2([y_{t-1}; c_t], h_{t-1}^{d_2})$

where $y_{t-1}$ is the output at the previous time step, $h_{t-1}^{d_2}$ is the second-layer hidden layer state at the previous time step, and $c_t$ is the context semantic vector;

in sentence modeling, to account for the sequential nature of generation, the output variables $y_{<t}$ and the latent structure variables $z_{<t}$ before time t are mapped to a posterior distribution $q_\phi(z_t \mid y_{<t}, z_{<t})$, which is used to approximate the true posterior distribution $p_\theta(z_t \mid y_{<t}, z_{<t})$; both distributions are assumed to be Gaussian, and $z$ is sampled from $q_\phi(z_t \mid y_{<t}, z_{<t})$; since the sampling operation is not differentiable, the reparameterization trick is introduced so that back-propagation during training can proceed, and a new sample is obtained through a change of parameters, the reparameterization formula being:

$z = \mu + \sigma\varepsilon$

where $\varepsilon \sim N(0, I)$ is a noise variable, and the Gaussian parameters $\mu$ and $\sigma$ are the variational mean and standard deviation, respectively;

the sampled latent variable $z$ contains the latent sentence-structure information of the target summary, and the latent structure variable $z$ and the second-layer hidden layer state $h_t^{d_2}$ are jointly mapped to a new hidden layer state that serves as the generation-part variable:

$h_t^{gen} = \tanh(W_g z_t + U_g h_t^{d_2} + b_g)$

where $\tanh(\cdot)$ is the activation function, $z$ is the latent structure variable, $h_t^{d_2}$ is the second-layer hidden layer state, $W_g$ and $U_g$ are weight matrices, and $b_g$ is a bias.
4. The text summary generation method of a variational generative decoder based on a copy mechanism according to claim 3, wherein the copy-information modeling based on the copy mechanism in step S2 is:

in order to solve the repetition problem in summary generation, a copy mechanism is introduced in the decoding stage, the context semantic vector $c_t$ is used as the copy vector, and a control gate $g_{switch}$ is calculated from the copy vector $c_t$, the second-layer hidden layer state $h_t^{d_2}$ and the latent structure variable $z_t$:

$g_{switch} = \mathrm{sigmoid}(W_s c_t + U_s h_t^{d_2} + V_s z_t + b_{switch})$

where $\mathrm{sigmoid}(\cdot)$ is the activation function, $c_t$ is the context semantic vector, $h_t^{d_2}$ is the second-layer hidden layer state, $W_s$, $U_s$ and $V_s$ are weight matrices, and $b_{switch}$ is a bias;

$g_{switch}$ assigns different weights to the generation-part variable and the copy-part variable: $g_{switch}$ is the weight of the generation part, and $(1 - g_{switch})$ is the weight of the copy part; using $c_t$ as the copy-part variable ensures that every final hidden layer state contains a different degree of copied information, which reinforces the attention mechanism once more and reduces the problem of repeated words;

the final hidden layer variable is calculated as follows:

$h_t = g_{switch}\, h_t^{gen} + (1 - g_{switch})\, c_t$

where $h_t^{gen}$ is the generation-part hidden state and $c_t$ is the context semantic vector; the final hidden layer variable $h_t$ is passed through a softmax(·) function to obtain the probability distribution of the target word $y_t$, as shown in the following formula:

$P(y_t \mid y_{<t}, X) = \mathrm{softmax}(W_o h_t + b_o)$

where $W_o$ is a weight matrix, $b_o$ is a bias, and $h_t$ is the final hidden layer variable.
5. The text summary generation method of a variational generative decoder based on a copy mechanism according to claim 4, wherein the specific process of step S3 is:

the network is optimized with a loss function consisting of two parts: the negative log-likelihood of the generated summary and the variational lower bound, where the log term in the variational lower bound is combined with the log-likelihood of the generated summary; the loss function is:

$\mathcal{L}(\theta,\phi) = -\sum_{n=1}^{N}\sum_{t=1}^{T}\Big[\log p_\theta\big(y_t^{(n)} \mid y_{<t}^{(n)}, X^{(n)}\big) - D_{KL}\big(q_\phi(z_t \mid y_{<t}, z_{<t}) \,\|\, p_\theta(z_t \mid y_{<t}, z_{<t})\big)\Big]$

where $\{X^{(n)}\}$ denotes the training text content, $\{y^{(n)}\}$ denotes the generated target summary, $D_{KL}[\cdot]$ is the KL divergence, $N$ is the number of samples, and $T$ is the length of the output sequence.
CN201910872440.9A 2019-09-16 2019-09-16 Text abstract generating method of variation generation decoder based on copying mechanism Pending CN110825869A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910872440.9A CN110825869A (en) 2019-09-16 2019-09-16 Text abstract generating method of variation generation decoder based on copying mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910872440.9A CN110825869A (en) 2019-09-16 2019-09-16 Text abstract generating method of variation generation decoder based on copying mechanism

Publications (1)

Publication Number Publication Date
CN110825869A true CN110825869A (en) 2020-02-21

Family

ID=69548131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910872440.9A Pending CN110825869A (en) 2019-09-16 2019-09-16 Text abstract generating method of variation generation decoder based on copying mechanism

Country Status (1)

Country Link
CN (1) CN110825869A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113626614A (en) * 2021-08-19 2021-11-09 车智互联(北京)科技有限公司 Method, device, equipment and storage medium for constructing information text generation model

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110032638A (en) * 2019-04-19 2019-07-19 中山大学 A kind of production abstract extraction method based on coder-decoder

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110032638A (en) * 2019-04-19 2019-07-19 中山大学 A kind of production abstract extraction method based on coder-decoder

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113626614A (en) * 2021-08-19 2021-11-09 车智互联(北京)科技有限公司 Method, device, equipment and storage medium for constructing information text generation model
CN113626614B (en) * 2021-08-19 2023-10-20 车智互联(北京)科技有限公司 Method, device, equipment and storage medium for constructing information text generation model

Similar Documents

Publication Publication Date Title
CN110598221B (en) Method for improving translation quality of Mongolian Chinese by constructing Mongolian Chinese parallel corpus by using generated confrontation network
CN112115687B (en) Method for generating problem by combining triplet and entity type in knowledge base
CN110795556A (en) Abstract generation method based on fine-grained plug-in decoding
CN107967262A (en) A kind of neutral net covers Chinese machine translation method
CN112765345A (en) Text abstract automatic generation method and system fusing pre-training model
CN113158665A (en) Method for generating text abstract and generating bidirectional corpus-based improved dialog text
CN110032638B (en) Encoder-decoder-based generative abstract extraction method
CN111708877B (en) Text abstract generation method based on key information selection and variational potential variable modeling
CN108845994B (en) Neural machine translation system using external information and training method of translation system
CN111666756B (en) Sequence model text abstract generation method based on theme fusion
CN111125333B (en) Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism
CN111160452A (en) Multi-modal network rumor detection method based on pre-training language model
CN111581374A (en) Text abstract obtaining method and device and electronic equipment
CN110084297B (en) Image semantic alignment system for small samples
CN109992775A (en) A kind of text snippet generation method based on high-level semantics
CN113657123A (en) Mongolian aspect level emotion analysis method based on target template guidance and relation head coding
CN108763230B (en) Neural machine translation method using external information
CN113723103A (en) Chinese medical named entity and part-of-speech combined learning method integrating multi-source knowledge
CN115719072A (en) Chapter-level neural machine translation method and system based on mask mechanism
CN112380882B (en) Mongolian Chinese neural machine translation method with error correction function
CN116720531B (en) Mongolian neural machine translation method based on source language syntax dependency and quantization matrix
CN110825869A (en) Text abstract generating method of variation generation decoder based on copying mechanism
CN111191023B (en) Automatic generation method, device and system for topic labels
CN112464673B (en) Language meaning understanding method for fusing meaning original information
CN112989845B (en) Chapter-level neural machine translation method and system based on routing algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200221