CN112560438A - Text generation method based on a generative adversarial network - Google Patents

Text generation method based on a generative adversarial network

Info

Publication number
CN112560438A
Authority
CN
China
Prior art keywords
text
network
discriminator
generator
reward
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011364634.7A
Other languages
Chinese (zh)
Inventor
王俊丽
吴雨茜
韩冲
张超波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN202011364634.7A priority Critical patent/CN112560438A/en
Publication of CN112560438A publication Critical patent/CN112560438A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

A text generation method based on a generative adversarial network, relating to the field of text generation. The technical scheme of the invention uses the text data in the real data set to obtain a 'truth value' that can guide the network, so as to improve the convergence speed of the generator network. The method measures the distance between the generated text and the real text with quantities such as the cosine distance and adds this distance to the objective function of the generator network, so that the distance is gradually optimized during training. In addition, the structure of the discriminator network is improved by adding a self-attention mechanism at the input layer, so that the network obtains richer semantic and context information and the performance of the discriminator network is optimized. The invention can generate logically coherent text data more stably.

Description

Text generation method based on a generative adversarial network
Technical Field
The invention relates to the field of text generation, in particular to a text generation method based on a generative adversarial network.
Background
As an important research direction in the field of natural language processing, text generation technology has great application prospects. For example, it can be applied to intelligent question answering, dialogue, machine translation and similar systems to realize more intelligent and natural human-computer interaction; a text generation system can also write and publish news automatically in place of human editors.
Text generation refers to giving a computer human-level language expression capability through machine learning and natural language processing, and covers machine translation, sentence generation, dialog generation and the like. Text generation strategies are usually based on a language model, a probability-based model that predicts the most likely next word from the input data. Since text is sequence data with contextual dependencies between words, semantic modeling can be performed with recurrent neural networks and their variants (RNN, LSTM, GRU), among others.
The Character-based Recurrent Neural Network (Char-RNN) is a classic deep-learning text generation technique. The input and output of Char-RNN are character units, and the learning goal of the network is to make the next predicted character consistent with the target character of the training sample. Besides basic recurrent neural networks, the classic deep generative model, the Variational Autoencoder (VAE), is often used in text generation tasks. The VAE adds a constraint to the encoder so that it generates latent variables obeying a Gaussian distribution, and the loss function of the network consists of two parts: a reconstruction error, which can be measured by the mean squared error, and the difference between the distribution of the latent variables and the Gaussian distribution, which can be measured by the KL divergence.
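As a hedged illustration of the two-part VAE loss described above (reconstruction error plus KL divergence), a minimal PyTorch-style sketch might look as follows; the encoder and decoder that produce `x_recon`, `mu` and `logvar` are assumed to exist elsewhere and are not part of the original text:

```python
import torch
import torch.nn.functional as F

def vae_loss(x_recon, x, mu, logvar):
    # Reconstruction error, measured here by the mean squared error.
    recon = F.mse_loss(x_recon, x, reduction="sum")
    # Closed-form KL divergence between the approximate posterior N(mu, sigma^2)
    # and the standard Gaussian prior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```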
Similar to the VAE, the Generative Adversarial Network (GAN) contains two sub-models: a generative model G and a discriminative model D. The generative model is used to imitate the distribution of the real data, the discriminative model judges whether a sample is a real sample or a generated sample, and the training goal of the network is for the generative model to fit the real data distribution so well that the discriminative model cannot tell them apart. GAN is very suitable for continuous data such as images, but has difficulty with discrete data such as text; the most fundamental reason is that the probability output is collapsed into a discrete output during sampling, and researchers have fine-tuned the internal computations of GAN to improve the network directly. From the perspective of replacing the KL divergence, the Wasserstein GAN makes the value returned by the network smoother, with a noticeable effect when applied to text generation. Some scholars have improved the Softmax function and achieved a certain effect by adopting the Gumbel-Softmax approach.
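For the Gumbel-Softmax approach mentioned above, a minimal sketch (not the patent's own method) using PyTorch's built-in relaxation could look like this; `logits` is an assumed tensor of per-token scores from some generator:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 5000)  # assumed vocabulary-sized scores
# Differentiable "sampling": hard=True returns a one-hot token in the forward
# pass while keeping soft gradients for back-propagation.
y = F.gumbel_softmax(logits, tau=0.5, hard=True)
token_id = y.argmax(dim=-1)
```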
Reinforcement Learning (RL) is one of the more advanced learning paradigms, and SeqGAN was the first to propose combining reinforcement learning with the generative adversarial network; it is an outstanding representative of GAN applications in text generation. The literature proposes using the discriminator D as the source of the reward value Reward in reinforcement learning: the generator searches, word by word, for the word with the largest expected reward, and the discriminator gives a reward score to the generated sequence in units of whole sentences. During training, the discriminator and the generator are trained alternately, with one round of discriminator training after multiple rounds of generator training.
The proposal of SeqGAN focuses on applying reinforcement learning combined with GAN to text generation, but the difficulty of training GAN limits its potential in the field of natural language processing. Since the output of the generator G is discrete, it is difficult for the discriminator D to back-propagate gradients to update the generator G, so ideas from reinforcement learning are borrowed. SeqGAN treats the words already generated as the current state and the next word to be generated as the action. Because the discriminator D can only score a complete sequence, the possible continuations of each action are completed by Monte Carlo tree search; the discriminator D judges these complete sequences to produce a reward value Reward, which is passed back to the generator G to update its parameters, and a network that can generate the next optimal action is trained by reinforcement learning.
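The per-step reward estimation via Monte Carlo rollouts described above can be sketched as follows; `sample_completion` and `score` are assumed helper methods of the generator and discriminator, not names taken from the original text:

```python
import torch

def mc_rollout_reward(generator, discriminator, partial_seq, max_len, n_rollouts=16):
    """Estimate the reward of a partial sequence (the current state) by
    completing it with Monte Carlo rollouts and averaging the discriminator's
    scores over the completed sentences."""
    scores = []
    for _ in range(n_rollouts):
        full_seq = generator.sample_completion(partial_seq, max_len)  # finish the sentence
        scores.append(discriminator.score(full_seq))                  # probability of being real
    return torch.stack(scores).mean()
```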
Later work continued to improve on this basis. Jiwei Li et al. applied the method to a dialogue generation task, adopting the classic seq2seq model as the generator and likewise taking the result of the discriminator D as the reward in reinforcement learning, but with a different calculation: the paper randomly selects subsequences from the positive and negative sequences to train the discriminator D. LeakGAN mainly addresses longer texts, using the features extracted by the discriminator at intermediate time steps to guide the generator to produce more complete sequences. Cooperative Training proposes a new network training mode, mainly to remove SeqGAN's need for maximum-likelihood pre-training; it introduces a mediator M and trains the generator G by replacing the truth value with the mediator's estimate, achieving very good results.
Disclosure of Invention
The invention aims to provide a text generation method based on a generative adversarial network that produces more stable generated text.
The invention uses the text data in the real data set to obtain a truth value that can guide the network, so as to improve the convergence speed of the generator network. The method measures the distance between the generated text and the real text with quantities such as the cosine distance and adds this distance to the objective function of the generator network, so that the distance is gradually optimized during training. In addition, the structure of the discriminator network is improved by adding a self-attention mechanism at the input layer, so that the network obtains richer semantic and context information and the performance of the discriminator network is optimized. The invention can generate logically coherent text data more stably.
The technical scheme of the invention is as follows:
A text generation method based on a generative adversarial network comprises the following steps:
Step (1): preprocess and vectorize the text data, and provide the preprocessed and vectorized text data to steps (2), (3) and (4).
Some spoken stop words and special symbols exist in the text; they form considerable noise and affect model training, so preprocessing such as data cleaning and stop-word removal is required.
The short text data set is preprocessed and represented as word-embedding-based short text vector data. After the content words of the text are vectorized, their semantic information is easier to express and computation is more convenient.
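A hedged sketch of this preprocessing and vectorization step; the stop-word list, cleaning rule and embedding lookup are illustrative assumptions, not taken from the original text:

```python
import re
import numpy as np

STOP_WORDS = {"the", "a", "an", "and", "it", "is"}  # illustrative subset only

def preprocess(sentence):
    """Clean a raw sentence: lowercase, strip special symbols, drop stop words."""
    sentence = re.sub(r"[^a-z0-9\s]", " ", sentence.lower())
    return [w for w in sentence.split() if w and w not in STOP_WORDS]

def vectorize(tokens, embeddings, dim=300):
    """Map tokens to pre-trained word-embedding vectors; unknown words map to zeros."""
    return np.stack([embeddings.get(w, np.zeros(dim)) for w in tokens])
```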
Step (2): construct and train the discriminator network, and provide the calculation result of the trained discriminator to the loss function in step (3).
The discriminator is named discriminator D.
The role of the discriminator is to distinguish the data generated by the generator from the real text in the data set. Real sample data is labeled 1 as positive examples and randomly initialized vectors are labeled 0 as negative examples, and the discriminator D is trained alternately so that it has a certain discrimination capability. The discriminator not only adopts a convolutional neural network CNN but also adds self-attention layers to obtain richer semantic information, so it can better learn the contextual association information and the importance of the current word.
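A minimal sketch of such a discriminator, using a standard multi-head self-attention layer as a stand-in for the patent's own attention layer; the fusion scheme and layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """CNN branch plus self-attention branch over the embedded text sequence,
    fused into a single real/fake probability."""
    def __init__(self, emb_dim=64, n_filters=100, kernel_size=2):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size)
        self.attn = nn.MultiheadAttention(emb_dim, num_heads=1, batch_first=True)
        self.out = nn.Linear(n_filters + emb_dim, 1)

    def forward(self, x):  # x: (batch, seq_len, emb_dim), already word-embedded
        conv_feat = torch.relu(self.conv(x.transpose(1, 2))).max(dim=2).values
        attn_out, _ = self.attn(x, x, x)             # contextual features per position
        fused = torch.cat([conv_feat, attn_out.mean(dim=1)], dim=1)
        return torch.sigmoid(self.out(fused))         # probability that the text is real
```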
(3) Building and training generator networks
The generator needs to be guided by a loss function to obtain vectors closer to the real text data. In the improved SeqGAN model, the result of the discriminator in step (2) is used as the reward value Reward, the generator G is optimized by reinforcement learning, and the cosine difference between generator samples and the real text is added as part of the reward value, guiding the generator to generate vectors closer to the truth value.
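A hedged sketch of how the discriminator score and the cosine term might be combined into the generator's reward; the sign convention and weights are assumptions, not taken from the original text:

```python
import torch
import torch.nn.functional as F

def combined_reward(disc_score, gen_embedding, real_embedding, alpha=1.0, beta=0.001):
    """Discriminator-based reward plus a truth-guided term based on the cosine
    difference between generated and real text vectors."""
    cosine_gap = 1.0 - F.cosine_similarity(gen_embedding, real_embedding, dim=-1).mean()
    return alpha * disc_score - beta * cosine_gap
```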
(4) Alternate training to obtain a stable text generation model
The generator is named generator G.
The generator G of step (3) and the discriminator D of step (2) are trained alternately in a generative-adversarial manner, an adversarial loss function is designed, and a new text data set is obtained through the Adam optimizer, so as to solve the text classification problem under the unbalanced condition. The adversarial loss function and the Adam optimizer are used to search for the optimal weight parameters for generating new text data.
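A minimal sketch of the alternating adversarial training loop with Adam; the helper methods `policy_gradient_loss`, `sample` and `bce_loss` are assumed for illustration:

```python
from torch.optim import Adam

def adversarial_train(generator, discriminator, real_loader, epochs=100, g_steps=3):
    """Alternate between several generator updates and one pass of
    discriminator updates, both driven by Adam optimizers."""
    g_opt = Adam(generator.parameters(), lr=1e-3)
    d_opt = Adam(discriminator.parameters(), lr=1e-3)
    for _ in range(epochs):
        for _ in range(g_steps):                 # multiple generator rounds
            g_opt.zero_grad()
            generator.policy_gradient_loss(discriminator).backward()
            g_opt.step()
        for real_batch in real_loader:           # then train the discriminator
            d_opt.zero_grad()
            fake_batch = generator.sample(len(real_batch))
            discriminator.bce_loss(real_batch, fake_batch).backward()
            d_opt.step()
```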
The focus of the invention lies in the following steps.
The step (2) comprises the following substeps:
a) Define the initial state as S_Random, a randomly generated discrete vector whose length is the maximum length of the text sequence. After pre-training, S_Random yields a text vector S_0 with a certain semantic expression capability, and this vector becomes the initial input state of the generative model. During the state transition, the text vector S_i is continuously updated, and the update direction is determined by the reward value scored by the discriminator D. As shown in FIG. 4, the target sequence is "I have an apple and it tastes good." The initial state is a random sequence vector denoted S_Random, representing a text with ambiguous semantic information; after maximum-likelihood pre-training, the initial sequence S_0 that is input to SeqGAN is obtained. For each sequence state S_i, the reward value determines which state S_{i+1} to transfer to next; after n epochs, the target state S_epoch is finally reached.
b) The reward value Q_D^G(θ|S_i) for a single time step: the total reward value used for the sentence generated by the whole network is accumulated from the reward value corresponding to the new word at each time step. If the word generated at the current moment is not the end-of-sentence word, all possible sequences are completed by Monte Carlo search and the rewards are averaged.
c) To add a truth value that guides the convergence of the generator network, a similarity L_sim and a sentence confidence F are defined. L_sim measures the similarity between the current sequence and the real text data by means of the Euclidean distance, the cosine distance, and the like. F represents the proportion of the number of words generated at the current time step to the total number of words in the sentence; when this proportion is larger than a set threshold, the current sentence is considered credible and the calculation result is added to the final loss function, otherwise the sentence is too random and is not included in the calculation of the loss function.
d) The overall objective of the generator network optimization consists of two terms: Network-Reward and TG-Reward. The Network-Reward loss term calculates, by reinforcement learning, the reward value corresponding to each time step i, which serves as part of the cost function of generator G. TG-Reward is obtained by calculating the word-vector distance between the sentence obtained in the previous step and the target sentence in the database that is really expected to be generated.
The loss function in the truth guidance model is defined as:
[Equation image: the truth-guided loss function, combining the reward value of the current text sequence and the similarity term L_sim over the T time steps with weights α and β]

wherein Q_D^G(θ|S_i) is the reward value for the current text sequence; the loss term L_sim is the distance difference between the generation result G(θ|S_i) of the generator G and the real text data gt; the parameters α and β are coefficients that determine the weights of the two terms; and T represents the maximum number of iterations and is a natural number.
That is, the generator G wants to minimize the cost function, so that the discriminator D cannot distinguish the self-generated text from the real text data, and the discriminator D tries to maximize the difference to distinguish whether the current text data is the generated data.
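A hedged sketch of how the truth-guided objective above might be assembled in code; the exact combination (signs, the confidence gate and the argument names) is an assumption consistent with items a) to d):

```python
import torch

def truth_guided_loss(log_probs, q_values, l_sim, confidence, threshold=0.5,
                      alpha=1.0, beta=0.001):
    """Policy-gradient term weighted by the per-step rewards Q (Network-Reward),
    plus the similarity term L_sim (TG-Reward), added only when the sentence
    confidence F exceeds the threshold."""
    network_reward_term = -(q_values * log_probs).sum()
    tg_reward_term = beta * l_sim if confidence > threshold else 0.0
    return alpha * network_reward_term + tg_reward_term
```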
In step (3), the discriminator network not only comprises a convolutional neural network CNN for extracting the strongest semantic information, but also adds, in parallel, a self-attention layer that extracts context-bearing semantic information; the two results are then fused to obtain the final score for the discrimination reward value. In the self-attention layer, the attention focus and the attention resource are both the vector matrix input from the generator, and the formula of the attention weight is:
f(x_i) = Σ_j V^T · tanh(U · x_i), where x_i is the input, U and V are learnable network parameters, and tanh is the activation function.
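A small sketch of an attention layer built directly from the weight formula above; the softmax normalization and the weighted sum are added assumptions needed to turn the weights into an attention output:

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Computes f(x_i) = V^T · tanh(U · x_i) for each position and uses the
    normalized weights to pool the sequence."""
    def __init__(self, dim):
        super().__init__()
        self.U = nn.Linear(dim, dim, bias=False)   # learnable parameter U
        self.V = nn.Linear(dim, 1, bias=False)     # learnable parameter V

    def forward(self, x):                          # x: (batch, seq_len, dim)
        scores = self.V(torch.tanh(self.U(x))).squeeze(-1)
        weights = torch.softmax(scores, dim=-1)
        return (weights.unsqueeze(-1) * x).sum(dim=1)
```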
Drawings
Fig. 1 is a block diagram of the architecture of the present invention.
Fig. 2 is a diagram of the truth-guided SeqGAN model.
FIG. 3 shows the comparative experimental results for the model of the present invention (comparing the NLL-test loss under different network structures).
Fig. 4 is a flow chart (example) of how the text vector S_i is updated.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the text generation method according to an embodiment of the present invention is described in further detail below with reference to the accompanying drawings. It should be understood that the embodiments described here only illustrate the present invention and are not intended to limit it; that is, the scope of the present invention is not limited to the following embodiments. Rather, a person skilled in the art may make appropriate changes according to the inventive concept, and these may fall within the scope of the invention as defined by the claims.
As shown in the block diagram of fig. 1, the method according to the embodiment of the present invention includes the following steps:
1) The short text data set is preprocessed and represented as word-embedding-based short text vector data. The experiment uses the MS COCO short text data set published by Microsoft, which originates from the Microsoft COCO data set annotated by Microsoft in 2014 and is a large and rich object detection, segmentation and captioning data set, including instance segmentation of 80 object classes, segmentation of 91 item classes, keypoint detection of person instances, and 5 captions per image.
2) The input to the discriminator D consists of two parts: samples generated by the generator G and real data samples, and the batch size is set to 64. Dropout is used to prevent overfitting and is set to 0.75 before the output layer. Real sample data is taken as positive examples and randomly initialized vectors as negative examples, and the discriminator D is trained alternately so that it has a certain discrimination capability; the discriminator not only adopts a convolutional neural network CNN but also adds a self-attention layer to obtain richer semantic information. In the convolutional neural network, the number of convolution kernels is set to 100 and the kernel size to 2. In this step the discriminator is trained for 40 rounds.
3) The generator G uses a long short-term memory recurrent neural network (LSTM) with a hidden-layer size of 32, uses Adam as the optimizer, and adds l2 regularization to prevent overfitting. The result of the discriminator is used as the reward value Reward, the generator G is optimized by reinforcement learning, and the cosine difference between the generator samples and the real text is added as part of the reward value, guiding the generator to generate vectors closer to the truth value.
The invention improves the loss function and network structure of the classic SeqGAN: a cosine function is used to bring the generation result of the generator G closer to the real text data, and a self-attention mechanism is used at the input layer of the discriminator D to obtain more comprehensive text information, thereby improving the discrimination capability of the discriminator. The overall framework of the model is shown in FIG. 2. The reward value returned by the discriminator is divided into two parts, Network-Reward and TG-Reward. Network-Reward is the discriminator D's score for the sampled complete sequence; the closer the sequence is to a real text sequence, the closer the score is to 1. TG-Reward represents the semantic distance between the currently generated sequence and the real text data; the smaller the distance, the closer the text sequence is to the expression of real text.
4) Unlike the classic SeqGAN, the reward value Reward in reinforcement learning consists of two parts: one is the scoring result of the discriminator D, and the other is the spatial distance from the real text data. In the experiment, the smaller the value of the total Reward, the better. In the concrete implementation, the optimal loss function is found by adjusting the proportionality coefficient between the two parts of the Reward, which is finally set to 0.001. The adversarial loss function is designed in the generative-adversarial manner, and a new text data set is obtained through the Adam optimizer, so as to solve the text classification problem under the unbalanced condition. Training stops when the number of training rounds reaches the expected, preset number.
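The embodiment's hyperparameters collected in one place for reference; the dictionary layout and the l2 value are illustrative assumptions, while the numbers come from the description above:

```python
from torch.optim import Adam

EMBODIMENT_CONFIG = {
    "batch_size": 64,                 # discriminator batch size
    "dropout": 0.75,                  # applied before the discriminator output layer
    "conv_num_kernels": 100,          # convolution kernels in the discriminator CNN
    "conv_kernel_size": 2,
    "lstm_hidden_size": 32,           # generator LSTM hidden layer
    "disc_pretrain_rounds": 40,
    "reward_mix_coefficient": 0.001,  # proportionality coefficient between the two reward parts
}

def build_generator_optimizer(generator, l2=1e-4):
    # Adam with l2 regularization via weight decay; the l2 value is an assumed placeholder.
    return Adam(generator.parameters(), weight_decay=l2)
```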
The experimental results show that:
As can be seen from FIG. 3, the self-attention structure is very helpful for extracting semantic information from the original text. Extracting semantic features from the text sequence generated by generator G with self-attention and convolution in parallel accelerates the convergence of the discriminator network and yields generated text with a smaller NLL-TEST loss, improving the overall effect by about 0.1%. By contrast, replacing the convolutional pooling layer with an attention mechanism reduces the performance of the model, because the expression of the strongest semantic information is lost. Experiments prove that truth-guided SeqGAN improves the convergence rate of the text generation model, improves the quality of the text generated by the network after stabilization, and improves the NLL-TEST loss index.

Claims (3)

1. A text generation method based on a generative adversarial network, characterized by comprising the following steps:
step (1), preprocessing and vectorizing text data, and providing the preprocessed and vectorized text data for the steps (2), (3) and (4);
performing data cleaning and stop-word removal preprocessing; representing the result as word-embedding-based short text vector data;
step (2) constructing and training a discriminator network, and providing the calculation result of the trained discriminator to the loss function in the step (3);
the discriminator is named as discriminator D;
the discriminator is used for distinguishing the data generated by the generator from the real text in the data set; real sample data is labeled 1 as positive examples and a randomly initialized vector is labeled 0 as negative examples, and the discriminator D is trained alternately so that it has a certain discrimination capability; the discriminator not only adopts a convolutional neural network CNN but also adds a self-attention layer to acquire richer semantic information, and can learn the contextual association information and the importance of the current word;
(3) building and training generator networks
the generator needs to be guided by a loss function to obtain vectors closer to the real text data; in the improved SeqGAN model, the result of the discriminator in step (2) is used as the reward value Reward, the generator G is optimized by reinforcement learning, and the cosine difference between generator samples and the real text is added as part of the reward value, guiding the generator to generate vectors closer to the truth value;
(4) alternate training to obtain stable text generation model
Naming the generator as generator G;
alternately training the generator G of step (3) and the discriminator D of step (2) in a generative-adversarial manner, designing an adversarial loss function, and obtaining a new text data set through the Adam optimizer so as to solve the text classification problem under the unbalanced condition; the adversarial loss function and the Adam optimizer are used to search for the optimal weight parameters for generating new text data.
2. The method of claim 1, wherein step (2) comprises the sub-steps of:
a) defining the initial state as S_Random, a randomly generated discrete vector whose length is the maximum length of the text sequence; after pre-training, S_Random yields a text vector S_0 with a certain semantic expression capability, and this vector becomes the initial input state of the generative model; during the state transition, the text vector S_i is continuously updated, and the update direction is determined by the reward value scored by the discriminator D;
b) the reward value Q_D^G(θ|S_i) for a single time step: the total reward value used for the sentence generated by the whole network is accumulated from the reward value corresponding to the new word at each time step; if the word generated at the current moment is not the end-of-sentence word, all possible sequences are completed by Monte Carlo search and averaged;
c) to add a truth value that guides the convergence of the generator network, a similarity L_sim and a sentence confidence F are defined; L_sim measures the similarity between the current sequence and the real text data by means of the Euclidean distance, the cosine distance, and the like; F represents the proportion of the number of words generated at the current time step to the total number of words in the sentence; when the proportion is larger than a set threshold, the current sentence is credible and the calculation result is added to the final loss function, otherwise the sentence is too random and is not included in the calculation of the loss function;
d) the overall objective of the generator network optimization consists of two terms, Network-Reward and TG-Reward; the Network-Reward loss term calculates, by reinforcement learning, the reward value corresponding to each time step i, which serves as part of the cost function of generator G; TG-Reward is obtained by calculating the word-vector distance between the sentence obtained in the previous step and the target sentence in the database that is really expected to be generated;
the loss function in the truth guidance model is defined as:
[Equation image: the truth-guided loss function, combining the reward value of the current text sequence and the similarity term L_sim over the T time steps with weights α and β]

wherein Q_D^G(θ|S_i) is the reward value for the current text sequence; the loss term L_sim is the distance difference between the generation result G(θ|S_i) of the generator G and the real text data gt; the parameters α and β are coefficients that determine the weights of the two terms; and T represents the maximum number of iterations and is a natural number;
that is, the generator G wants to minimize the cost function, so that the discriminator D cannot distinguish the self-generated text from the real text data, and the discriminator D tries to maximize the difference to distinguish whether the current text data is the generated data.
3. The method according to claim 1, wherein in step (3) the discriminator network not only includes a convolutional neural network CNN for extracting the strongest semantic information, but also adds, in parallel, a self-attention layer that extracts context-bearing semantic information, and the two results are then fused to obtain the final score for the discrimination reward value; in the self-attention layer, the attention focus and the attention resource are both the vector matrix input from the generator, and the formula of the attention weight is: f(x_i) = Σ_j V^T · tanh(U · x_i); where x_i is the input, U and V are learnable network parameters, and tanh is the activation function.
CN202011364634.7A 2020-11-27 2020-11-27 Text generation method based on a generative adversarial network Pending CN112560438A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011364634.7A CN112560438A (en) 2020-11-27 2020-11-27 Text generation method based on a generative adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011364634.7A CN112560438A (en) 2020-11-27 2020-11-27 Text generation method based on a generative adversarial network

Publications (1)

Publication Number Publication Date
CN112560438A true CN112560438A (en) 2021-03-26

Family

ID=75045217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011364634.7A Pending CN112560438A (en) 2020-11-27 2020-11-27 Text generation method based on a generative adversarial network

Country Status (1)

Country Link
CN (1) CN112560438A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673349A (en) * 2021-07-20 2021-11-19 广东技术师范大学 Method, system and device for generating Chinese text by image based on feedback mechanism
CN116561314A (en) * 2023-05-16 2023-08-08 中国人民解放军国防科技大学 Text classification method for selecting self-attention based on self-adaptive threshold
CN117332048A (en) * 2023-11-30 2024-01-02 运易通科技有限公司 Logistics information query method, device and system based on machine learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10152970B1 (en) * 2018-02-08 2018-12-11 Capital One Services, Llc Adversarial learning and generation of dialogue responses
CN110196903A (en) * 2019-05-06 2019-09-03 中国海洋大学 A kind of method and system for for article generation abstract
CN110719253A (en) * 2019-08-29 2020-01-21 四川大学 Web honeypot system based on intelligence question-answering
CN111143617A (en) * 2019-12-12 2020-05-12 浙江大学 Automatic generation method and system for picture or video text description
CN111159454A (en) * 2019-12-30 2020-05-15 浙江大学 Picture description generation method and system based on Actor-Critic generation type countermeasure network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10152970B1 (en) * 2018-02-08 2018-12-11 Capital One Services, Llc Adversarial learning and generation of dialogue responses
CN110196903A (en) * 2019-05-06 2019-09-03 中国海洋大学 A kind of method and system for for article generation abstract
CN110719253A (en) * 2019-08-29 2020-01-21 四川大学 Web honeypot system based on intelligence question-answering
CN111143617A (en) * 2019-12-12 2020-05-12 浙江大学 Automatic generation method and system for picture or video text description
CN111159454A (en) * 2019-12-30 2020-05-15 浙江大学 Picture description generation method and system based on Actor-Critic generation type countermeasure network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PAOLO GALEONE: "TensorFlow 2.0 Neural Network Practice" (《TensorFlow2.0神经网络实践》), 30 June 2020 *
YUXI WU: "Text Generation Service Model Based on Truth-Guided SeqGAN", 《IEEE ACCESS》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673349A (en) * 2021-07-20 2021-11-19 广东技术师范大学 Method, system and device for generating Chinese text by image based on feedback mechanism
CN116561314A (en) * 2023-05-16 2023-08-08 中国人民解放军国防科技大学 Text classification method for selecting self-attention based on self-adaptive threshold
CN116561314B (en) * 2023-05-16 2023-10-13 中国人民解放军国防科技大学 Text classification method for selecting self-attention based on self-adaptive threshold
CN117332048A (en) * 2023-11-30 2024-01-02 运易通科技有限公司 Logistics information query method, device and system based on machine learning
CN117332048B (en) * 2023-11-30 2024-03-22 运易通科技有限公司 Logistics information query method, device and system based on machine learning

Similar Documents

Publication Publication Date Title
Zhang et al. A joint model of intent determination and slot filling for spoken language understanding.
CN112560438A (en) Text generation method based on a generative adversarial network
CN110321563B (en) Text emotion analysis method based on hybrid supervision model
CN115794999B (en) Patent document query method based on diffusion model and computer equipment
CN113505200B (en) Sentence-level Chinese event detection method combined with document key information
CN112765956A (en) Dependency syntax analysis method based on multi-task learning and application
Zhang Research on text classification method based on LSTM neural network model
Liu Neural question generation based on Seq2Seq
CN113971394A (en) Text repeat rewriting system
CN115964459B (en) Multi-hop reasoning question-answering method and system based on food safety cognition spectrum
CN114969269A (en) False news detection method and system based on entity identification and relation extraction
CN112711944A (en) Word segmentation method and system and word segmentation device generation method and system
CN112488111A (en) Instruction expression understanding method based on multi-level expression guide attention network
Yao Deep Learning-Based Text Sentiment Analysis in Chinese International Promotion
CN112199503B (en) Feature-enhanced unbalanced Bi-LSTM-based Chinese text classification method
CN114896985A (en) Humorous text automatic generation method, system, medium, equipment and terminal
CN115169429A (en) Lightweight aspect-level text emotion analysis method
Niu et al. A novel attention mechanism considering decoder input for abstractive text summarization
CN114692615A (en) Small sample semantic graph recognition method for small languages
CN116982054A (en) Sequence-to-sequence neural network system using look-ahead tree search
Ruan et al. Chinese news text classification method based on attention mechanism
CN114357166A (en) Text classification method based on deep learning
Xu et al. Causal event extraction using causal event element-oriented neural network
Kibria et al. Context-driven bengali text generation using conditional language model
CN116562299B (en) Argument extraction method, device and equipment of text information and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210326