CN111858931B - Text generation method based on deep learning - Google Patents

Text generation method based on deep learning

Info

Publication number
CN111858931B
CN111858931B CN202010652675.XA
Authority
CN
China
Prior art keywords
training
vector
generator
text
topic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010652675.XA
Other languages
Chinese (zh)
Other versions
CN111858931A (en)
Inventor
廖盛斌
余亚斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Original Assignee
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University filed Critical Central China Normal University
Priority to CN202010652675.XA priority Critical patent/CN111858931B/en
Publication of CN111858931A publication Critical patent/CN111858931A/en
Application granted granted Critical
Publication of CN111858931B publication Critical patent/CN111858931B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a text generation method based on deep learning. The method comprises training and testing, wherein the training comprises the following steps: constructing a training set comprising a plurality of sample pairs, each consisting of a preprocessed topic set and the corresponding text; predefining a generator that generates text from the input topics, pre-training the generator with the training set, and adding an attention mechanism and a new history memory module to the generator's encoding and decoding; predefining a classifier, and feeding the text output by the generator together with the text in the training set into the classifier for adversarial training; and defining a loss function from the pre-trained generator and the classifier and performing reinforcement learning training on the generator. The invention achieves a better text generation effect.

Description

Text generation method based on deep learning
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a text generation method based on deep learning.
Background
The advent of deep learning has pushed artificial intelligence to a new level and quickly had a profound impact in both academia and industry. Deep-learning-based methods have become mainstream in fields such as computer vision and natural language processing. In the natural language processing field in particular, great progress has been made: in machine translation, human-machine dialogue, classical-poetry generation and similar tasks, deep-learning-based methods have completely surpassed, and even replaced, traditional machine learning methods.
Automatic writing is an important artificial intelligence technology. Using artificial intelligence for writing or assisted creation offers people new creative methods and approaches, greatly improves the convenience and speed of writing, and to a large extent changes people's everyday writing habits. However, earlier automatic writing was template-based: although it can produce text quickly, it has serious shortcomings in novelty and diversity and can hardly meet people's demand for originality.
The classic deep-learning text generation approach is an artificial neural network model based on a recurrent neural network (RNN). It compresses the input information into a fixed-length vector and generates the text sentence by sentence through linear or nonlinear transformations of the network. This approach has an obvious drawback: the model compresses the history into state vectors of the same length, and each word only considers the history passed on by the previous word, so historical information is severely lost and the quality of the later part of the generated text becomes worse and worse.
Disclosure of Invention
Aiming at at least one defect or improvement requirement of the prior art, the invention provides a text generation method based on deep learning that achieves a better text generation effect.
In order to achieve the above object, the present invention provides a text generation method based on deep learning, which comprises training and testing, wherein the training comprises the following steps:
constructing a training set, wherein the training set comprises a plurality of sample pairs consisting of the preprocessed topics and the corresponding texts;
predefining a generator comprising an encoder and a decoder, wherein the encoder encodes the input topics into word vectors, the decoder is a long short-term memory (LSTM) recurrent neural network whose initial state vector is randomly initialized, and the input of the LSTM at each step comprises the ground-truth output of the previous time step, the topic vector obtained by the attention mechanism, and the global history memory vector;
predefining a classifier, and inputting the text output by the generator and the text in the training set into the classifier for adversarial training;
and defining a loss function from the pre-trained generator and the classifier, and performing reinforcement learning training on the generator.
Preferably, the preprocessing comprises: performing word segmentation on the texts in the sample set, calculating the tf-idf scores of all words using the tf-idf algorithm, and selecting several keywords with the highest scores as the topics of each text.
Preferably, the global history memory vector is obtained from a history memory matrix; the history memory matrix is composed of vectors of length L, is initialized to 0 at first, dynamically stores the previously generated word vectors during training, and is not itself updated during the training of the generator.
Preferably, a gating network is used to obtain the currently required global history memory vector.
Preferably, the classifier comprises a convolutional layer, a pooling layer and a Highway network which are connected in sequence, and an objective function of the classifier uses a cross entropy loss function.
Preferably, defining the loss function from the pre-trained generator and the classifier specifically comprises: using a penalty-based expectation as the objective function of the reinforcement learning training, wherein the penalty function is computed jointly from the classifier and the generator.
Preferably, the hidden state vector of the decoder is $s_t = \mathrm{LSTM}(s_{t-1}, [e(y_{t-1}); h_{t-1}; c_t])$, where $s_{t-1}$ is the hidden state vector of the decoder at time step $t-1$, $h_{t-1}$ is the history memory vector at time step $t-1$, and $c_t$ is the topic context vector, obtained with a multiplicative attention mechanism according to the following formulas:
$g_{tj} = v_a^{\top}\, C_{t-1,j}\, \tanh(W_a s_{t-1} + U_a e(\tau_j))$
$\alpha_{tj} = \mathrm{softmax}(g_{tj})$
$c_t = \sum_{j=1}^{n} \alpha_{tj}\, e(\tau_j)$
In the above formulas, $g_{tj}$ is the attention weight of the decoder at time step t on the j-th topic $\tau_j$, $\alpha_{tj}$ is the normalized attention weight obtained from $g_{tj}$, $v_a$, $W_a$ and $U_a$ are trainable parameters initialized from a standard normal distribution, C is the topic coverage vector, and $C_{t-1,j}$ denotes the coverage of the j-th topic at time step $t-1$.
In general, compared with the prior art, the invention has the following beneficial effects:
(1) Two attention mechanisms and a new history memory module are added to the encoder-decoder structure. One attention mechanism selectively focuses on specific input feature vectors; the new history memory module, as an explicit information store, learns and represents global history information; and the other attention mechanism retrieves the global history vector from that store. The global history information is combined with the local history information of the long short-term memory network (LSTM) to further strengthen the recurrent network's ability to model long-range dependencies in language.
(2) To further improve the effect, the ideas of reinforcement learning and adversarial neural networks are combined, which further increases the topic relevance of the generated text.
(3) In addition, because topic-based text generation is an open-ended generation task, the invention adopts a temperature-sampling decoding scheme (sample with temperature) to increase the diversity of the generated text.
Drawings
FIG. 1 is a flowchart illustrating a text generation method based on deep learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a generator of an embodiment of the present invention;
FIG. 3 is a schematic diagram of a discriminator according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of reinforcement learning based on policy gradients according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of adversarial neural network training in accordance with an embodiment of the present invention;
FIG. 6 is a schematic diagram of experimental results of the text generation method based on deep learning according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in fig. 1, the text generation method based on deep learning according to the embodiment of the present invention includes the following stages.
Stage 1: data preparation
In the data preparation stage, the required text data are first collected with a web crawler, and special symbols in the data are cleaned out to obtain the required training data.
Next, keywords are extracted with the tf-idf algorithm, specifically: all articles are segmented into words; stop words are removed; and the tf-idf score is calculated as $\text{tf-idf}_{ij} = tf_{ij} \times idf_i$, where $tf_{ij}$ measures how frequently the i-th word occurs in the j-th text and is calculated as follows:
$tf_{ij} = \dfrac{n_{ij}}{\sum_k n_{kj}}$
where $n_{ij}$ is the number of occurrences of the i-th word in the j-th text, and $idf_i$ is the inverse document frequency of the i-th word, calculated as follows:
$idf_i = \log \dfrac{|D|}{1 + |\{\, j : \text{the } i\text{-th word appears in text } j \,\}|}$
where $|D|$ is the total number of texts.
The calculated $\text{tf-idf}_{ij}$ is the tf-idf score of the i-th word in the j-th article. The tf-idf scores of all words in each article are sorted in descending order, the top 5 words are taken as the article's keywords, and the extracted keywords serve as the article's topics. The frequency of every topic word is then counted, and topics with low frequency (less than 100) are removed from each article's 5 topics.
The article data and topic data obtained above are split randomly: 85% of all data are used as the training set, 5% as the validation set, and the remaining 10% as the test set;
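A minimal sketch of the keyword-extraction and topic-filtering step described above, in plain Python (tokenization is assumed to be done already; the function name, the exact reading of the tf and idf formulas, and interpreting the frequency-100 filter as a count over selected topic words are illustrative assumptions, not fixed by the description):

```python
import math
from collections import Counter

def extract_topics(tokenized_docs, top_k=5, min_topic_freq=100):
    """Pick the top_k tf-idf keywords of each document as its topics, then
    drop topics that were selected fewer than min_topic_freq times overall
    (an assumption about how the frequency-100 filter is applied)."""
    n_docs = len(tokenized_docs)
    df = Counter()                          # document frequency of each word
    for doc in tokenized_docs:
        df.update(set(doc))

    doc_topics = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        total = len(doc) or 1
        scores = {w: (tf[w] / total) * math.log(n_docs / (1 + df[w])) for w in tf}
        doc_topics.append(sorted(scores, key=scores.get, reverse=True)[:top_k])

    topic_freq = Counter(t for topics in doc_topics for t in topics)
    return [[t for t in topics if topic_freq[t] >= min_topic_freq]
            for topics in doc_topics]
```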
selecting a proper dictionary size | V |, and using the segmented text to construct the mapping from the vocabulary index to the word, wherein the specific method is that 4 markers are sequentially added at the beginning of the vocabulary, < PAD > represents a filling marker, < UNK > represents a word not in the vocabulary, < GO > represents a starting marker of the text, and < END > represents an ending marker of the text. For the new vocabulary, the words in the vocabulary are numbered from 0, a mapping from number to word is established, and a mapping from word to number is established.
In one embodiment, the method further comprises: selecting a suitable text length L and preprocessing all texts with the vocabulary constructed above. The <GO> marker is added at the beginning of each text and the <END> marker at its end, and the text is padded with <PAD> until the total length reaches L; texts longer than L are truncated to length L, with an <END> termination marker appended at the end. All texts are then serialized with the word-to-number mapping dictionary, converting every segmented text into a sequence of vocabulary indices for training and testing.
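A short sketch of the vocabulary construction and serialization described above (whitespace-tokenized input is assumed; the helper names build_vocab and serialize are illustrative):

```python
from collections import Counter

PAD, UNK, GO, END = "<PAD>", "<UNK>", "<GO>", "<END>"

def build_vocab(tokenized_texts, vocab_size):
    """Word<->id mappings with the four special markers placed first."""
    counts = Counter(w for text in tokenized_texts for w in text)
    words = [PAD, UNK, GO, END] + [w for w, _ in counts.most_common(vocab_size - 4)]
    word2id = {w: i for i, w in enumerate(words)}
    id2word = {i: w for w, i in word2id.items()}
    return word2id, id2word

def serialize(tokens, word2id, max_len):
    """<GO> tokens <END>, truncated to max_len and padded with <PAD>."""
    ids = [word2id[GO]] + [word2id.get(w, word2id[UNK]) for w in tokens]
    ids = ids[:max_len - 1] + [word2id[END]]        # truncate, then terminate
    ids += [word2id[PAD]] * (max_len - len(ids))    # pad up to max_len
    return ids
```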
Stage 2: Pre-training the generator model
A pre-training model is constructed, consisting of an encoder and a decoder. The encoder encodes the topics into word vectors of a suitable dimension; the decoder is a long short-term memory (LSTM) recurrent neural network whose initial state vector is randomly initialized. The topic vector of each time step is obtained with an attention mechanism, and the current topic vector represents the topic semantic information carried by the current word. The current input consists of the word vector corresponding to the ground-truth output word of the previous time step, the topic vector obtained by the attention mechanism, and the new history memory vector; these three vectors are concatenated to form the current input. The current input vector and the state of the previous time step pass through the nonlinear transformation of the LSTM to give an output vector; a linear layer maps its last dimension to the vocabulary size, and a softmax function normalizes it into the probability of each vocabulary word at the current time step, i.e. the probability distribution over the current vocabulary. The cross-entropy loss between this distribution and the 0/1 (one-hot) distribution of the training-set labels is used as the final objective function, and the model parameters are adjusted continuously by mini-batch stochastic gradient descent with a suitable learning rate until the model converges. The final predicted word is decoded from this probability distribution.
The architecture of the pre-training generator is shown in FIG. 2. The input is a set of n topics $\{\tau_1, \tau_2, \ldots, \tau_n\}$, where n is the preset maximum number of input topics. Each input topic is mapped through the dictionary to its unique id, and $e(\tau_j)$ is retrieved from the word-vector matrix as the word vector of topic $\tau_j$. The hidden state vector of the decoder is $s_t = \mathrm{LSTM}(s_{t-1}, [e(y_{t-1}); h_{t-1}; c_t])$, where $s_{t-1}$ is the hidden state vector of the decoder at time step $t-1$, $h_{t-1}$ is the history memory vector at time step $t-1$, and $c_t$ is the topic context vector, obtained with a multiplicative attention mechanism according to the following formulas:
$g_{tj} = v_a^{\top}\, C_{t-1,j}\, \tanh(W_a s_{t-1} + U_a e(\tau_j))$
$\alpha_{tj} = \mathrm{softmax}(g_{tj})$
$c_t = \sum_{j=1}^{n} \alpha_{tj}\, e(\tau_j)$
In the above formulas, $g_{tj}$ is the attention weight of the decoder at time step t on the j-th topic $\tau_j$, $\alpha_{tj}$ is the normalized attention weight obtained from $g_{tj}$, and $v_a$, $W_a$ and $U_a$ are trainable parameters initialized from a standard normal distribution.
Meanwhile, to avoid over-expressing some topics while neglecting others during generation, a topic coverage vector C is also used. C is initialized to $[0, 0, \ldots, 0]$, indicating that no topic has been expressed at the beginning. C is updated dynamically: for topics that have already been expressed, their attention weight is reduced so that their chance of being expressed next decreases; for topics that have not yet been expressed, their attention weight is increased so that their chance of being expressed next increases. The coverage of the j-th topic at time step t is denoted $C_{t,j}$ and is updated according to the following formula:
$C_{t,j} = C_{t-1,j} - \dfrac{1}{\phi_j}\, \alpha_{tj}$
where $\phi_j$ is obtained according to the following formula:
$\phi_j = N \cdot \sigma\!\left(U_f\,[e(\tau_1), e(\tau_2), \ldots, e(\tau_k)]\right)$
In the above formula, N is the number of input topics, $U_f$ is a trainable parameter, and $\sigma$ is the sigmoid activation function.
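The multiplicative topic attention with the coverage vector can be sketched as follows in PyTorch (a sketch, not the patented implementation: tensor shapes are assumptions, and φ is computed here per topic embedding as a simplification of applying $U_f$ to the concatenated topic embeddings):

```python
import torch
import torch.nn as nn

class TopicAttention(nn.Module):
    """Multiplicative topic attention with a coverage vector, following the
    formulas above: g = v_a^T C tanh(W_a s + U_a e(tau)), alpha = softmax(g),
    c_t = sum_j alpha_j e(tau_j), C <- C - alpha / phi."""
    def __init__(self, hidden_size, embed_size):
        super().__init__()
        self.W_a = nn.Linear(hidden_size, embed_size, bias=False)
        self.U_a = nn.Linear(embed_size, embed_size, bias=False)
        self.v_a = nn.Parameter(torch.randn(embed_size))
        self.U_f = nn.Linear(embed_size, 1, bias=False)   # simplified per-topic phi

    def forward(self, s_prev, topic_emb, coverage):
        # s_prev: (B, H)   topic_emb: (B, n, E)   coverage: (B, n)
        energy = torch.tanh(self.W_a(s_prev).unsqueeze(1) + self.U_a(topic_emb))  # (B, n, E)
        g = coverage * (energy @ self.v_a)                   # (B, n) attention scores
        alpha = torch.softmax(g, dim=-1)
        c_t = (alpha.unsqueeze(-1) * topic_emb).sum(dim=1)   # topic context vector
        n_topics = topic_emb.size(1)
        phi = n_topics * torch.sigmoid(self.U_f(topic_emb)).squeeze(-1)  # (B, n)
        coverage = coverage - alpha / phi                    # coverage update
        return c_t, alpha, coverage
```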
Here $h_t$ is obtained from a history memory module, whose core is a history memory matrix $\mathrm{HM} \in \mathbb{R}^{T \times E}$, where T is the maximum article length and E is the word-vector dimension. The history memory matrix is initialized as an all-zero matrix, meaning that no memory is stored at the beginning; during training it dynamically stores the word vectors generated by the decoder at each previous time step, which are written into the matrix. The history memory matrix itself is not updated as a parameter during training; it serves only as a container for word vectors.
Since the long short-term memory network compresses the history into two vectors, some historical information is inevitably lost; the history memory matrix acts as a history-information enhancement module that compensates for this loss.
With the generation of new words, the history memory matrix is calculated as follows:
$\mathrm{HM}[t, :] = e(y_{t-1})$
To select the history information, a gating network is used. It takes the decoder's hidden state vector $s_t$ and two trainable parameters $W_h$ and $b_h$ as input and applies the tanh activation function, calculated as follows:
$v_t = \tanh(W_h s_t + b_h)\, \mathrm{HM}[t, :]$
The vector $v_t$ produced by the gating network is softmax-normalized to give the weight of each word in the history memory module, and the required history vector $h_t$ is selected according to these weights and the history memory module, calculated as follows:
$h_t = \mathrm{softmax}(v_t)\, \mathrm{HM}_{T \times E}$
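A possible PyTorch sketch of the history memory module and its gating network follows; scoring each stored word vector by its dot product with $\tanh(W_h s_t + b_h)$ is one reading of the formula for $v_t$ above, and the class and method names are illustrative:

```python
import torch
import torch.nn as nn

class HistoryMemory(nn.Module):
    """Explicit history memory: a T x E buffer that stores the word vectors
    generated so far (never updated as a parameter) plus a gating network
    that reads out a global history vector h_t."""
    def __init__(self, hidden_size, embed_size, max_len):
        super().__init__()
        self.W_h = nn.Linear(hidden_size, embed_size)   # W_h and b_h of the gate
        self.max_len, self.embed_size = max_len, embed_size

    def reset(self, batch_size, device):
        # all-zero matrix: no memory is stored at the beginning
        self.memory = torch.zeros(batch_size, self.max_len, self.embed_size, device=device)

    def write(self, t, prev_word_emb):
        # HM[t, :] = e(y_{t-1}); the buffer is only a container, hence detach()
        self.memory[:, t, :] = prev_word_emb.detach()

    def read(self, s_t):
        query = torch.tanh(self.W_h(s_t))                        # tanh(W_h s_t + b_h), (B, E)
        v_t = torch.einsum('be,bte->bt', query, self.memory)     # score per memory slot
        weights = torch.softmax(v_t, dim=-1)
        h_t = torch.einsum('bt,bte->be', weights, self.memory)   # weighted history vector
        return h_t
```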
the distribution of the model at the t time step is finally obtained according to the following formula:
$p(y_t \mid y_{1:t-1}, \tau_{1:k}) = \mathrm{softmax}(W_o s_t)$
where $W_o$ is a learnable parameter initialized from a standard normal distribution.
In the training stage, the model is trained by using a cross entropy loss function as an objective function, and the formula is as follows:
$L = -\sum_{t} q(t)\,\log p(t)$
where q(t) is the distribution of the real output, encoded one-hot, and p(t) is the distribution predicted by the model.
In the prediction stage, the word $y_t$ predicted by the model at time t is sampled from the following distribution:
$y_t \sim p(y_t \mid y_{1:t-1}, \tau_{1:k})$
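Putting the pieces together, one decoding step could look like the following sketch, which reuses the illustrative TopicAttention and HistoryMemory classes from the sketches above (the memory is assumed to have been reset before decoding starts):

```python
import torch
import torch.nn as nn

class TopicDecoderStep(nn.Module):
    """One decoding step: the LSTM input is [e(y_{t-1}); h_{t-1}; c_t] and the
    vocabulary distribution is softmax(W_o s_t)."""
    def __init__(self, vocab_size, embed_size, hidden_size, max_len):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.cell = nn.LSTMCell(3 * embed_size, hidden_size)   # e(y), h, c_t are all size E
        self.attn = TopicAttention(hidden_size, embed_size)
        self.memory = HistoryMemory(hidden_size, embed_size, max_len)
        self.W_o = nn.Linear(hidden_size, vocab_size)

    def forward(self, y_prev, state, topic_emb, coverage, h_prev, t):
        e_y = self.embed(y_prev)                                    # e(y_{t-1})
        c_t, _, coverage = self.attn(state[0], topic_emb, coverage) # topic context + coverage
        x_t = torch.cat([e_y, h_prev, c_t], dim=-1)
        s_t, cell_t = self.cell(x_t, state)                         # LSTM(s_{t-1}, [...])
        self.memory.write(t, e_y)                                   # store e(y_{t-1}) in HM
        h_t = self.memory.read(s_t)                                 # global history vector
        logits = self.W_o(s_t)                                      # softmax(W_o s_t) at sampling time
        return logits, (s_t, cell_t), coverage, h_t
```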
and (3) stage: training multiple-question classifier
And constructing a multi-classification discriminator. As shown in fig. 2, the multi-classifier specifically includes a convolutional layer followed by a max-pooling layer, and a high speed Network (high-way Network), and the objective function uses a cross-entropy loss function. The data of the multi-classifier is from the training set and the data generated by the previous pre-training generator, the label dimension is T +1, T represents the number of labels contained in the data set, and the other label represents whether the current text is the data of the training set or the generated data of the pre-training generator.
The structure of the discriminator is shown in FIG. 3. The discriminator is a text multi-classifier with n+1 targets: n targets are the n topics a text may belong to, and the remaining one judges whether the text was generated by the model or is a real training sample. The classifier's input consists of real training data and data generated by the pre-trained generator. An input text sequence $y_1, y_2, \ldots, y_T$ is first represented as the matrix of its word vectors,
$\pi_{1:T} = e(y_1) \oplus e(y_2) \oplus \cdots \oplus e(y_T)$
where $\oplus$ denotes concatenation of vectors and $\pi_{1:T} \in \mathbb{R}^{T \times E}$ is the text feature sequence. A two-dimensional convolution with kernel $\omega \in \mathbb{R}^{l \times E}$, whose length equals the word-vector dimension E and whose width is l, followed by an activation function, maps this matrix to feature vectors. Max pooling and a Highway network then yield the final output class distribution $D_\phi(x_j \mid y_{1:T})$. The objective is the cross-entropy loss, formulated as follows:
$L_D = -\sum_{j=1}^{n+1} x_j \log D_\phi(x_j \mid y_{1:T})$
where $x_j$ is the one-hot-encoded topic label of the input text sequence $y_{1:T}$; there are n+1 labels in total, and the last label indicates whether the text is real data. The model updates its parameters with the Adam gradient descent algorithm.
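A compact PyTorch sketch of such a multi-topic discriminator is given below (the filter width, filter count and the single Highway layer are illustrative choices, not values fixed by the description):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicDiscriminator(nn.Module):
    """Word embeddings -> 2-D convolution over the (length x embedding) matrix
    -> max pooling -> Highway layer -> n+1 classes (n topics plus one
    generated-vs-real class)."""
    def __init__(self, vocab_size, embed_size, num_classes, kernel_width=3, num_filters=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.conv = nn.Conv2d(1, num_filters, (kernel_width, embed_size))
        self.transform = nn.Linear(num_filters, num_filters)   # Highway: transform branch
        self.gate = nn.Linear(num_filters, num_filters)        # Highway: gate branch
        self.out = nn.Linear(num_filters, num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids).unsqueeze(1)                  # (B, 1, T, E)
        feat = F.relu(self.conv(x)).squeeze(3)                  # (B, F, T-l+1)
        feat = F.max_pool1d(feat, feat.size(2)).squeeze(2)      # max pooling over time
        g = torch.sigmoid(self.gate(feat))
        feat = g * F.relu(self.transform(feat)) + (1 - g) * feat  # Highway network
        return self.out(feat)                                   # logits over n+1 classes

# training objective: cross entropy against the n+1 one-hot labels, e.g.
# loss = F.cross_entropy(discriminator(batch_ids), batch_labels)
```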
The generator and the multi-classifier are initialized: the word vectors are randomly initialized from a standard normal distribution, the other weights from a normal distribution with mean 0 and variance 0.01, a suitable batch size (the number of samples fed to the model at once) is chosen, and the learning rate is initialized to 0.01.
The generator and the multi-classifier are then pre-trained. In each round the training data are randomly shuffled and the generator is pre-trained; the pre-trained generator then generates as much data as the training set contains, and the multi-classification data set is shuffled in each round. Next the multi-classifier is pre-trained, updating the weights of every network layer with stochastic gradient descent until the network converges.
Stage 4: Constructing the reinforcement learning generator
A reinforcement learning module is constructed, consisting of a new generator and the multi-classifier described above. The new generator modifies the loss function of the pre-trained generator: a penalty-based expectation is used as the objective function of the reinforcement learning model, and the penalty function is computed from the multi-topic classifier and the new generator.
Maximum likelihood estimation (MLE) training seeks the most probable word at every step, yet low-probability words do appear in real text, so the model is inconsistent with the actual intent of generation. Reinforcement learning does not require every step to be the optimal solution; instead it seeks the solution with the maximum accumulated reward and allows low-probability words to appear in the middle of a sequence. Maximum likelihood pursues local optima while reinforcement learning pursues a global optimum, so reinforcement learning is more likely to find generation rules that match human linguistic intuition. As shown in FIG. 4, in text generation the reinforcement learning agent can be regarded as the generator, denoted G in the figure; the environment is the multi-classifier D; the state of the agent is the set of words (tokens) generated so far, shown as solid black dots in the figure. The action is the word (token) the agent may choose next, and a policy gradient method is introduced: $a_t$ denotes the token predicted by the model at time step t, and the policy $\pi$ represents all the tokens with which we generate the text. Reinforcement learning determines this policy. We parameterize the policy as $P_\theta(a \mid s)$, the probability that the agent chooses token a next given the state $s = \{a_1, a_2, \ldots, a_n\}$. Here our policy is $G_\theta(y_{t+1} \mid y_{1:t})$, and the loss function of deep reinforcement learning is as follows:
$J_G = \mathbb{E}_{y_{1:T} \sim G_\theta}\!\left[\sum_{t=1}^{T} \mathrm{Penalty}(y_{1:t})\right]$
the Penalty factor Penalty is calculated according to whether the current sequence generates the last word or not according to the following two conditions:
$\mathrm{Penalty}(y_{1:t}) = \begin{cases} 1 - D_\phi(x_j \mid y_{1:T}), & t = T \\ \dfrac{1}{K}\sum_{k=1}^{K}\big(1 - D_\phi(x_j \mid \mathrm{MC}_k^{G_\theta}(y_{1:t}))\big), & t < T \end{cases}$
as shown in fig. 4, the reward (reward) of the arbiter is not used directly as feedback for reinforcement learning, but the penalty is used as feedback to the agent from the environment, so our strategy should minimize the accumulated penalty. Since we need to obtain the current accumulated expected Penalty every action, we need to use Monte Carlo (MenterCarlo) to sample the remaining T-T tokens and calculate the current instantaneous accumulated Penalty (Penalty) with the discriminator when generating the T word. The addition of reinforcement learning can enable the generated text of us to have more topic relevance.
Stage 5: Training with the adversarial neural network
As shown in FIG. 5, the adversarial neural network consists of a generator and a discriminator; the generator is the reinforcement learning generator described in the previous stage. During adversarial learning, on the one hand the generator evolves through learning to produce more training data and texts with topic relevance; on the other hand the discriminator evolves through learning to recognize the texts produced by the generator and thereby guides the generator's evolution.
Adversarial learning is added because, as the generator becomes stronger through the preceding reinforcement learning, the discriminator's ability weakens; once the discriminator's ability weakens, the reward computation of the reinforcement learning becomes biased. Moreover, sampling variance can make reinforcement learning training unstable, so after each reinforcement learning parameter update a log-likelihood (MLE) objective is added to correct the reinforcement learning generator and damp the fluctuations during training.
Training an adversarial neural network is slow and hard to converge because the discriminator and the generator must be trained at the same time. To address this, on the one hand the multi-topic classifier and the pre-trained generator are pre-trained sufficiently before the adversarial training, which also helps the model converge; on the other hand a small number of training epochs (1-3) is chosen during adversarial learning, since an overly long training period wastes computing resources and may cause overfitting.
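An illustrative alternating training loop is sketched below; the helper methods sample_with_log_probs, mle_loss and classification_loss are hypothetical stand-ins for the generator's sampling/likelihood code and the discriminator's cross-entropy objective, and policy_gradient_loss is the sketch above:

```python
import torch

def adversarial_training(generator, discriminator, train_loader,
                         g_optim, d_optim, n_rounds=50, d_epochs=1):
    """Alternate generator and discriminator updates (a sketch under the
    assumptions named in the lead-in)."""
    for _ in range(n_rounds):
        topics, real_text = next(iter(train_loader))
        # generator step: penalty-based policy gradient plus an MLE correction term
        samples, log_probs = generator.sample_with_log_probs(topics)       # (B, T) each
        with torch.no_grad():
            p_real = torch.softmax(discriminator(samples), dim=-1)[:, -1]  # "real" class prob.
        penalties = (1.0 - p_real).unsqueeze(1)   # completed-sequence penalty; per-step Monte
                                                  # Carlo penalties (sequence_penalty) also fit here
        g_loss = policy_gradient_loss(log_probs, penalties) + generator.mle_loss(topics, real_text)
        g_optim.zero_grad()
        g_loss.backward()
        g_optim.step()
        # discriminator step: a small number (1-3) of epochs on real and generated text
        for _ in range(d_epochs):
            d_loss = discriminator.classification_loss(real_text, samples.detach())
            d_optim.zero_grad()
            d_loss.backward()
            d_optim.step()
```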
Stage 6: Model selection and testing
Since the task itself is open-ended text generation, greedy decoding or beam search makes the generated text too uniform and causes severe repetition. The embodiment of the invention therefore uses sampling-based decoding to increase the diversity of the generated text; through sampling-based decoding, more diverse texts can be generated, and sampling also effectively avoids word repetition. In addition, the embodiment uses sampling-based decoding in both training and testing, which to some extent alleviates the exposure bias caused by inconsistent decoding methods between training and testing.
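Temperature sampling itself is a one-liner; a sketch (the temperature value is illustrative):

```python
import torch

def sample_with_temperature(logits, temperature=0.8):
    """Divide the logits by a temperature before softmax and draw the next
    word from the resulting distribution; lower temperatures approach greedy decoding."""
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)   # one sampled word id per sequence
```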
Compared with several models on a public data set, experiments show that the text generation method of the embodiment of the invention generates texts with stronger topic relevance and better fluency. During decoding, the distribution-sampling decoding scheme increases the diversity of the generated text and reduces repetition.
To verify the effectiveness of the embodiment of the invention, experiments were carried out on a public data set:
The experiments adopt the Zhihu data set released by Haugha in 2018. On the BLEU score, an automatic text evaluation metric, the model of the embodiment improves on the baseline model by 37% and on the current best model by 6%. In human evaluation, the short texts generated by the model of the embodiment also achieve the best scores. Text generated by the method of the embodiment is shown in FIG. 6.
It must be noted that in any of the above embodiments, the operations need not be executed in the order of their sequence numbers; as long as a particular order cannot be inferred from the execution logic, they may be executed in any other feasible order.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (7)

1. A text generation method based on deep learning comprises training and testing, and is characterized by comprising the following steps:
stage 1: a data preparation stage, in which the required text data are first collected with a web crawler and special symbols in the data are cleaned out to obtain the required training data;
stage 2: a pre-training model is constructed, comprising an encoder and a decoder; the encoder encodes the topics into word vectors of a suitable dimension, the decoder is a long short-term memory (LSTM) recurrent neural network whose initial state vector is randomly initialized, the topic vector of each time step is obtained with an attention mechanism, and the current topic vector represents the topic semantic information carried by the current word;
the input is set as a set of n topics $\{\tau_1, \tau_2, \ldots, \tau_n\}$, where n is the preset maximum number of input topics; the input topics are mapped through a dictionary to obtain the unique id of each word, and $e(\tau_j)$ is then retrieved from the word-vector matrix as the word vector of topic $\tau_j$; the hidden state vector of the decoder is $s_t = \mathrm{LSTM}(s_{t-1}, [e(y_{t-1}); h_{t-1}; c_t])$, where $s_{t-1}$ is the hidden state vector of the decoder at time step $t-1$, $h_{t-1}$ is the history memory vector at time step $t-1$, and $c_t$ is the topic context vector, obtained with a multiplicative attention mechanism according to the following formulas:
$g_{tj} = v_a^{\top}\, C_{t-1,j}\, \tanh(W_a s_{t-1} + U_a e(\tau_j))$
$\alpha_{tj} = \mathrm{softmax}(g_{tj})$
$c_t = \sum_{j=1}^{n} \alpha_{tj}\, e(\tau_j)$
in the above formulas, $g_{tj}$ is the attention weight of the decoder at time step t on the j-th topic $\tau_j$, $\alpha_{tj}$ is the normalized attention weight obtained from $g_{tj}$, and $v_a$, $W_a$ and $U_a$ are trainable parameters initialized from a standard normal distribution;
the coverage of the j-th topic at time step t is denoted $C_{t,j}$ and is updated according to the following formula:
$C_{t,j} = C_{t-1,j} - \dfrac{1}{\phi_j}\, \alpha_{tj}$
where $C_{t-1,j}$ denotes the coverage of the j-th topic at time step $t-1$, and $\phi_j$ is obtained according to the following formula:
$\phi_j = N \cdot \sigma\!\left(U_f\,[e(\tau_1), e(\tau_2), \ldots, e(\tau_k)]\right)$
in the above formula, N is the number of input topics, $U_f$ is a trainable parameter, and $\sigma$ is the sigmoid activation function;
wherein $h_t$ is obtained from a history memory module whose core is a history memory matrix $\mathrm{HM} \in \mathbb{R}^{T \times E}$, where T is the maximum article length and E is the word-vector dimension; the history memory matrix is initialized as an all-zero matrix, meaning that no memory is stored at the beginning; during training it dynamically stores the word vectors generated by the decoder at each previous time step, which are written into the matrix; the history memory matrix is not updated as a parameter during training and serves only as a container for word vectors,
as new words are generated, the history memory matrix is filled as follows:
$\mathrm{HM}[t, :] = e(y_{t-1})$
for the selection of history information a gating network is used, which takes the decoder's hidden state vector $s_t$ and two trainable parameters $W_h$ and $b_h$ as input and applies the tanh activation function, calculated as follows:
$v_t = \tanh(W_h s_t + b_h)\, \mathrm{HM}[t, :]$
the vector $v_t$ produced by the gating network is softmax-normalized to give the weight of each word in the history memory module, and the required history vector $h_t$ is selected according to these weights and the history memory module, calculated as follows:
$h_t = \mathrm{softmax}(v_t)\, \mathrm{HM}_{T \times E}$
the distribution of the model at time step t, $p(y_t \mid y_{1:t-1}, \tau_{1:k})$, is obtained according to the following formula:
$p(y_t \mid y_{1:t-1}, \tau_{1:k}) = \mathrm{softmax}(W_o s_t)$
where $W_o$ is a learnable parameter initialized from a standard normal distribution,
in the training phase, the model is trained by using a cross entropy loss function as an objective function, and the formula is as follows:
$L = -\sum_{t} q(t)\,\log p(t)$
where q(t) is the distribution of the true output, encoded one-hot, and p(t) is the distribution predicted by the model;
in the prediction phase, the word $y_t$ predicted by the model at time t is sampled from the following distribution:
$y_t \sim p(y_t \mid y_{1:t-1}, \tau_{1:k})$
stage 3: a multi-topic classifier is trained; a multi-class discriminator is constructed, which is a text multi-classifier with n+1 targets, where n targets are the n topics a text may belong to and the remaining one judges whether the text was generated by the model or is a real training sample; the input of the classifier consists of two parts, real training data and data generated by the pre-trained generator; an input text sequence $y_1, y_2, \ldots, y_T$ is represented as the matrix of its word vectors,
$\pi_{1:T} = e(y_1) \oplus e(y_2) \oplus \cdots \oplus e(y_T)$
where $\oplus$ denotes concatenation of vectors and $\pi_{1:T} \in \mathbb{R}^{T \times E}$ is the text feature sequence; a two-dimensional convolution with kernel $\omega \in \mathbb{R}^{l \times E}$, whose length equals the word-vector dimension E and whose width is l, followed by an activation function, maps this matrix to feature vectors; max pooling and a Highway network are then used to obtain the final output class distribution $D_\phi(x_j \mid y_{1:T})$;
And (4) stage: constructing a reinforcement learning generator, wherein in the text generation, an agent for reinforcement learning can be regarded as the generator and is represented by G, the environment for reinforcement learning is represented by a multi-classifier D, the state of the agent is the set of words generated previously, the words selectable by the agent in the next step of the action for reinforcement learning are introduced into a strategy gradient method, atRepresenting words predicted by the tth time step model, the strategy pi represents all tokens of the text generated by the strategy, and the reinforcement learning is to determine the strategy pi, parameterize the strategy pi, and Pθ(as) denotes a symbol in s ═ a1,a2,...,anUnder the condition, the probability that the next word selected by the model is a is determined, and the strategy is Gθ(yt+1|y1:t) Loss function J of deep reinforcement learningGThe following were used:
Figure FDA0003571983790000041
the Penalty factor Penalty is calculated according to whether the current sequence generates the last word or not according to the following two conditions:
Figure FDA0003571983790000042
stage 5: training with an adversarial neural network, which consists of a generator and a discriminator; during adversarial learning, on the one hand the generator evolves through learning to generate training data and texts with topic relevance, and on the other hand the discriminator evolves through learning to recognize the texts generated by the generator and guides the generator's evolution;
stage 6: model selection and testing;
the training comprises the steps of:
constructing a training set, wherein the training set comprises a plurality of sample pairs consisting of preprocessed topics and corresponding texts;
predefining a generator comprising an encoder and a decoder, wherein the encoder encodes the input topics into word vectors, the decoder is a long short-term memory (LSTM) recurrent neural network whose initial state vector is randomly initialized, and the input of the LSTM comprises the ground-truth output of the previous time step, the topic vector obtained by the attention mechanism, and the global history memory vector;
predefining a classifier, and inputting the text output by the generator and the text in the training set into the classifier for adversarial training;
and defining a loss function from the pre-trained generator and the classifier, and performing reinforcement learning training on the generator.
2. The method of claim 1, wherein the preprocessing comprises: performing word segmentation on the texts in the sample set, calculating the tf-idf scores of all words using the tf-idf algorithm, and selecting several keywords with the highest scores as the topics of each text.
3. The method as claimed in claim 1, wherein the global history memory vector is obtained from a history memory matrix, the history memory matrix is composed of vectors with length L, the history memory matrix is initially initialized to 0, the word vectors generated before are dynamically stored in the training process, and the history memory matrix is not updated during the training process of the generator.
4. The method of claim 3, wherein a gating network is used to obtain the global history memory vector needed currently.
5. The deep learning-based text generation method of claim 1, wherein the classifier comprises a convolutional layer, a pooling layer and a Highway network which are connected in sequence, and an objective function of the classifier uses a cross entropy loss function.
6. The method for generating text based on deep learning of claim 1, wherein defining the loss function from the pre-trained generator and the classifier specifically comprises: using a penalty-based expectation as the objective function of the reinforcement learning training, wherein the penalty function is computed jointly from the classifier and the generator.
7. The text generation method based on deep learning of claim 1,
the hidden state vector of the decoder is $s_t = \mathrm{LSTM}(s_{t-1}, [e(y_{t-1}); h_{t-1}; c_t])$, where $s_{t-1}$ is the hidden state vector of the decoder at time step $t-1$, $h_{t-1}$ is the history memory vector at time step $t-1$, and $c_t$ is the topic context vector, obtained with a multiplicative attention mechanism according to the following formulas:
$g_{tj} = v_a^{\top}\, C_{t-1,j}\, \tanh(W_a s_{t-1} + U_a e(\tau_j))$
$\alpha_{tj} = \mathrm{softmax}(g_{tj})$
$c_t = \sum_{j=1}^{n} \alpha_{tj}\, e(\tau_j)$
In the above formulas, $g_{tj}$ is the attention weight of the decoder at time step t on the j-th topic $\tau_j$, $\alpha_{tj}$ is the normalized attention weight obtained from $g_{tj}$, $v_a$, $W_a$ and $U_a$ are trainable parameters initialized from a standard normal distribution, C is the topic coverage vector, and $C_{t-1,j}$ denotes the coverage of the j-th topic at time step $t-1$.
CN202010652675.XA 2020-07-08 2020-07-08 Text generation method based on deep learning Active CN111858931B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010652675.XA CN111858931B (en) 2020-07-08 2020-07-08 Text generation method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010652675.XA CN111858931B (en) 2020-07-08 2020-07-08 Text generation method based on deep learning

Publications (2)

Publication Number Publication Date
CN111858931A CN111858931A (en) 2020-10-30
CN111858931B true CN111858931B (en) 2022-05-13

Family

ID=73153043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010652675.XA Active CN111858931B (en) 2020-07-08 2020-07-08 Text generation method based on deep learning

Country Status (1)

Country Link
CN (1) CN111858931B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11989939B2 (en) 2021-03-17 2024-05-21 Samsung Electronics Co., Ltd. System and method for enhancing machine learning model for audio/video understanding using gated multi-level attention and temporal adversarial training
CN113435183B (en) * 2021-06-30 2023-08-29 平安科技(深圳)有限公司 Text generation method, device and storage medium
CN113505611B (en) * 2021-07-09 2022-04-15 中国人民解放军战略支援部队信息工程大学 Training method and system for obtaining better speech translation model in generation of confrontation
CN114266320A (en) * 2021-12-30 2022-04-01 北京天融信网络安全技术有限公司 Model training method, password cracking method, device and electronic equipment
CN114444488B (en) * 2022-01-26 2023-03-24 中国科学技术大学 Few-sample machine reading understanding method, system, equipment and storage medium
CN114629699B (en) * 2022-03-07 2022-12-09 北京邮电大学 Migratory network flow behavior anomaly detection method and device based on deep reinforcement learning
CN114818666B (en) * 2022-04-26 2023-03-28 广东外语外贸大学 Evaluation method, device and equipment for Chinese grammar error correction and storage medium
CN114925658B (en) * 2022-05-18 2023-04-28 电子科技大学 Open text generation method and storage medium
CN115481630A (en) * 2022-09-27 2022-12-16 深圳先进技术研究院 Electronic insurance letter automatic generation method and device based on sequence countermeasure and prior reasoning
CN115630640B (en) * 2022-12-23 2023-03-10 苏州浪潮智能科技有限公司 Intelligent writing method, device, equipment and medium
CN116127051B (en) * 2023-04-20 2023-07-11 中国科学技术大学 Dialogue generation method based on deep learning, electronic equipment and storage medium
CN116957056B (en) * 2023-09-18 2023-12-08 天津汇智星源信息技术有限公司 Feedback-based model training method, keyword extraction method and related equipment
CN117610548B (en) * 2024-01-22 2024-05-03 中国科学技术大学 Multi-mode-based automatic paper chart title generation method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334497A (en) * 2018-02-06 2018-07-27 北京航空航天大学 The method and apparatus for automatically generating text
CN109241377A (en) * 2018-08-30 2019-01-18 山西大学 A kind of text document representation method and device based on the enhancing of deep learning topic information
CN109657041A (en) * 2018-12-04 2019-04-19 南京理工大学 The problem of based on deep learning automatic generation method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10387464B2 (en) * 2015-08-25 2019-08-20 Facebook, Inc. Predicting labels using a deep-learning model

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334497A (en) * 2018-02-06 2018-07-27 北京航空航天大学 The method and apparatus for automatically generating text
CN109241377A (en) * 2018-08-30 2019-01-18 山西大学 A kind of text document representation method and device based on the enhancing of deep learning topic information
CN109657041A (en) * 2018-12-04 2019-04-19 南京理工大学 The problem of based on deep learning automatic generation method

Also Published As

Publication number Publication date
CN111858931A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN111858931B (en) Text generation method based on deep learning
CN107358948B (en) Language input relevance detection method based on attention model
CN108984526B (en) Document theme vector extraction method based on deep learning
CN110188358B (en) Training method and device for natural language processing model
CN109614471B (en) Open type problem automatic generation method based on generation type countermeasure network
CN109522411A (en) A kind of writing householder method neural network based
CN112487820B (en) Chinese medical named entity recognition method
CN108830287A (en) The Chinese image, semantic of Inception network integration multilayer GRU based on residual error connection describes method
CN111738007B (en) Chinese named entity identification data enhancement algorithm based on sequence generation countermeasure network
CN108984524A (en) A kind of title generation method based on variation neural network topic model
CN109918510A (en) Cross-cutting keyword extracting method
CN111160467A (en) Image description method based on conditional random field and internal semantic attention
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN109800434A (en) Abstract text header generation method based on eye movement attention
CN111078866A (en) Chinese text abstract generation method based on sequence-to-sequence model
CN113673535B (en) Image description generation method of multi-modal feature fusion network
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN109214006A (en) The natural language inference method that the hierarchical semantic of image enhancement indicates
Zulqarnain et al. An improved deep learning approach based on variant two-state gated recurrent unit and word embeddings for sentiment classification
Duan et al. Temporality-enhanced knowledgememory network for factoid question answering
CN109308316B (en) Adaptive dialog generation system based on topic clustering
Yang et al. Recurrent neural network-based language models with variation in net topology, language, and granularity
CN111353040A (en) GRU-based attribute level emotion analysis method
Yang et al. Sequence-to-sequence prediction of personal computer software by recurrent neural network
CN113836934B (en) Text classification method and system based on tag information enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant