WO2020151310A1 - Text generation method and device, computer apparatus, and medium - Google Patents

Text generation method and device, computer apparatus, and medium

Info

Publication number
WO2020151310A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
model
initial
discriminator
generator
Prior art date
Application number
PCT/CN2019/116941
Other languages
French (fr)
Chinese (zh)
Inventor
毕野
黄博
吴振宇
王建明
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020151310A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/35 Clustering; Classification
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology

Definitions

  • This application belongs to the field of model construction, and more specifically, relates to a text generation method, device, computer equipment and medium.
  • LSTM Long Short-Term Memory Networks
  • RNN Recurrent/Recursive Neural Network
  • a common way to train an RNN is maximum likelihood estimation: given the first t-1 words, the next word is predicted by maximizing the log likelihood of the t-th word.
  • a disadvantage of using an RNN is that it produces a steadily accumulating bias: a sentence is generated word by word, with each next word conditioned on the previously generated words, so an early error propagates forward and the bias grows as the sequence length increases.
  • an RNN also cannot improve itself. For some RNN applications, a loss function to minimize can be added to improve the model, but for a text generation model the input data are discrete, so there is no directly usable loss function and no suitable way to guide the text generation model to improve itself toward near-real output.
  • the embodiments of the present application provide a text generation method, device, computer equipment, and storage medium to solve the current problem of low efficiency of text generation.
  • a text generation method, including:
  • Generate test text based on the generator model, input the test text into the discriminator model to obtain the reward value of the test text, calculate the gradient of the generator model according to the reward value, and update the generator model according to the gradient;
  • Obtain the text to be recognized, input the text to be recognized into the text generation model, and generate the target text based on the text generation model.
  • a text generating device includes:
  • a text positive sample acquisition module, used to acquire a real text data set and acquire positive text samples from the real text data set;
  • a generator model acquisition module, used to establish an initial generator model, input the positive text samples into the initial generator model for pre-training to obtain a generator model, and generate first negative text samples according to the generator model;
  • a discriminator model acquisition module, used to establish an initial discriminator model, and input the positive text samples and the first negative text samples into the initial discriminator model for pre-training to obtain a discriminator model;
  • a generator model update module, configured to generate test text based on the generator model, input the test text into the discriminator model to obtain the reward value of the test text, calculate the gradient of the generator model according to the reward value, and update the generator model according to the gradient;
  • a discriminator model update module, used to generate second negative text samples according to the updated generator model, input the second negative text samples and the positive text samples into the discriminator model, and update the discriminator model according to minimized cross entropy;
  • a text generation model acquisition module, configured to alternately update the generator model and the discriminator model, and if the output of the discriminator model converges, obtain a text generation model according to the generator model at the time of convergence;
  • a target text generation module, used to obtain the text to be recognized, input the text to be recognized into the text generation model, and generate the target text based on the text generation model.
  • a computer device includes a memory, a processor, and computer readable instructions that are stored in the memory and can run on the processor, and the processor implements the above text generation method when the computer readable instructions are executed.
  • One or more readable storage media storing computer readable instructions, when the computer readable instructions are executed by one or more processors, the one or more processors execute the above text generation method.
  • FIG. 1 is a schematic diagram of an application environment of a text generation method in an embodiment of the present application
  • Figure 2 is a flowchart of a text generation method in an embodiment of the present application
  • FIG. 3 is another flowchart of a text generation method in an embodiment of the present application.
  • FIG. 4 is another flowchart of a text generation method in an embodiment of the present application.
  • FIG. 5 is another flowchart of a text generation method in an embodiment of the present application.
  • FIG. 6 is another flowchart of a text generation method in an embodiment of the present application.
  • FIG. 7 is a functional block diagram of a text generating device in an embodiment of the present application.
  • FIG. 8 is a functional block diagram of the generator model acquisition module in the text generation device in an embodiment of the present application.
  • FIG. 9 is a functional block diagram of the discriminator model acquisition module in the text generation device in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a computer device in an embodiment of the present application.
  • the text generation method provided in this application can be applied in the application environment shown in FIG. 1, where a client communicates with a server through a network. The server obtains a real text data set through the client and obtains positive text samples from the real text database; it then establishes an initial generator model based on the input of the client, inputs the positive text samples into the initial generator model for pre-training to obtain the generator model, and generates first negative text samples according to the generator model. Next, an initial discriminator model is established according to the input of the client, and the positive text samples and the first negative text samples are input into the initial discriminator model for pre-training to obtain the discriminator model. The server then generates test text based on the generator model, inputs the test text into the discriminator model to obtain the reward value of the test text, calculates the gradient of the generator model according to the reward value, and updates the generator model according to the gradient. The server generates second negative text samples according to the updated generator model, inputs the second negative text samples and the positive text samples into the discriminator model, and updates the discriminator model according to minimized cross entropy. The generator model and the discriminator model are updated alternately; if the output of the discriminator model converges, the text generation model is obtained according to the generator model at the time of convergence. Finally, the text to be recognized is obtained and input into the text generation model, and the target text is generated based on the text generation model and returned to the client.
  • the client can be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server can be implemented with an independent server or a server cluster composed of multiple servers.
  • a text generation method is provided.
  • the method is applied to the server in FIG. 1 as an example for description, which may specifically include the following steps:
  • S10 Obtain a real text data set, and obtain a positive text sample from the real text data set.
  • the real text data set refers to the original text data set corresponding to the text that is expected to be finally output by the text generation model.
  • for example, if the text generation model is expected to output poems, the real text data set is a data set composed of various poems.
  • the text in this embodiment can be a poem, an answer to a question or a dialogue, etc.
  • This embodiment uses the final output poem as an example for description.
  • the text positive sample refers to multiple samples extracted from the real text data set, for example, multiple poems extracted from the real text data set.
  • Specifically, a large number of poem data sets can be collected in advance and stored in the database of the server as the real text data sets.
  • the server randomly obtains real text data sets from the database and extracts some poems (samples) from the real text data sets as the positive text samples.
  • step S10 may specifically include the following steps:
  • S11 Select N text data from the real text data set, where N is a positive integer.
  • the server selects N samples from the database as positive text samples, where N is a positive integer. It can be understood that the larger N is, the better the training effect generally is.
  • which samples are specifically selected as positive text samples can be obtained through the input of the client, for example, the client inputs the sample number, and then the server selects the corresponding sample from the database according to the sample number input by the client.
  • S12 Convert the N text data into a vector form using the word vector model, and use the N text data converted into the vector form as a positive text sample.
  • the word vector model is the word2vec model
  • the word2vec model includes two neural network structures, namely CBOW and Skip-gram.
  • the server can input the poem (real text data set) into the word2vec model for training.
  • the word2vec model can be used to map each entry of the poem to a vector.
  • the word2vec algorithm can be used to transform term i into a vector x_i.
  • Specifically, the server converts the selected N text data into vector form through the word2vec model, and uses the N vectorized text data as the positive text samples.
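  • As a concrete illustration of steps S11-S12, the following is a minimal Python sketch using the gensim library's Word2Vec implementation; the corpus, tokenization, and vector size are illustrative assumptions rather than details from the application:

```python
# A minimal sketch of step S12, assuming gensim's Word2Vec; the tokenized
# poems, vector_size, and other hyperparameters below are hypothetical.
from gensim.models import Word2Vec

poems = [["床", "前", "明", "月", "光"],
         ["疑", "是", "地", "上", "霜"]]  # stand-in tokenized real text data

w2v = Word2Vec(sentences=poems, vector_size=64, window=5, min_count=1, sg=1)

# Convert the N selected text data into vector form: one vector per term.
positive_samples = [[w2v.wv[token] for token in poem] for poem in poems]
```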
  • S20 Establish an initial generator model, input the positive text samples into the initial generator model for pre-training, obtain the generator model, and generate the first negative text sample according to the generator model.
  • the initial generator model and subsequent initial discriminator models are all models constructed based on neural networks.
  • the initial generator model can be established using a recurrent neural network (RNN); in order to speed up the training of the neural network and reduce the amount of calculation, the initial discriminator model can be established using a convolutional neural network (CNN).
  • the establishment of the initial generator model and the initial discriminator model can also use other neural networks, which are not specifically limited here.
  • the initial generator model is a recurrent neural network and the initial discriminator model is a convolutional neural network.
  • the parameters of the RNN are randomly selected to establish the initial generator model.
  • the positive text samples obtained in step S10 are input into the initial generator model for pre-training, and the generator model is obtained after pre-training.
  • the server can also select additional sample data from the real text data set and input it into the initial generator model for pre-training.
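  • To make the pre-training step concrete, the following is a minimal PyTorch sketch of maximum-likelihood pre-training for a recurrent generator; the GRU architecture, vocabulary size, and random stand-in batch are illustrative assumptions, not the application's exact model:

```python
# A minimal sketch of step S20: pre-train an RNN generator by maximum
# likelihood; all dimensions and data are hypothetical.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)  # one RNN variant
        self.out = nn.Linear(hidden_dim, vocab_size)              # plays the role of c + V h_t

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)  # logits; a softmax over them gives the next-word distribution

gen = Generator()
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

batch = torch.randint(0, 1000, (32, 20))     # stand-in for vectorized positive text samples
logits = gen(batch[:, :-1])                  # predict word t from words 1..t-1
loss = loss_fn(logits.reshape(-1, 1000), batch[:, 1:].reshape(-1))
opt.zero_grad(); loss.backward(); opt.step()
```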
  • S30: Establish an initial discriminator model, and input the positive text samples and the first negative text samples into the initial discriminator model for pre-training to obtain the discriminator model.
  • the server randomly selects the parameters of the CNN to establish an initial discriminator model.
  • the obtained positive text samples and the first negative text samples are respectively labeled.
  • the positive sample of the text may be labeled as 1
  • the negative sample of the first text may be labeled as 0.
  • the labeled positive text samples and the first negative text samples are input into the initial discriminator model for pre-training to obtain the discriminator model.
  • a CNN is used to build the discriminator model because an appropriate pooling layer can be set in the CNN; the pooling operation can prevent the discriminator model from overfitting the data, speed up discriminator model training, and reduce the amount of calculation.
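  • The following is a minimal PyTorch sketch of such a CNN discriminator with max pooling; the single kernel size, filter count, and dimensions are illustrative assumptions:

```python
# A minimal sketch of the CNN discriminator of step S30 with max-over-time
# pooling; all hyperparameters are hypothetical.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, num_filters=100):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, num_filters, kernel_size=3)
        self.fc = nn.Linear(num_filters, 1)

    def forward(self, x):
        e = self.emb(x).transpose(1, 2)        # (batch, emb_dim, seq_len)
        c = torch.relu(self.conv(e))           # convolution features c_i
        pooled = c.max(dim=2).values           # max pooling over the features
        return torch.sigmoid(self.fc(pooled))  # probability that the sample is real
```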
  • S40 Generate a test text based on the generator model, input the test text into the discriminator model to obtain the reward value of the test text, calculate the gradient of the generator model according to the reward value, and update the generator model according to the gradient.
  • the reward value of the test text refers to the value output by the discriminator model.
  • the server uses the generator model to generate the test text, then inputs the test text into the discriminator model, and obtains the value output by the discriminator model as the reward value.
  • the generator model here uses the policy gradient from reinforcement learning (RL): when the discriminator model's output value for the test text is relatively high, the probability of the corresponding action of the RNN in the generator model is increased; when the output value is relatively low, that probability is reduced.
  • RL Policy Gradient in Reinforcement Learning
  • whether the output value of the discriminator model is "high" or "low" is a relative concept: the thresholds differ across training stages and can be preset according to experience. For example, at the beginning of training, when the output of the generator model is still relatively poor, a discriminator output higher than 0.3 can be set as a relatively high value and one lower than 0.2 as a relatively low value; in the later stage of training, an output higher than 0.4 can be set as a relatively high value and one lower than 0.3 as a relatively low value.
  • the policy gradient of the generator is calculated according to the reward value of the test text, and finally the generator model is updated with the calculated policy gradient.
  • the policy gradient can be written as
  $\nabla_\theta J(\theta) = \sum_{t} \mathbb{E}_{Y_{1:t-1} \sim G_\theta} \big[ \sum_{y_t} \nabla_\theta G_\theta(y_t \mid Y_{1:t-1}) \cdot Q_{D_\phi}^{G_\theta}(Y_{1:t-1}, y_t) \big]$
  where $J(\theta)$ refers to the objective function of the generator model, $E$ refers to the expected value, $G_\theta$ refers to the generator model, $Y_{1:t-1} \sim G_\theta$ means that the text $Y$ generated by the generator model obeys the probability distribution $G_\theta$, $G_\theta(y_t \mid Y_{1:t-1})$ refers to the probability that $y_t$ appears after $Y_{1:t-1}$ under the generator model, $D_\phi$ refers to the discriminator model, and $Q_{D_\phi}^{G_\theta}$ refers to the reward that text generated by the generator model $G_\theta$ receives from the discriminator model $D_\phi$.
  • the expectation in the above gradient can be approximated by sampling, after which the parameter $\theta$ of the generator model $G_\theta$ is updated as
  $\theta \leftarrow \theta + \alpha_h \nabla_\theta J(\theta)$
  where $\alpha_h$ refers to the learning rate of the hidden layer.
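  • A minimal sketch of this policy-gradient update, reusing the Generator and Discriminator sketches above, is given below; for brevity it rewards the whole sampled sequence with the discriminator output rather than the per-step Monte Carlo reward described later under S41-S44, and the start token and sampling loop are illustrative assumptions:

```python
# A minimal REINFORCE-style sketch of step S40: sample text from G, score it
# with D as the reward, and raise the probability of high-reward actions.
import torch

def policy_gradient_step(gen, disc, opt, batch_size=32, seq_len=20, bos=0):
    tokens = torch.full((batch_size, 1), bos, dtype=torch.long)  # hypothetical start token
    log_probs = []
    for _ in range(seq_len):
        logits = gen(tokens)[:, -1, :]             # next-word distribution G_theta(y_t | Y_1:t-1)
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()                     # the RNN's "action": pick the next word
        log_probs.append(dist.log_prob(action))
        tokens = torch.cat([tokens, action.unsqueeze(1)], dim=1)
    log_probs = torch.stack(log_probs, dim=1)      # (batch, seq_len)
    reward = disc(tokens[:, 1:]).squeeze(1).detach()  # D_phi's output is the reward value
    loss = -(log_probs.sum(dim=1) * reward).mean()    # gradient ascent on expected reward
    opt.zero_grad(); loss.backward(); opt.step()
```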
  • S50 Generate a second negative text sample according to the updated generator model, input the second negative text sample and the positive text sample into the discriminator model, and update the discriminator model according to the minimized cross entropy.
  • Specifically, the server uses the updated generator model to generate some texts as the second negative text samples, then labels the second negative text samples and the positive text samples respectively and inputs them into the discriminator model for training.
  • the second text negative sample is labeled as 0
  • the text positive sample is labeled as 1.
  • the positive text samples here can be the same samples used in the previous training, or other sample data can be extracted from the real text data set and used as the positive text samples.
  • the purpose of training the discriminator model is that when the input is real text data the output value is as close to 1 as possible, and when the input is text generated by the generator the output value is as close to 0 as possible, so that the model outputs an accurate value for an arbitrary sample. Specifically, the discriminator parameters can be obtained by minimizing the following cross entropy:
  $\min_\phi \; -\mathbb{E}_{Y \sim p_{data}}[\log D_\phi(Y)] - \mathbb{E}_{Y \sim G_\theta}[\log(1 - D_\phi(Y))]$
  where the discriminator $D_\phi(Y)$ returns the probability that the sample $Y$ belongs to the real samples, a number in $[0,1]$; $Y \sim p_{data}$ indicates that $Y$ obeys the probability distribution $p_{data}$, the distribution obeyed by the real text data set; $Y \sim G_\theta$ means that $Y$ obeys the probability distribution $G_\theta$; and $E$ denotes the expected value. Minimizing this cross entropy drives $D_\phi$ to assign a probability as large as possible to real data and a probability as small as possible to generated data.
  • In this way, the parameters of the discriminator model, and thus the discriminator model itself, are updated.
  • when the discriminator model is updated, the generator model is held fixed; the discriminator model can be updated multiple times, with the number of updates set according to the actual situation and not specifically limited here.
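  • A minimal sketch of this discriminator update, reusing the Discriminator sketch above, might look as follows; the labels 1 for positive text samples and 0 for the second negative text samples follow the text:

```python
# A minimal sketch of step S50: update D_phi by minimizing cross entropy over
# real (label 1) and generated (label 0) samples, with G held fixed.
import torch
import torch.nn as nn

bce = nn.BCELoss()

def discriminator_step(disc, opt, real_batch, fake_batch):
    real_loss = bce(disc(real_batch), torch.ones(real_batch.size(0), 1))
    fake_loss = bce(disc(fake_batch), torch.zeros(fake_batch.size(0), 1))
    loss = real_loss + fake_loss   # -E[log D(Y)] - E[log(1 - D(G))]
    opt.zero_grad(); loss.backward(); opt.step()
```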
  • S60 alternately update the generator model and the discriminator model. If the output of the discriminator model converges, the text generation model is obtained according to the generator model at the time of convergence.
  • the server alternately updates the generator model and the discriminator model, that is, when the discriminator model does not converge, the generator model and the discriminator model are repeatedly updated, so that the generator model and the discriminator model continue to fight against training.
  • the generator model is updated first, and the discriminator model remains unchanged; then the generator model is kept unchanged, and the discriminator model is updated. That is, let the parameters of the discriminator model be fixed, train the generator model; then let the parameters of the generator model be fixed, train the discriminator model; repeat this process until the output of the discriminator model converges. If the output of the discriminator model converges, the text generation model is obtained according to the generator model at the time of convergence.
  • output convergence means that the value of the discriminator's output for a given sample (positive or negative) is close to 0.5; the discriminator is then considered unable to distinguish positive from negative samples, the server determines that the output of the discriminator has converged, and the final text generation model is obtained from the generator model at the time of convergence.
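  • Putting the pieces together, the alternating schedule of step S60 can be sketched as below; sample_batch is a hypothetical helper that draws token sequences from the generator, and the epoch count and convergence tolerance are illustrative assumptions:

```python
# A minimal sketch of step S60: alternate G and D updates until D's output
# hovers near 0.5, i.e. it can no longer tell positive from negative samples.
import torch

def adversarial_train(gen, disc, gen_opt, disc_opt, real_loader, epochs=50, tol=0.01):
    for _ in range(epochs):
        for real_batch in real_loader:
            policy_gradient_step(gen, disc, gen_opt)            # D fixed, update G
            with torch.no_grad():
                fake_batch = sample_batch(gen, real_batch.size(0))  # hypothetical sampler
            discriminator_step(disc, disc_opt, real_batch, fake_batch)  # G fixed, update D
        if abs(disc(real_batch).mean().item() - 0.5) < tol:     # convergence check
            break
    return gen  # the text generation model is the generator at convergence
```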
  • S70 Obtain the text to be recognized, input the text to be recognized into the text generation model, and generate the target text based on the text generation model.
  • the text to be recognized is the input of the text generation model
  • the target text is the output of the text generation model.
  • the text to be recognized and the target text correspond to the real text data set; that is, if a data set of poetry is used to train the text generation model, the text to be recognized and the target text corresponding to the text generation model are also poems, and if dialogues are used to train the text generation model, the text to be recognized and the target text are also dialogues.
  • the to-be-recognized text and the target text may also be answers to questions, speech scripts, or short essays.
  • the server obtains the to-be-recognized text input by the user through the client, and then inputs the to-be-recognized text into the text generation model, the text generation model generates the target text, and the server then outputs the target text to the client.
  • For example, the server obtains the preceding text of a dialog input by the user through the client, such as "How is the weather today?"; the server then inputs this preceding text into the text generation model, which generates the corresponding following text as the target text, such as "Today's weather is very good!" or "According to the weather forecast, it will rain today.", so as to form a corresponding dialogue; finally, the server outputs the target text to the client.
  • In this embodiment, positive text samples are obtained from the real text data set; an initial generator model is established and pre-trained on the positive text samples to obtain the generator model, and first negative text samples are generated according to the generator model; an initial discriminator model is established, and the positive text samples and the first negative text samples are input into it for pre-training to obtain the discriminator model; test text is generated based on the generator model and input into the discriminator model to obtain its reward value, the gradient of the generator model is calculated according to the reward value, and the generator model is updated according to the gradient; second negative text samples are generated according to the updated generator model, the second negative text samples and the positive text samples are input into the discriminator model, and the discriminator model is updated according to minimized cross entropy; the generator model and the discriminator model are updated alternately, and if the output of the discriminator model converges, the text generation model is obtained according to the generator model at the time of convergence; finally, the text to be recognized is obtained and input into the text generation model, and the target text is generated based on the text generation model. In this way, the text generation model can be constructed quickly and the accuracy of the generated text is high, which improves the construction efficiency of the text generation model.
  • In an embodiment, step S20, in which an initial generator model is established, the positive text samples are input into the initial generator model for pre-training to obtain the generator model, and the first negative text samples are generated according to the generator model, may specifically include the following steps:
  • S21: Input the initial generation parameters into the recurrent neural network to establish the initial generator model.
  • the initial generation parameters may be randomly selected recurrent neural network (RNN) parameters; that is, before pre-training, randomly selected parameters can be input into the RNN to obtain the initial generator model.
  • RNN recurrent neural network
  • S22 Input the positive text sample into the initial generator model for pre-training, and convert it into a probability output according to the probability distribution function to obtain pre-trained parameters.
  • the server inputs the positive text samples into the initial generator model for pre-training.
  • Specifically, suppose the positive text samples are $(x_1, x_2, \ldots, x_T)$. The RNN first recursively maps $(x_1, x_2, \ldots, x_T)$ to hidden states $(h_1, h_2, \ldots, h_T)$, where a hidden state is a parameter in the hidden layer of the recurrent neural network, i.e., the output of a hidden-layer neuron.
  • the hidden states are computed by the following formula:
  $h_t = \sigma(W x_t + U h_{t-1})$
  where $W$ is the weight matrix, $U$ is the transition matrix applied to the previous hidden state $h_{t-1}$, and the activation $\sigma$ can be a sigmoid function or a hyperbolic tangent function (tanh), determined according to specific circumstances.
  • the probability distribution function can be a softmax function, so the output is expressed by the following formula:
  $p(y_t \mid x_1, \ldots, x_T) = z(h_t) = \mathrm{softmax}(c + V h_t)$
  which means that when $(x_1, x_2, \ldots, x_T)$ is known, the distribution of the RNN output $y_t$ is $\mathrm{softmax}(c + V h_t)$; here $z(h_t)$ denotes a function $z$ of $h_t$ that converts the output into the form of a probability, so that the output value belongs to $[0,1]$, and this function $z$ can be taken to be the softmax function.
  • the pre-trained parameters c and V can be obtained.
  • the original initial generation parameters of the initial generator model are updated according to the parameters c and V obtained after the pre-training to obtain the generator model.
  • the generator model can be denoted $G_\theta$, where the parameter $\theta$ of the generator model $G_\theta$ is obtained from the parameters $c$ and $V$.
  • certain sample data can be extracted from the real text data set and input into the generator model G ⁇ to generate the first negative text sample.
  • In this embodiment, the initial generator model is established by inputting the initial generation parameters into the recurrent neural network; the positive text samples are then input into the initial generator model for pre-training, the output is converted into a probability according to the probability distribution function, and the pre-trained parameters are obtained; finally, the parameters of the initial generator model are updated according to the pre-trained parameters to obtain the generator model.
  • Building the generator model with a recurrent neural network suits the discrete nature of text generation, so the final text generation model outputs text more effectively; in addition, the generator model can be pre-trained first, and the pre-trained generator model can then be used to generate some negative samples to achieve pre-training of the discriminator model.
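  • The formulas of steps S21-S23 can be made concrete with a short numpy sketch; all dimensions and the random inputs are illustrative assumptions:

```python
# A worked sketch of h_t = sigma(W x_t + U h_{t-1}) and
# p(y_t) = softmax(c + V h_t); sigma is taken as tanh here.
import numpy as np

rng = np.random.default_rng(0)
emb_dim, hidden_dim, vocab = 8, 16, 50
W = rng.normal(size=(hidden_dim, emb_dim))     # input weight matrix W
U = rng.normal(size=(hidden_dim, hidden_dim))  # transition matrix U for h_{t-1}
V = rng.normal(size=(vocab, hidden_dim))       # output weight matrix V
c = np.zeros(vocab)                            # output bias c

def softmax(z):
    z = z - z.max()                            # for numerical stability
    return np.exp(z) / np.exp(z).sum()

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, emb_dim)):      # a hypothetical embedded sequence x_1..x_5
    h = np.tanh(W @ x_t + U @ h)               # recursive mapping to hidden states
    p_y = softmax(c + V @ h)                   # distribution of the next word y_t
```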
  • In an embodiment, step S30, in which an initial discriminator model is established and the positive text samples and the first negative text samples are input into the initial discriminator model for pre-training to obtain the discriminator model, may specifically include the following steps:
  • S31 Input the initial discriminating parameters into the convolutional neural network to establish an initial discriminator model.
  • the initial discriminating parameter may be a randomly selected convolutional neural network (CNN) parameter, that is, before pre-training, the randomly selected parameter may be input to the CNN to obtain the initial discriminator model.
  • CNN convolutional neural network
  • S32: Input the positive text samples and the first negative text samples into the initial discriminator model for pre-training, convert the output into a probability according to the probability distribution function, and update the initial discriminating parameters of the initial discriminator according to minimized cross entropy to obtain the pre-trained discriminating parameters.
  • the training samples are labeled, that is, the positive text samples are labeled as 1, and the negative text samples are labeled as 0.
  • Specifically, a convolution is applied to the input: the convolution kernel $w \in \mathbb{R}^{l \times k}$ is a real $l \times k$ matrix, $\varepsilon_{i:i+l-1}$ refers to rows $i$ through $i+l-1$ of the text positive sample (also a real $l \times k$ matrix), $b$ is a required real-valued parameter, and $\otimes$ denotes the sum of the products of corresponding elements of two matrices, giving the feature $c_i = w \otimes \varepsilon_{i:i+l-1} + b$.
  • the pooling above refers to taking the maximum of the extracted features $c_i$ of the text positive sample (max pooling); average pooling can also be used here, which is not specifically limited.
  • FC fully connected layer
  • the first negative text samples, labeled 0, are also input into the CNN and, after the same process, the sigmoid function is used to convert the output into a probability.
  • After pre-training, the pre-trained discriminating parameters, namely $w$ and $b$, are obtained.
  • In an embodiment, a highway network layer can also be used when training the discriminator model, calculated by the following formulas:
  $\tau = \sigma(W_T \cdot F + b_T)$
  $\tilde{C} = \tau \odot H(F, W_H) + (1 - \tau) \odot F$
  where $F$ refers to the feature vector extracted from the behavior sequence that generates the text, $W_T$, $b_T$ and $W_H$ are the weights of the highway layer, $H$ is an affine transformation followed by a nonlinear activation function (such as the linear rectification function ReLU, denoted $f$), and $\tau$ is the transform gate. Finally, a sigmoid function is used to convert the result into a probability output:
  $D_\phi(Y) = \sigma(W_0 \cdot \tilde{C} + b_0)$
  where $W_0$ and $b_0$ are the weight and bias of the output layer of the discriminator.
  • S33 Update the parameters of the initial discriminator model according to the discriminant parameters after pre-training to obtain the discriminator model.
  • Specifically, the parameters of the initial discriminator model are updated according to the discriminating parameters $w$ and $b$ obtained after pre-training to obtain the discriminator model.
  • the discriminator model can be denoted $D_\phi$, where the parameter $\phi$ of the discriminator model is obtained from the parameters $w$ and $b$.
  • In this embodiment, the initial discriminator model is established by inputting the initial discriminating parameters into the convolutional neural network; the positive text samples and the first negative text samples are then input into the initial discriminator model for pre-training, the output is converted into a probability according to the probability distribution function, and the initial discriminating parameters are updated according to minimized cross entropy to obtain the pre-trained discriminating parameters; finally, the parameters of the initial discriminator model are updated with the pre-trained discriminating parameters to obtain the discriminator model.
  • the initial discriminator model is trained on the negative samples generated by the generator model together with the positive text samples to obtain the discriminator model; once the discriminator model is obtained, the generator model and the discriminator model can be trained against each other to finally obtain the text generation model.
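  • The highway layer mentioned above can be sketched in a few lines of PyTorch; the layer sizes are illustrative assumptions:

```python
# A minimal sketch of the highway layer: the gate tau mixes the transformed
# feature H(F) with the raw feature F, as in the formulas above.
import torch
import torch.nn as nn

class Highway(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(dim, dim)        # W_T, b_T
        self.transform = nn.Linear(dim, dim)   # W_H

    def forward(self, f):
        tau = torch.sigmoid(self.gate(f))      # transform gate
        h = torch.relu(self.transform(f))      # affine transform plus ReLU, i.e. H
        return tau * h + (1.0 - tau) * f
```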
  • In an embodiment, step S40, in which the test text is generated based on the generator model, the test text is input into the discriminator model to obtain the reward value of the test text, the gradient of the generator model is calculated according to the reward value, and the generator model is updated according to the gradient, may specifically include the following steps:
  • S41: Obtain the test sub-texts generated by the generator model in the process of generating the test text. The generator model produces many intermediate steps while generating the test text; for example, if the final generated text is "moonlight in front of the bed", the generator model successively generates "bed", "in front of the bed", and so on, and the server can obtain the texts from these intermediate steps as the test sub-texts.
  • S42: Use the Monte Carlo search method to generate M hypothetical texts according to the test sub-text. The Monte Carlo search method refers to the use of random numbers (or, more commonly, pseudo-random numbers) to solve calculation problems.
  • Specifically, the server uses the Monte Carlo search method to generate M hypothetical texts according to the test sub-texts, inputs the M hypothetical texts into the discriminator model to obtain their reward values, and uses the average of these reward values as the reward value of the test sub-text.
  • Generating M hypothetical texts by Monte Carlo search can be expressed by the following formula:
  $\{Y_{1:T}^{1}, \ldots, Y_{1:T}^{M}\} = \mathrm{MC}(Y_{1:t}; M)$
  that is, M hypothetical texts are generated by the Monte Carlo search method under the condition of the given test sub-text $Y_{1:t}$.
  • S43: Input the M hypothetical texts into the discriminator model, obtain the average reward of the M hypothetical texts as the reward value of the test sub-text, and input the test text into the discriminator model to obtain the reward value of the test text.
  • Specifically, the discriminator model $D_\phi(Y)$ returns the probability that the test sample $Y$ belongs to the real samples, a number in $[0,1]$. Time $T$ corresponds to the completion of the entire poem, so the reward value at time $T$ can be given directly by the discriminator.
  • the reward values at times 1 to T-1 (that is, for $t$ from 1 to $T-1$) need to be given by Monte Carlo search simulation.
  • Specifically, suppose the test sub-text at time $t$ is $Y_{1:t-1}$; Monte Carlo search is then applied $M$ times to obtain $M$ hypothetical texts $Y_{1:T}$, and the average of the reward values of these $M$ hypothetical texts is used as the reward value at time $t$:
  $Q_{D_\phi}^{G_\theta}(Y_{1:t-1}, y_t) = \frac{1}{M} \sum_{m=1}^{M} D_\phi(Y_{1:T}^{m})$ for $t < T$, and $Q_{D_\phi}^{G_\theta}(Y_{1:T-1}, y_T) = D_\phi(Y_{1:T})$ at $t = T$.
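  • A minimal sketch of this Monte Carlo reward estimate, reusing the Generator and Discriminator sketches above, is given below; the rollout loop, sequence length, and M are illustrative assumptions:

```python
# A minimal sketch of steps S42-S43: complete the prefix Y_{1:t} M times with
# the current generator and average D_phi's scores as the prefix's reward.
import torch

def mc_reward(prefix, gen, disc, M=16, seq_len=20):
    # prefix: LongTensor of shape (1, t) holding the test sub-text Y_{1:t}.
    rewards = []
    with torch.no_grad():
        for _ in range(M):
            tokens = prefix.clone()
            while tokens.size(1) < seq_len:    # roll out a hypothetical text Y_{1:T}
                logits = gen(tokens)[:, -1, :]
                nxt = torch.distributions.Categorical(logits=logits).sample()
                tokens = torch.cat([tokens, nxt.unsqueeze(1)], dim=1)
            rewards.append(disc(tokens).item())
    return sum(rewards) / M                    # average reward value of Y_{1:t}
```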
  • In this way, the generator model can be trained by reinforcement learning (RL).
  • S44 Calculate the gradient of the generator model according to the reward value of the test sub-text and the reward value of the test text, and update the parameters of the generator model according to the gradient to obtain the updated generator model.
  • the policy gradient of the generator model can be calculated with the same formula given above under step S40, $\nabla_\theta J(\theta) = \sum_{t} \mathbb{E}_{Y_{1:t-1} \sim G_\theta} \big[ \sum_{y_t} \nabla_\theta G_\theta(y_t \mid Y_{1:t-1}) \cdot Q_{D_\phi}^{G_\theta}(Y_{1:t-1}, y_t) \big]$, where the expected value $E$ can be approximated by sampling; the parameter $\theta$ of the generator model is then updated as $\theta \leftarrow \theta + \alpha_h \nabla_\theta J(\theta)$.
  • Specifically, the server obtains the updated generator model according to its updated parameters, then uses the updated generator model to update the discriminator model, alternately updating the generator model and the discriminator model until the discriminator model converges, and finally obtains the text generation model according to the generator model at the time of convergence.
  • when the generator model is updated, the update is performed with the discriminator model fixed; the number of times the parameters of the generator model are updated can be set according to the actual situation and is not specifically limited here.
  • In this embodiment, the Monte Carlo search method is first used to generate M hypothetical texts; the M hypothetical texts are then input into the discriminator model, the average reward of the M hypothetical texts is obtained as the reward value of the test sub-text, and the test text is input into the discriminator model to obtain the reward value of the test text; finally, the gradient of the generator model is calculated according to the reward value of the test sub-text and the reward value of the test text, and the parameters of the generator model are updated according to the gradient to obtain the updated generator model.
  • a text generation device is provided, and the text generation device corresponds to the text generation method in the above-mentioned embodiment one-to-one.
  • the text generation device includes a text positive sample acquisition module 10, a generator model acquisition module 20, a discriminator model acquisition module 30, a generator model update module 40, a discriminator model update module 50, a text generation model acquisition module 60, and a target text generation module 70. The detailed description of each functional module is as follows:
  • the text positive sample obtaining module 10 is used to obtain a real text data set, and obtain a text positive sample from the real text data set;
  • the generator model acquisition module 20 is used to establish an initial generator model, input positive text samples into the initial generator model for pre-training, obtain the generator model, and generate the first negative text sample according to the generator model;
  • the discriminator model acquisition module 30 is used to establish an initial discriminator model, and input the positive text samples and the first negative text samples into the initial discriminator model for pre-training to obtain the discriminator model;
  • the generator model update module 40 is used to generate test text based on the generator model, input the test text into the discriminator model to obtain the reward value of the test text, calculate the gradient of the generator model according to the reward value, and update the generator model according to the gradient ;
  • the discriminator model update module 50 is configured to generate a second negative text sample according to the updated generator model, input the second negative text sample and the positive text sample into the discriminator model, and update the discriminator model according to the minimized cross entropy;
  • the text generation model acquisition module 60 is used to alternately update the generator model and the discriminator model. If the output of the discriminator model converges, the text generation model is obtained according to the generator model at the time of convergence;
  • the target text generation module 70 is configured to obtain the text to be recognized, input the text to be recognized into the text generation model, and generate the target text based on the text generation model.
  • Further, the text positive sample acquisition module 10 is also used to: select N text data from the real text data set, where N is a positive integer; and convert the N text data into vector form using the word vector model, using the N vectorized text data as the positive text samples.
  • the generator model acquisition module 20 includes an initial generation model establishment unit 21, an initial generation model pre-training unit 22 and a generator model acquisition unit 23.
  • the initial generation model establishment unit 21 is configured to input initial generation parameters into the recurrent neural network to establish an initial generator model
  • the initial generation model pre-training unit 22 is used to input positive text samples into the initial generator model for pre-training, and convert it into a probability output according to the probability distribution function to obtain pre-trained parameters;
  • the generator model obtaining unit 23 is configured to update the parameters of the initial generator model according to the pre-trained parameters to obtain the generator model.
  • the discriminator model acquisition module 30 includes an initial discriminant model establishment unit 31, an initial discriminant model pre-training unit 32 and a discriminator model acquisition unit 33.
  • the initial discriminant model establishment unit 31 is used to input initial discriminant parameters into the convolutional neural network to establish an initial discriminator model
  • the initial discriminant model pre-training unit 32 is used to input the positive text samples and the first negative text samples into the initial discriminator model for pre-training, convert the output into a probability according to the probability distribution function, and update the initial discriminating parameters of the initial discriminator according to minimized cross entropy to obtain the pre-trained discriminating parameters;
  • the discriminator model acquisition unit 33 is configured to update the parameters of the initial discriminator model according to the discriminant parameters after pre-training to obtain the discriminator model.
  • generator model update module 40 is also used to:
  • the gradient of the generator model is calculated according to the reward value of the test sub-text and the reward value of the test text, and the parameters of the generator model are updated according to the gradient to obtain the updated generator model.
  • Each module in the above-mentioned text generation device can be implemented in whole or in part by software, hardware, and a combination thereof.
  • the foregoing modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the foregoing modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 10.
  • the computer equipment includes a processor, a memory, a network interface and a database connected by a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer equipment is used to store real text data sets, text positive samples, text negative samples, and word vector models.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions are executed by the processor to realize a text generation method.
  • a computer device including a memory, a processor, and computer-readable instructions stored in the memory and running on the processor, and the processor implements the following steps when the processor executes the computer-readable instructions:
  • Generate test text based on the generator model, input the test text into the discriminator model to obtain the reward value of the test text, calculate the gradient of the generator model according to the reward value, and update the generator model according to the gradient;
  • Obtain the text to be recognized, input the text to be recognized into the text generation model, and generate the target text based on the text generation model.
  • one or more readable storage media storing computer readable instructions are provided, and when the computer readable instructions are executed by one or more processors, the one or more processors execute The following steps:
  • Generate test text based on the generator model, input the test text into the discriminator model to obtain the reward value of the test text, calculate the gradient of the generator model according to the reward value, and update the generator model according to the gradient;
  • Obtain the text to be recognized, input the text to be recognized into the text generation model, and generate the target text based on the text generation model.
  • the readable storage medium includes a non-volatile readable storage medium and a volatile readable storage medium.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • ROM read only memory
  • PROM programmable ROM
  • EPROM electrically programmable ROM
  • EEPROM electrically erasable programmable ROM
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
  • Division into modules means dividing the internal structure of the device into different functional units or modules to complete all or part of the functions described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A text generation method and device, an apparatus and a medium in the field of model construction. The method comprises: acquiring a positive text sample from a real text data set; establishing an initial generator model, using the positive text sample to pre-train the initial generator model so as to acquire a generator model, and using the generator model to generate a negative text sample; establishing an initial discriminator model, and using the positive text sample and the negative text sample to perform pre-training so as to acquire a discriminator model; causing the generator model and the discriminator model to continuously confront each other, and updating the parameters of the models; when the discriminator model converges, acquiring a text generation model according to the generator model at the time of convergence; and acquiring text to be identified, inputting the text into the text generation model, and generating target text on the basis of the text generation model. The text generation method improves the efficiency of text generation model construction and the accuracy of text generation.

Description

Text generation method, device, computer equipment and medium
This application is based on the Chinese invention patent application filed on January 24, 2019 with application number 201910067379.0, titled "Text generation method, device, computer equipment and medium", and claims its priority.
Technical field
This application belongs to the field of model construction, and more specifically relates to a text generation method, device, computer equipment and medium.
Background
With the development of science and technology, we hope that computers can write like humans and produce high-quality natural language text; automatic text generation technology is the key technology for achieving this goal.
At present, a commonly used method is to use Long Short-Term Memory networks (LSTM) for text generation; LSTM is a type of recurrent/recursive neural network (RNN). A common way to train an RNN is maximum likelihood estimation: given the first t-1 words, the next word is predicted by maximizing the log likelihood of the t-th word. However, a disadvantage of using an RNN is that it produces a steadily accumulating bias: a sentence is generated word by word, with each next word conditioned on the previously generated words, so an early error propagates forward and the bias grows as the sequence length increases.
In addition, an RNN cannot improve itself. For some RNN applications, a loss function to minimize can be added to improve the model, but for a text generation model the input data are discrete, so there is no directly usable loss function and no suitable way to guide the text generation model to improve itself toward near-real output.
In summary, the models currently used to generate text are inefficient, and a text generation model that can generate text quickly and accurately is urgently needed.
Summary of the invention
The embodiments of the present application provide a text generation method, device, computer equipment, and storage medium to solve the current problem of low efficiency of text generation.
A text generation method, including:
obtaining a real text data set, and obtaining positive text samples from the real text data set;
establishing an initial generator model, inputting the positive text samples into the initial generator model for pre-training to obtain a generator model, and generating first negative text samples according to the generator model;
establishing an initial discriminator model, and inputting the positive text samples and the first negative text samples into the initial discriminator model for pre-training to obtain a discriminator model;
generating test text based on the generator model, inputting the test text into the discriminator model to obtain the reward value of the test text, calculating the gradient of the generator model according to the reward value, and updating the generator model according to the gradient;
generating second negative text samples according to the updated generator model, inputting the second negative text samples and the positive text samples into the discriminator model, and updating the discriminator model according to minimized cross entropy;
alternately updating the generator model and the discriminator model, and if the output of the discriminator model converges, obtaining a text generation model according to the generator model at the time of convergence;
obtaining the text to be recognized, inputting the text to be recognized into the text generation model, and generating the target text based on the text generation model.
A text generation device, including:
a text positive sample acquisition module, used to acquire a real text data set and acquire positive text samples from the real text data set;
a generator model acquisition module, used to establish an initial generator model, input the positive text samples into the initial generator model for pre-training to obtain a generator model, and generate first negative text samples according to the generator model;
a discriminator model acquisition module, used to establish an initial discriminator model, and input the positive text samples and the first negative text samples into the initial discriminator model for pre-training to obtain a discriminator model;
a generator model update module, configured to generate test text based on the generator model, input the test text into the discriminator model to obtain the reward value of the test text, calculate the gradient of the generator model according to the reward value, and update the generator model according to the gradient;
a discriminator model update module, used to generate second negative text samples according to the updated generator model, input the second negative text samples and the positive text samples into the discriminator model, and update the discriminator model according to minimized cross entropy;
a text generation model acquisition module, configured to alternately update the generator model and the discriminator model, and if the output of the discriminator model converges, obtain a text generation model according to the generator model at the time of convergence;
a target text generation module, used to obtain the text to be recognized, input the text to be recognized into the text generation model, and generate the target text based on the text generation model.
A computer device, including a memory, a processor, and computer readable instructions stored in the memory and runnable on the processor, where the processor implements the above text generation method when executing the computer readable instructions.
One or more readable storage media storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to execute the above text generation method.
The details of one or more embodiments of the present application are set forth in the following drawings and description; other features and advantages of the present application will become apparent from the description, drawings and claims.
Description of the drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative labor.
FIG. 1 is a schematic diagram of an application environment of a text generation method in an embodiment of the present application;
FIG. 2 is a flowchart of a text generation method in an embodiment of the present application;
FIG. 3 is another flowchart of a text generation method in an embodiment of the present application;
FIG. 4 is another flowchart of a text generation method in an embodiment of the present application;
FIG. 5 is another flowchart of a text generation method in an embodiment of the present application;
FIG. 6 is another flowchart of a text generation method in an embodiment of the present application;
FIG. 7 is a functional block diagram of a text generation device in an embodiment of the present application;
FIG. 8 is a functional block diagram of the generator model acquisition module in the text generation device in an embodiment of the present application;
FIG. 9 is a functional block diagram of the discriminator model acquisition module in the text generation device in an embodiment of the present application;
FIG. 10 is a schematic diagram of a computer device in an embodiment of the present application.
具体实施方式detailed description
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。The following will clearly and completely describe the technical solutions in the embodiments of the present application in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, not all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.
The text generation method provided in the present application can be applied in the application environment shown in FIG. 1, where a client communicates with a server through a network. The server obtains a real text data set through the client and obtains positive text samples from the real text data set; it then builds an initial generator model according to input from the client, inputs the positive text samples into the initial generator model for pre-training to obtain a generator model, and generates first negative text samples according to the generator model; next, it builds an initial discriminator model according to input from the client, and inputs the positive text samples and the first negative text samples into the initial discriminator model for pre-training to obtain a discriminator model; the server then generates test text based on the generator model, inputs the test text into the discriminator model to obtain a reward value for the test text, calculates the gradient of the generator model according to the reward value, and updates the generator model according to the gradient; the server generates second negative text samples according to the updated generator model, inputs the second negative text samples and the positive text samples into the discriminator model, and updates the discriminator model by minimizing the cross entropy; the generator model and the discriminator model are updated alternately, and if the output of the discriminator model converges, a text generation model is obtained from the generator model at the time of convergence; finally, the server obtains text to be recognized, inputs it into the text generation model, generates target text based on the text generation model, and returns the target text to the client. The client can be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, or a portable wearable device. The server can be implemented as an independent server or as a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, a text generation method is provided. The method is described taking its application to the server in FIG. 1 as an example, and may specifically include the following steps:
S10: Obtain a real text data set, and obtain positive text samples from the real text data set.
Here, the real text data set refers to the original text data set corresponding to the text that the text generation model is ultimately expected to output. For example, if the text generation model is expected to output poems, the real text data set is a data set composed of various poems. The text in this embodiment can be poems, answers to questions, dialogues, and the like; this embodiment is described taking poems as the final output.
A positive text sample refers to one of multiple samples extracted from the real text data set, for example one of multiple poems extracted from the real text data set.
Specifically, a large data set of poems can be collected in advance and stored in a database on the server as the real text data set. When training starts, the server randomly obtains the real text data set from the database and extracts some poems (samples) from it as positive text samples.
In an embodiment, as shown in FIG. 3, in order to train the generator model and the discriminator model better, the real text data set can be converted into vector form; that is, step S10 may specifically include the following steps:
S11: Select N text data from the real text data set, where N is a positive integer.
Specifically, the server selects N samples from the database as positive text samples, where N is a positive integer. It can be understood that the larger N is, the better the training effect. Optionally, which samples are selected as positive text samples can be determined by input from the client: for example, the client inputs sample numbers, and the server selects the corresponding samples from the database according to the sample numbers input by the client.
S12: Convert the N text data into vector form using a word vector model, and use the N text data converted into vector form as the positive text samples.
Here, the word vector model is the word2vec model; the word2vec model includes two neural network structures, CBOW and Skip-gram. Specifically, the server can input the poems (the real text data set) into the word2vec model for training; after training, the word2vec model can be used to map each term of a poem to a vector. For example, if a poem can be represented as {term 1, term 2, ..., term n}, the word2vec algorithm can be invoked to convert term i into a vector x_i, and the poem can then be represented by the vector X = (x_1, x_2, ..., x_T).
Specifically, the server converts the selected N text data into vector form through the word2vec model, and then uses these N text data in vector form as the positive text samples.
In the embodiment corresponding to FIG. 3, N text data are selected from the real text data set, the N text data are converted into vector form using the word vector model, and the N text data in vector form are used as the positive text samples. Converting the text data into vector form captures the correlations between terms in the text better and facilitates the subsequent training of the generator model and the discriminator model.
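As an illustration of steps S11 and S12, the following is a minimal sketch assuming the gensim library (version 4.x); the toy corpus, the parameter values, and the helper name poem_to_vectors are assumptions for illustration only and are not part of the embodiment.

```python
# Minimal sketch of S11-S12, assuming gensim 4.x and a toy corpus of poems;
# all names and parameter values here are illustrative assumptions.
import numpy as np
from gensim.models import Word2Vec

# Hypothetical real text data set: each poem is a list of terms (tokens).
poems = [["床", "前", "明月", "光"], ["疑", "是", "地上", "霜"]]

# Train word2vec; sg=1 selects Skip-gram, sg=0 would select CBOW.
w2v = Word2Vec(sentences=poems, vector_size=64, window=5, min_count=1, sg=1)

def poem_to_vectors(poem):
    """Map a poem {term 1, ..., term T} to the vector form X = (x_1, ..., x_T)."""
    return np.stack([w2v.wv[term] for term in poem])

# The N selected text data, converted into vector form, become positive samples.
positive_samples = [poem_to_vectors(p) for p in poems]
```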
S20: Establish an initial generator model, input the positive text samples into the initial generator model for pre-training to obtain a generator model, and generate first negative text samples according to the generator model.
It should be understood that the initial generator model here and the subsequent initial discriminator model are both models constructed on the basis of neural networks. Optionally, since the input text data are discrete, the initial generator model can be built with a recurrent neural network (RNN); and in order to speed up training and reduce the amount of computation, the initial discriminator model can be built with a convolutional neural network (CNN). Optionally, other neural networks can also be used to build the initial generator model and the initial discriminator model, which is not specifically limited here. This embodiment is described taking an RNN as the initial generator model and a CNN as the initial discriminator model as an example.
Specifically, parameters of the RNN are selected randomly to establish the initial generator model. After the initial generator model is established, the positive text samples obtained in step S10 are input into it for pre-training, and the generator model is obtained after pre-training; some negative samples are then generated by the generator model as the first negative text samples, so that the initial discriminator model can be pre-trained. It should be understood that "initial generator model" and "generator model" merely distinguish the neural network before and after pre-training. Optionally, the server can also select additional sample data from the real text data set and input them into the initial generator model for pre-training.
S30: Establish an initial discriminator model, and input the positive text samples and the first negative text samples into the initial discriminator model for pre-training to obtain a discriminator model.
Specifically, the server randomly selects parameters of the CNN to establish the initial discriminator model. After the initial discriminator model is established, the obtained positive text samples and first negative text samples are labeled separately. For example, the positive text samples can be labeled 1 and the first negative text samples labeled 0. The labeled positive and first negative text samples are then input into the initial discriminator model for pre-training to obtain the discriminator model. There are N positive text samples and N first negative text samples; N can be decided according to the actual situation, and the more samples there are, the higher the discrimination accuracy of the resulting discriminator model. A CNN is used to build the discriminator model because appropriate pooling layers can be set in a CNN; the pooling operation prevents the discriminator model from overfitting the data, speeds up its training, and reduces the amount of computation.
S40: Generate test text based on the generator model, input the test text into the discriminator model to obtain a reward value for the test text, calculate the gradient of the generator model according to the reward value, and update the generator model according to the gradient.
Here, the reward value of the test text refers to the value output by the discriminator model.
Specifically, the server uses the generator model to generate test text, inputs the test text into the discriminator model, and takes the value output by the discriminator model as the reward value. To make the generator model improve continuously, the generator model is trained with the policy gradient method from reinforcement learning (RL): when the discriminator model's output value for the test text is relatively high, the probability of the corresponding action of the RNN in the generator model is increased; when the output value is relatively low, that probability is decreased. It should be understood that "high" and "low" output values are relative concepts that differ across training stages and can be preset according to experience. For example, at the start of training, when the generator's output is still poor, a discriminator output above 0.3 can be regarded as relatively high and one below 0.2 as relatively low; in the later stages of training, an output above 0.4 can be regarded as relatively high and one below 0.3 as relatively low.
Specifically, the policy gradient of the generator is calculated according to the reward value of the test text, and the generator model is then updated with the calculated policy gradient, expressed by the following formula:

\nabla_\theta J(\theta) = \sum_{t=1}^{T} E_{Y_{1:t-1} \sim G_\theta} \Big[ \sum_{y_t} \nabla_\theta G_\theta(y_t \mid Y_{1:t-1}) \cdot Q_{D_\phi}^{G_\theta}(Y_{1:t-1}, y_t) \Big]

where \nabla_\theta denotes the policy gradient, J(\theta) is the objective function of the generator model, E denotes expectation, G_\theta is the generator model, Y_{1:t-1} \sim G_\theta means that the text Y generated by the generator model follows the probability distribution G_\theta, G_\theta(y_t \mid Y_{1:t-1}) is the probability under the generator model that y_t follows Y_{1:t-1}, D_\phi is the discriminator model, and Q_{D_\phi}^{G_\theta} is the reward that the text generated by the generator model G_\theta receives from the discriminator model D_\phi. The expectation in the above gradient can be approximated by sampling, and the parameter \theta of the generator model G_\theta is then updated as:

\theta \leftarrow \theta + \alpha_h \nabla_\theta J(\theta)

where \alpha_h is the learning rate of the hidden layer.
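For illustration, the sampled update can be sketched as follows, assuming PyTorch; the tensors log_probs and rewards (the sampled log-probabilities and the corresponding Q values from the discriminator) and the optimizer are assumed to be available.

```python
# Sketch of the sampled policy-gradient update theta <- theta + alpha_h * grad,
# assuming PyTorch; log_probs holds log G_theta(y_t | Y_{1:t-1}) for the
# sampled tokens and rewards holds the corresponding Q values (both assumed).
import torch

def policy_gradient_step(log_probs, rewards, optimizer):
    # log_probs, rewards: tensors of shape (batch, T).
    # Maximizing J(theta) = E[sum_t log G_theta(y_t|Y_{1:t-1}) * Q] is done
    # by minimizing its negative with a gradient-based optimizer.
    loss = -(log_probs * rewards).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()    # backpropagation computes grad_theta J(theta)
    optimizer.step()   # applies the update with learning rate alpha_h
    return loss.item()
```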
S50: Generate second negative text samples according to the updated generator model, input the second negative text samples and the positive text samples into the discriminator model, and update the discriminator model by minimizing the cross entropy.
Specifically, the server uses the updated generator model to generate some texts as second negative text samples, labels the second negative text samples and the positive text samples separately, and inputs them into the discriminator model for training. The second negative text samples are labeled 0 and the positive text samples are labeled 1. It should be understood that the positive text samples here can be the same samples as the positive text samples used in the earlier training, or other sample data can be extracted from the real text data set as positive text samples.
It should be understood that the goal of training the discriminator model is that when the input is real text data, the output value should be as close to 1 as possible, and when the input is text generated by the generator, the output value should be as close to 0 as possible, so that the model outputs an accurate value for any given sample. Specifically, the pre-trained discriminator parameters can be obtained by minimizing the following cross entropy:

\min_\phi \; -E_{Y \sim p_{data}}[\log D_\phi(Y)] - E_{Y \sim G_\theta}[\log(1 - D_\phi(Y))]

where the discriminator D_\phi(Y) returns the probability that the sample Y is a real sample, a number in [0,1]; Y \sim p_{data} means that Y follows the probability distribution p_{data}, where p_{data} is the probability distribution of the real text data set; Y \sim G_\theta means that Y follows the probability distribution G_\theta; and E denotes expectation. Minimizing this cross entropy makes the first and second terms of the above formula as large as possible, that is, the probability assigned to real data as large as possible and the probability assigned to generated data as small as possible.
The parameters of the discriminator model can be updated according to the minimized cross entropy, thereby updating the discriminator model. When the discriminator model is updated, the generator model is kept fixed; the discriminator model can be updated multiple times, with the number of updates set according to the actual situation, which is not specifically limited here.
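A sketch of this update in code, assuming PyTorch; the discriminator (ending in a sigmoid so that it returns D_φ(Y) in [0,1]) and the batches real_batch and fake_batch are assumptions.

```python
# Sketch of one discriminator update by minimizing the cross entropy above,
# assuming PyTorch; discriminator, real_batch, and fake_batch are assumed.
import torch
import torch.nn as nn

bce = nn.BCELoss()

def discriminator_step(discriminator, real_batch, fake_batch, optimizer):
    d_real = discriminator(real_batch)  # real text, label 1: push towards 1
    d_fake = discriminator(fake_batch)  # generated text, label 0: push towards 0
    # -E_{Y~p_data}[log D(Y)] - E_{Y~G_theta}[log(1 - D(Y))]
    loss = bce(d_real, torch.ones_like(d_real)) + \
           bce(d_fake, torch.zeros_like(d_fake))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```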
S60: Alternately update the generator model and the discriminator model; if the output of the discriminator model converges, obtain the text generation model from the generator model at the time of convergence.
Specifically, the server updates the generator model and the discriminator model alternately: as long as the discriminator model has not converged, the generator model and the discriminator model are updated repeatedly so that they keep training adversarially. In each round, the generator model is updated first while the discriminator model is kept unchanged; then the generator model is kept unchanged and the discriminator model is updated. That is, the parameters of the discriminator model are fixed while the generator model is trained, then the parameters of the generator model are fixed while the discriminator model is trained, and this process is repeated until the output of the discriminator model converges. If the output of the discriminator model converges, the text generation model is obtained from the generator model at the time of convergence. Output convergence means that the value the discriminator outputs for a given sample (positive or negative) is close to 0.5, that is, the discriminator can no longer distinguish positive from negative samples; the server then determines that the discriminator's output has converged, and the final text generation model is obtained from the generator model at that point.
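The alternating schedule might be organized as in the sketch below, which assumes PyTorch, the policy_gradient_step and discriminator_step functions sketched above, assumed helpers sample_from_generator and mc_rewards (a sketch of the latter is given under steps S42 to S44 below), and illustrative step counts and convergence tolerance.

```python
# Sketch of the alternating adversarial training of S60; the helpers, the
# g/d step counts, and the tolerance eps are illustrative assumptions.
import torch

def adversarial_train(generator, discriminator, rollout, real_batch,
                      g_opt, d_opt, max_epochs=200, g_steps=1, d_steps=5,
                      eps=0.02):
    for _ in range(max_epochs):
        for _ in range(g_steps):   # discriminator fixed, generator updated
            tokens, log_probs = sample_from_generator(generator)
            rewards = mc_rewards(discriminator, rollout, tokens)
            policy_gradient_step(log_probs, rewards, g_opt)
        for _ in range(d_steps):   # generator fixed, discriminator updated
            fake_batch = sample_from_generator(generator)[0].detach()
            discriminator_step(discriminator, real_batch, fake_batch, d_opt)
        # Convergence: outputs near 0.5 mean the discriminator can no longer
        # distinguish positive from negative samples.
        d_out = discriminator(torch.cat([real_batch, fake_batch])).mean()
        if abs(d_out.item() - 0.5) < eps:
            break
    return generator  # the generator at convergence is the text generation model
```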
S70: Obtain text to be recognized, input the text to be recognized into the text generation model, and generate target text based on the text generation model.
Here, the text to be recognized is the input of the text generation model, and the target text is its output. It can be understood that the text to be recognized and the target text correspond to the real text data set: if a data set of poems is used to train the text generation model, the text to be recognized and the target text of the model are also poems; if a data set of dialogues is used, the text to be recognized and the target text are also dialogue. Optionally, the text to be recognized and the target text can also be answers to questions, speeches, short essays, and the like.
Specifically, the server obtains the text to be recognized input by the user through the client, inputs it into the text generation model, has the text generation model generate the target text, and then outputs the target text to the client. For example, the server obtains through the client the preceding utterance of a dialogue input by the user, such as "How is the weather today?"; the server inputs this utterance into the text generation model, which generates the corresponding reply as the target text, such as "The weather is fine today!" or "According to the forecast, it will rain today.", thereby forming a dialogue; finally, the server outputs the target text to the client.
In the embodiment corresponding to FIG. 2, a real text data set is obtained and positive text samples are obtained from it; an initial generator model is established, and the positive text samples are input into it for pre-training to obtain the generator model, from which first negative text samples are generated; an initial discriminator model is established, and the positive text samples and the first negative text samples are input into it for pre-training to obtain the discriminator model; test text is generated based on the generator model and input into the discriminator model to obtain its reward value, the gradient of the generator model is calculated according to the reward value, and the generator model is updated according to the gradient; second negative text samples are generated according to the updated generator model and input together with the positive text samples into the discriminator model, which is updated by minimizing the cross entropy; the generator model and the discriminator model are updated alternately, and if the output of the discriminator model converges, the text generation model is obtained from the generator model at the time of convergence; finally, text to be recognized is obtained and input into the text generation model, and target text is generated based on the text generation model. By constructing a generator model and a discriminator model and letting them confront and improve each other continuously, a text generation model can be constructed quickly and the generated text is highly accurate, which improves both the construction efficiency of the text generation model and the precision of the generated text.
In an embodiment, as shown in FIG. 4, step S20, that is, establishing an initial generator model, inputting the positive text samples into the initial generator model for pre-training to obtain the generator model, and generating the first negative text samples according to the generator model, may specifically include the following steps:
S21: Input initial generation parameters into a recurrent neural network to establish the initial generator model.
Optionally, the initial generation parameters can be randomly selected parameters of the recurrent neural network (RNN); that is, before pre-training, parameters can be randomly selected and input into the RNN to obtain the initial generator model.
S22: Input the positive text samples into the initial generator model for pre-training, convert the result into a probability output according to a probability distribution function, and obtain the pre-trained parameters.
Specifically, the server inputs the positive text samples into the initial generator model for pre-training. Suppose a positive text sample is (x_1, x_2, ..., x_T). It is first mapped recursively in the RNN to the hidden states (h_1, h_2, ..., h_T), where a hidden state is an input parameter of the hidden layers of the recurrent neural network and at the same time the output of a neuron, expressed by the following formula:

h_t = g(h_{t-1}, x_t) = \sigma(W x_t + U h_{t-1})

where W is a weight matrix and U is the transition matrix acting on the previous hidden state h_{t-1}. \sigma can be the sigmoid function or the hyperbolic tangent function (tanh), depending on the specific situation.
The hidden states are then converted into output probabilities with a probability distribution function; optionally, the softmax function can be used as the probability distribution function, expressed by the following formula:

P(y_t \mid x_1, x_2, ..., x_t) = z(h_t) = \mathrm{softmax}(c + V h_t)

This formula means that, given (x_1, x_2, ..., x_t), the distribution of the RNN output y_t is \mathrm{softmax}(c + V h_t); z(h_t) indicates that a function z of h_t is needed to convert the output into the form of a probability, whose values lie in [0,1], and this function z can be taken to be the softmax function.
Specifically, after the server inputs the positive text samples into the RNN of the initial generator model for pre-training, the pre-trained parameters c and V are obtained.
S23: Update the parameters of the initial generator model according to the pre-trained parameters to obtain the generator model.
Specifically, the original initial generation parameters of the initial generator model are updated according to the parameters c and V obtained after pre-training, yielding the generator model. It can be understood that the generator model can be denoted G_\theta, and the model parameters \theta of G_\theta can be obtained from the parameters c and V. After the generator model G_\theta is obtained, certain sample data can be extracted from the real text data set and input into G_\theta to generate the first negative text samples.
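As a hedged illustration of steps S21 to S23, the generator could be sketched as follows in PyTorch; nn.RNN holds the weight matrices W and U of the recursion h_t = σ(W x_t + U h_{t-1}) internally (with tanh as σ), and the linear output layer holds V and c. The vocabulary size and layer dimensions are assumptions.

```python
# Sketch of the RNN generator of S21-S23, assuming PyTorch; the vocabulary
# size and layer dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class GeneratorRNN(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # h_t = tanh(W x_t + U h_{t-1}); nn.RNN stores W and U internally.
        self.rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)
        # P(y_t | x_1..x_t) = softmax(c + V h_t); Linear stores V and c.
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, h0=None):
        h, h_last = self.rnn(self.emb(tokens), h0)
        return torch.softmax(self.out(h), dim=-1), h_last

# Pre-training by maximum likelihood on the positive text samples would fit
# the parameters, including c and V, before adversarial training begins.
gen = GeneratorRNN()
probs, _ = gen(torch.randint(0, 5000, (2, 7)))  # probs: (batch=2, T=7, vocab)
```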
In the embodiment corresponding to FIG. 4, the initial generator model is established by inputting initial generation parameters into a recurrent neural network; the positive text samples are then input into the initial generator model for pre-training and converted into probability outputs according to the probability distribution function, obtaining the pre-trained parameters; finally, the parameters of the initial generator model are updated according to the pre-trained parameters, obtaining the generator model. Building the generator model with a recurrent neural network suits the discrete nature of generated text and makes the final text generation model output text more efficiently; in addition, pre-training the generator model first makes it possible to generate some negative samples with the pre-trained generator model, enabling pre-training of the discriminator model.
In an embodiment, as shown in FIG. 5, step S30, that is, establishing an initial discriminator model and inputting the positive text samples and the first negative text samples into the initial discriminator model for pre-training to obtain the discriminator model, may specifically include the following steps:
S31: Input initial discrimination parameters into a convolutional neural network to establish the initial discriminator model.
Optionally, the initial discrimination parameters can be randomly selected parameters of the convolutional neural network (CNN); that is, before pre-training, parameters can be randomly selected and input into the CNN to obtain the initial discriminator model.
S32: Input the positive text samples and the first negative text samples into the initial discriminator model for pre-training, convert the result into probability outputs according to a probability distribution function, and update the initial discrimination parameters of the initial discriminator by minimizing the cross entropy to obtain the pre-trained discrimination parameters.
Specifically, the training samples are labeled: the positive text samples are labeled 1, and the first negative text samples are labeled 0.
First, a positive text sample, for example (x_1, x_2, ..., x_T) represented as a matrix \varepsilon, is input into the CNN of the initial discriminator model. The CNN applies a convolution kernel \omega \in R^{l \times k} to the positive text sample to obtain its features, expressed by the following formula:

c_i = \omega \otimes \varepsilon_{i:i+l-1} + b

where the convolution kernel \omega \in R^{l \times k} is an l \times k real matrix; \varepsilon_{i:i+l-1} denotes rows i to i+l-1 of the positive text sample, also an l \times k real matrix; b is a parameter to be learned, a real number; and \otimes denotes the sum of the products of the corresponding elements of the two matrices.
Max pooling is then applied:

\tilde{c} = \max\{c_1, c_2, ..., c_{T-l+1}\}

that is, the pooling takes the maximum of the features c_i extracted from the positive text sample. Optionally, average pooling can also be used here, which is not specifically limited.
After a certain number of convolution and pooling operations, the result passes through fully connected layers (FC), that is, the output layer, and is converted into a probability output with the sigmoid function.
Similarly, the first negative text samples labeled 0 are input into the CNN, go through the same process, and are finally converted into probability outputs with the sigmoid function.
Finally, after pre-training on the positive text samples and the first negative text samples, the pre-trained discrimination parameters, namely \omega and b, are obtained.
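For illustration, the feature extraction and pooling can be sketched as follows, assuming PyTorch; the window size l, the dimension k, and the number of kernels are assumptions.

```python
# Sketch of the convolution c_i = omega (x) eps_{i:i+l-1} + b and the max
# pooling of S32, assuming PyTorch; the sizes are illustrative assumptions.
import torch
import torch.nn as nn

T, k, l, n_kernels = 20, 64, 3, 100   # sequence length, vector dim, window
eps_matrix = torch.randn(1, 1, T, k)  # one sample; each row is a term vector

# Each kernel omega in R^{l x k} slides over rows i..i+l-1; Conv2d computes
# the sum of element-wise products plus the bias b, i.e. the feature c_i.
conv = nn.Conv2d(in_channels=1, out_channels=n_kernels, kernel_size=(l, k))
c = conv(eps_matrix).squeeze(3)       # shape (1, n_kernels, T - l + 1)

# Max pooling: c~ = max{c_1, ..., c_{T-l+1}} for each kernel.
c_tilde = c.max(dim=2).values         # shape (1, n_kernels)
```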
Optionally, in order for the discriminator model to achieve a good effect, after max pooling yields \tilde{c}, a highway network can be used to train the discriminator model, computed by the following formulas:

\tau = \sigma(W_T \cdot \tilde{c} + b_T)

\tilde{C} = \tau \cdot H(\tilde{c}, W_H) + (1 - \tau) \cdot \tilde{c}

where \tau is the transform gate of the highway layer; W_T, b_T and W_H are the weights of the highway layer; and H is an affine transformation followed by a nonlinear activation function (for example the rectified linear unit, ReLU). Denoting the rectified linear function by f, H(\tilde{c}, W_H) = f(W_H \cdot \tilde{c}). Finally, the sigmoid function converts the result into a probability output:

D_\phi(Y) = \mathrm{sigmoid}(W_0 \cdot \tilde{C} + b_0)

where W_0 and b_0 are the weight and bias of the output layer of the discriminator.
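Continuing the previous sketch, the highway layer and the output layer might look as follows, assuming PyTorch; the dimensions are assumptions.

```python
# Sketch of the highway layer and sigmoid output of the discriminator,
# assuming PyTorch and the pooled c_tilde from the previous sketch.
import torch
import torch.nn as nn

n_kernels = 100
c_tilde = torch.randn(1, n_kernels)               # stands in for the pooled features

W_T = nn.Linear(n_kernels, n_kernels)             # transform gate weights W_T, b_T
W_H = nn.Linear(n_kernels, n_kernels, bias=False) # highway weights W_H
W_0 = nn.Linear(n_kernels, 1)                     # output layer weights W_0, b_0

tau = torch.sigmoid(W_T(c_tilde))                 # tau = sigma(W_T c~ + b_T)
H = torch.relu(W_H(c_tilde))                      # H(c~, W_H) = f(W_H c~), f = ReLU
C_tilde = tau * H + (1.0 - tau) * c_tilde         # C~ = tau*H + (1 - tau)*c~
d_out = torch.sigmoid(W_0(C_tilde))               # D_phi(Y), a number in [0, 1]
```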
S33: Update the parameters of the initial discriminator model according to the pre-trained discrimination parameters to obtain the discriminator model.
Specifically, the parameters of the initial discriminator model are updated according to the pre-trained discrimination parameters \omega and b, yielding the discriminator model. It can be understood that the discriminator model can be denoted D_\phi, where the parameters \phi of the discriminator model can be obtained from the parameters \omega and b. Once the discriminator model is obtained, adversarial training of the generator model and the discriminator model can be carried out, updating the two alternately until the model converges, which yields the final text generation model.
In the embodiment corresponding to FIG. 5, the initial discriminator model is established by inputting initial discrimination parameters into a convolutional neural network; the positive text samples and the first negative text samples are then input into the initial discriminator model for pre-training and converted into probability outputs according to the probability distribution function, and the initial discrimination parameters of the initial discriminator are updated by minimizing the cross entropy, obtaining the pre-trained discrimination parameters; finally, the parameters of the initial discriminator model are updated with the pre-trained discrimination parameters, obtaining the discriminator model. Training the initial discriminator model with the negative samples generated by the generator model and the positive text samples yields the discriminator model; once the discriminator model is obtained, the generator model and the discriminator model can be trained adversarially, finally producing the text generation model.
In an embodiment, as shown in FIG. 6, step S40, that is, generating test text based on the generator model, inputting the test text into the discriminator model to obtain its reward value, calculating the gradient of the generator model according to the reward value, and updating the generator model according to the gradient, may specifically include the following steps:
S41: Obtain the texts produced in the process of generating the test text as test sub-texts.
It can be understood that the generator model goes through many intermediate steps in generating the test text. For example, if the finally generated text is "床前明月光" ("Moonlight before my bed"), the generator model produces the intermediate texts "床", "床前", "床前明", and so on; the server can obtain these intermediate texts as the test sub-texts.
S42: Generate M hypothetical texts from each test sub-text using Monte Carlo search.
Here, the Monte Carlo method refers to solving computational problems using random numbers (or, more commonly, pseudo-random numbers).
It should be understood that, since the discriminator model can only judge the authenticity of a complete sentence, the reward values of the test sub-texts need to be obtained while the generator model generates the test text, so that the generator model can learn and its gradient can be calculated. Specifically, the server uses Monte Carlo search to generate M hypothetical texts from a test sub-text, inputs the M hypothetical texts into the discriminator model to obtain reward values, and takes the mean of these reward values as the reward value of the test sub-text. Generating the M hypothetical texts by Monte Carlo search can be expressed as:

\{Y_{1:T}^{1}, ..., Y_{1:T}^{M}\} = MC^{G_\beta}(Y_{1:t}; M)

This expression means that, given the test sub-text Y_{1:t}, M hypothetical texts are generated by Monte Carlo search. The Monte Carlo search must follow a probability distribution, namely G_\beta; here G_\beta = G_\theta is taken, so the M hypothetical texts can be generated by Monte Carlo search.
S43: Input the M hypothetical texts into the discriminator model, take the mean of the rewards of the M hypothetical texts as the reward value of the test sub-text, and input the test text into the discriminator model to obtain the reward value of the test text.
Specifically, the reward values of the test sub-texts and the test text can be calculated with the following formula:

Q_{D_\phi}^{G_\theta}(s = Y_{1:t-1}, a = y_t) =
\begin{cases}
\frac{1}{M} \sum_{m=1}^{M} D_\phi(Y_{1:T}^{m}), \; Y_{1:T}^{m} \in MC^{G_\beta}(Y_{1:t}; M), & t < T \\
D_\phi(Y_{1:t}), & t = T
\end{cases}

where the discriminator model D_\phi(Y) returns the probability that the test sample Y is a real sample, a number in [0,1]. Time T means the whole poem has been generated, so the reward value at time T can be given directly by the discriminator. The reward values at times t = 1 to T-1 must be given by Monte Carlo search simulation: at time t the state is the test sub-text Y_{1:t-1} together with the current word y_t, and starting from the prefix Y_{1:t}, Monte Carlo search is performed M times to obtain M hypothetical texts Y_{1:T}; the average of the reward values of these M hypothetical texts is used as the reward value at time t. In this way, since every intermediate step has a defined reward value, the generator model can be trained with reinforcement learning (RL).
S44: Calculate the gradient of the generator model according to the reward values of the test sub-texts and the reward value of the test text, and update the parameters of the generator model according to the gradient to obtain the updated generator model.
Specifically, after the reward values of the test sub-texts and the test text are obtained, the policy gradient of the generator model can be calculated with the following formula:

\nabla_\theta J(\theta) = \sum_{t=1}^{T} E_{Y_{1:t-1} \sim G_\theta} \Big[ \sum_{y_t} \nabla_\theta G_\theta(y_t \mid Y_{1:t-1}) \cdot Q_{D_\phi}^{G_\theta}(Y_{1:t-1}, y_t) \Big]

The expectation E in the above gradient can be approximated by sampling, and the parameter \theta of the generator model is then updated as:

\theta \leftarrow \theta + \alpha_h \nabla_\theta J(\theta)

The server then obtains the updated generator model from the updated parameters, uses the updated generator model to update the discriminator model, and updates the generator model and the discriminator model alternately until the discriminator model converges, finally obtaining the text generation model from the generator model at the time of convergence. When the generator model is updated, the discriminator model is kept fixed; the number of parameter updates of the generator model can be set according to the actual situation and is not specifically limited here.
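For illustration, the rollout rewards of steps S42 to S44 might be computed as in the sketch below, assuming PyTorch. The helper rollout(prefix, T), which completes a prefix to length T by sampling from G_β = G_θ, and a discriminator that maps a batch of token sequences to probabilities are assumptions, not part of the embodiment.

```python
# Sketch of the Monte Carlo rollout rewards Q of S42-S43, assuming PyTorch;
# rollout(prefix, T) and the discriminator interface are assumed helpers.
import torch

def mc_rewards(discriminator, rollout, sequence, M=16):
    T = sequence.size(1)                  # sequence: (batch, T) token ids
    rewards = []
    for t in range(1, T + 1):
        prefix = sequence[:, :t]          # test sub-text Y_{1:t}
        if t < T:
            # Average D_phi over M hypothetical completions of the prefix.
            rollouts = torch.stack([rollout(prefix, T) for _ in range(M)])
            q = discriminator(rollouts.flatten(0, 1)).view(M, -1).mean(dim=0)
        else:
            q = discriminator(prefix)     # full text: D_phi gives the reward
        rewards.append(q)
    return torch.stack(rewards, dim=1)    # (batch, T), one Q value per step
```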
In the embodiment corresponding to FIG. 6, the texts produced in the process of generating the test text are obtained as test sub-texts, and M hypothetical texts are generated from each test sub-text by Monte Carlo search; the M hypothetical texts are then input into the discriminator model, the mean of their rewards is taken as the reward value of the test sub-text, and the test text is input into the discriminator model to obtain its reward value; finally, the gradient of the generator model is calculated according to the reward values of the test sub-texts and the test text, and the parameters of the generator model are updated according to the gradient, obtaining the updated generator model. By using Monte Carlo search, the intermediate texts produced by the generator model receive corresponding reward values, so the generator model can be trained with reinforcement learning, which improves its training efficiency.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
In an embodiment, a text generation device is provided, which corresponds one-to-one to the text generation method in the above embodiments. As shown in FIG. 7, the text generation device includes a positive text sample acquisition module 10, a generator model acquisition module 20, a discriminator model acquisition module 30, a generator model update module 40, a discriminator model update module 50, a text generation model acquisition module 60, and a target text generation module 70. The functional modules are described in detail as follows:
The positive text sample acquisition module 10 is configured to obtain a real text data set and obtain positive text samples from the real text data set.
The generator model acquisition module 20 is configured to establish an initial generator model, input the positive text samples into the initial generator model for pre-training to obtain the generator model, and generate first negative text samples according to the generator model.
The discriminator model acquisition module 30 is configured to establish an initial discriminator model, and input the positive text samples and the first negative text samples into the initial discriminator model for pre-training to obtain the discriminator model.
The generator model update module 40 is configured to generate test text based on the generator model, input the test text into the discriminator model to obtain its reward value, calculate the gradient of the generator model according to the reward value, and update the generator model according to the gradient.
The discriminator model update module 50 is configured to generate second negative text samples according to the updated generator model, input the second negative text samples and the positive text samples into the discriminator model, and update the discriminator model by minimizing the cross entropy.
The text generation model acquisition module 60 is configured to update the generator model and the discriminator model alternately, and, if the output of the discriminator model converges, obtain the text generation model from the generator model at the time of convergence.
The target text generation module 70 is configured to obtain text to be recognized, input the text to be recognized into the text generation model, and generate target text based on the text generation model.
Further, the positive text sample acquisition module 10 is also configured to:
select N text data from the real text data set, where N is a positive integer;
convert the N text data into vector form using the word vector model, and use the N text data converted into vector form as the positive text samples.
Further, as shown in FIG. 8, the generator model acquisition module 20 includes an initial generation model establishing unit 21, an initial generation model pre-training unit 22, and a generator model acquisition unit 23.
The initial generation model establishing unit 21 is configured to input initial generation parameters into a recurrent neural network to establish the initial generator model.
The initial generation model pre-training unit 22 is configured to input the positive text samples into the initial generator model for pre-training, and convert the result into probability outputs according to the probability distribution function to obtain the pre-trained parameters.
The generator model acquisition unit 23 is configured to update the parameters of the initial generator model according to the pre-trained parameters to obtain the generator model.
Further, as shown in FIG. 9, the discriminator model acquisition module 30 includes an initial discriminant model establishing unit 31, an initial discriminant model pre-training unit 32, and a discriminator model acquisition unit 33.
The initial discriminant model establishing unit 31 is configured to input initial discrimination parameters into a convolutional neural network to establish the initial discriminator model.
The initial discriminant model pre-training unit 32 is configured to input the positive text samples and the first negative text samples into the initial discriminator model for pre-training, convert the result into probability outputs according to the probability distribution function, and update the initial discrimination parameters of the initial discriminator by minimizing the cross entropy to obtain the pre-trained discrimination parameters.
The discriminator model acquisition unit 33 is configured to update the parameters of the initial discriminator model according to the pre-trained discrimination parameters to obtain the discriminator model.
Further, the generator model update module 40 is also configured to:
obtain the texts produced in the process of generating the test text as test sub-texts;
generate M hypothetical texts from each test sub-text using Monte Carlo search;
input the M hypothetical texts into the discriminator model, take the mean of the rewards of the M hypothetical texts as the reward value of the test sub-text, and input the test text into the discriminator model to obtain the reward value of the test text;
calculate the gradient of the generator model according to the reward values of the test sub-texts and the reward value of the test text, and update the parameters of the generator model according to the gradient to obtain the updated generator model.
For the specific limitations of the text generation device, reference can be made to the above limitations of the text generation method, which will not be repeated here. Each module in the above text generation device can be implemented in whole or in part by software, hardware, or a combination thereof. The above modules can be embedded in hardware form in, or be independent of, a processor of a computer device, or can be stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to the above modules.
In an embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device stores real text data sets, positive text samples, negative text samples, word vector models, and the like. The network interface of the computer device communicates with external terminals through a network connection. The computer-readable instructions, when executed by the processor, implement a text generation method.
In an embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
obtaining a real text data set, and obtaining positive text samples from the real text data set;
establishing an initial generator model, inputting the positive text samples into the initial generator model for pre-training to obtain a generator model, and generating first negative text samples according to the generator model;
establishing an initial discriminator model, and inputting the positive text samples and the first negative text samples into the initial discriminator model for pre-training to obtain a discriminator model;
generating test text based on the generator model, inputting the test text into the discriminator model to obtain a reward value of the test text, calculating the gradient of the generator model according to the reward value, and updating the generator model according to the gradient;
generating second negative text samples according to the updated generator model, inputting the second negative text samples and the positive text samples into the discriminator model, and updating the discriminator model by minimizing the cross entropy;
updating the generator model and the discriminator model alternately, and, if the output of the discriminator model converges, obtaining a text generation model from the generator model at the time of convergence;
obtaining text to be recognized, inputting the text to be recognized into the text generation model, and generating target text based on the text generation model.
In an embodiment, one or more readable storage media storing computer-readable instructions are provided, where the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
obtaining a real text data set, and obtaining positive text samples from the real text data set;
establishing an initial generator model, inputting the positive text samples into the initial generator model for pre-training to obtain a generator model, and generating first negative text samples according to the generator model;
establishing an initial discriminator model, and inputting the positive text samples and the first negative text samples into the initial discriminator model for pre-training to obtain a discriminator model;
generating test text based on the generator model, inputting the test text into the discriminator model to obtain a reward value of the test text, calculating the gradient of the generator model according to the reward value, and updating the generator model according to the gradient;
generating second negative text samples according to the updated generator model, inputting the second negative text samples and the positive text samples into the discriminator model, and updating the discriminator model by minimizing the cross entropy;
updating the generator model and the discriminator model alternately, and, if the output of the discriminator model converges, obtaining a text generation model from the generator model at the time of convergence;
obtaining text to be recognized, inputting the text to be recognized into the text generation model, and generating target text based on the text generation model.
The readable storage media include non-volatile readable storage media and volatile readable storage media.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions instructing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is used only as an example. In practical applications, the above functions may be allocated to different functional units or modules as needed; that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only used to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the scope of protection of this application.

Claims (20)

  1. A text generation method, comprising:
    obtaining a real text data set, and obtaining positive text samples from the real text data set;
    establishing an initial generator model, inputting the positive text samples into the initial generator model for pre-training to obtain a generator model, and generating first negative text samples according to the generator model;
    establishing an initial discriminator model, and inputting the positive text samples and the first negative text samples into the initial discriminator model for pre-training to obtain a discriminator model;
    generating test text based on the generator model, inputting the test text into the discriminator model to obtain a reward value of the test text, calculating a gradient of the generator model according to the reward value, and updating the generator model according to the gradient;
    generating second negative text samples according to the updated generator model, inputting the second negative text samples and the positive text samples into the discriminator model, and updating the discriminator model by minimizing cross-entropy;
    alternately updating the generator model and the discriminator model, and if the output of the discriminator model converges, obtaining a text generation model according to the generator model at convergence;
    obtaining text to be recognized, inputting the text to be recognized into the text generation model, and generating target text based on the text generation model.
  2. The text generation method according to claim 1, wherein obtaining positive text samples from the real text data set comprises:
    selecting N pieces of text data from the real text data set, where N is a positive integer;
    converting the N pieces of text data into vector form using a word vector model, and using the N pieces of text data converted into vector form as the positive text samples.
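As a non-limiting illustration of the sample preparation in claim 2, the sketch below selects N text data items and converts them to vector form with a word vector model. It assumes gensim 4 or later (Word2Vec being one possible word vector model), whitespace-tokenized text, and an illustrative vector size.

```python
from gensim.models import Word2Vec

def build_positive_samples(sentences, n, vector_size=128):
    chosen = sentences[:n]                    # select N text data items
    tokenized = [s.split() for s in chosen]   # simple whitespace tokenization
    w2v = Word2Vec(tokenized, vector_size=vector_size, min_count=1)
    # each positive sample is the sequence of word vectors for one text
    return [[w2v.wv[word] for word in tokens] for tokens in tokenized]
```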
  3. The text generation method according to claim 1, wherein establishing an initial generator model, inputting the positive text samples into the initial generator model for pre-training to obtain a generator model, and generating first negative text samples according to the generator model comprises:
    inputting initial generation parameters into a recurrent neural network to establish the initial generator model;
    inputting the positive text samples into the initial generator model for pre-training, and converting the result into a probability output according to a probability distribution function to obtain pre-trained parameters;
    updating the parameters of the initial generator model according to the pre-trained parameters to obtain the generator model.
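A minimal sketch of the generator pre-training in claim 3: a recurrent neural network language model whose hidden states are turned into a probability output by a softmax (here applied inside the cross-entropy loss), trained by maximum likelihood on the positive samples. The GRU cell choice and all sizes are assumptions for illustration, not part of the claim.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)  # softmax applied in the loss

    def forward(self, tokens):                     # tokens: (batch, length) ids
        h, _ = self.rnn(self.emb(tokens))
        return self.out(h)                         # next-token logits

def pretrain_generator(gen, batches, epochs=3, lr=1e-3):
    opt = torch.optim.Adam(gen.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for seq in batches:                        # seq: (batch, length) ids
            logits = gen(seq[:, :-1])              # predict token t from tokens < t
            loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                           seq[:, 1:].reshape(-1))
            opt.zero_grad(); loss.backward(); opt.step()
```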
  4. The text generation method according to claim 1, wherein establishing an initial discriminator model, and inputting the positive text samples and the first negative text samples into the initial discriminator model for pre-training to obtain a discriminator model comprises:
    inputting initial discrimination parameters into a convolutional neural network to establish the initial discriminator model;
    inputting the positive text samples and the first negative text samples into the initial discriminator model for pre-training, converting the result into a probability output according to a probability distribution function, and updating the initial discrimination parameters of the initial discriminator by minimizing cross-entropy to obtain pre-trained discrimination parameters;
    updating the parameters of the initial discriminator model according to the pre-trained discrimination parameters to obtain the discriminator model.
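Likewise, a sketch of the convolutional discriminator of claim 4: a small text CNN over word embeddings producing one real/fake score per sequence; pre-training would minimize cross-entropy between these scores and the positive/negative labels, as in the adversarial loop sketched earlier. Channel and kernel sizes are illustrative assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, channels=32, kernel=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, channels, kernel)
        self.fc = nn.Linear(channels, 1)

    def forward(self, tokens):                      # tokens: (batch, length) ids
        x = self.emb(tokens).transpose(1, 2)        # (batch, emb_dim, length)
        x = F.relu(self.conv(x)).max(dim=2).values  # max-over-time pooling
        return self.fc(x).squeeze(1)                # one real/fake logit per text
```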
  5. The text generation method according to claim 1, wherein generating test text based on the generator model, inputting the test text into the discriminator model to obtain a reward value of the test text, calculating a gradient of the generator model according to the reward value, and updating the generator model according to the gradient comprises:
    obtaining the text produced during generation of the test text as test sub-texts;
    generating M hypothetical texts for each test sub-text using a Monte Carlo search;
    inputting the M hypothetical texts into the discriminator model, obtaining the mean reward of the M hypothetical texts as the reward value of the test sub-text, and inputting the test text into the discriminator model to obtain the reward value of the test text;
    calculating the gradient of the generator model according to the reward values of the test sub-texts and the reward value of the test text, and updating the parameters of the generator model according to the gradient to obtain the updated generator model.
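The reward computation of claim 5 can be sketched as follows: each test sub-text (prefix of the test text) is completed M times by Monte Carlo rollouts, the discriminator scores the completed hypothetical texts, and the mean score becomes that sub-text's reward; the finished test text is scored directly. In this sketch, `generator.rollout(prefix, length)` is a hypothetical helper that samples a completion of the given prefix up to the target length.

```python
import torch

def rollout_rewards(generator, discriminator, test_text, M=16):
    length = test_text.size(1)
    rewards = []
    for t in range(1, length):                   # each test sub-text (prefix)
        prefix = test_text[:, :t]
        scores = [torch.sigmoid(discriminator(generator.rollout(prefix, length)))
                  for _ in range(M)]             # M hypothetical completions
        rewards.append(torch.stack(scores).mean(dim=0))  # mean reward over M
    # the complete test text is scored by the discriminator directly
    rewards.append(torch.sigmoid(discriminator(test_text)))
    return torch.stack(rewards, dim=1)           # (batch, length) reward values
```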
  6. A text generation device, comprising:
    a positive text sample obtaining module, configured to obtain a real text data set and obtain positive text samples from the real text data set;
    a generator model obtaining module, configured to establish an initial generator model, input the positive text samples into the initial generator model for pre-training to obtain a generator model, and generate first negative text samples according to the generator model;
    a discriminator model obtaining module, configured to establish an initial discriminator model, and input the positive text samples and the first negative text samples into the initial discriminator model for pre-training to obtain a discriminator model;
    a generator model updating module, configured to generate test text based on the generator model, input the test text into the discriminator model to obtain a reward value of the test text, calculate a gradient of the generator model according to the reward value, and update the generator model according to the gradient;
    a discriminator model updating module, configured to generate second negative text samples according to the updated generator model, input the second negative text samples and the positive text samples into the discriminator model, and update the discriminator model by minimizing cross-entropy;
    a text generation model obtaining module, configured to alternately update the generator model and the discriminator model, and if the output of the discriminator model converges, obtain a text generation model according to the generator model at convergence;
    a target text generating module, configured to obtain text to be recognized, input the text to be recognized into the text generation model, and generate target text based on the text generation model.
  7. The text generation device according to claim 6, wherein the generator model obtaining module includes an initial generation model establishing unit, an initial generation model pre-training unit, and a generator model obtaining unit;
    the initial generation model establishing unit is configured to input initial generation parameters into a recurrent neural network to establish the initial generator model;
    the initial generation model pre-training unit is configured to input the positive text samples into the initial generator model for pre-training, and convert the result into a probability output according to a probability distribution function to obtain pre-trained parameters;
    the generator model obtaining unit is configured to update the parameters of the initial generator model according to the pre-trained parameters to obtain the generator model.
  8. The text generation device according to claim 6, wherein the discriminator model obtaining module includes an initial discriminant model establishing unit, an initial discriminant model pre-training unit, and a discriminator model obtaining unit;
    the initial discriminant model establishing unit is configured to input initial discrimination parameters into a convolutional neural network to establish the initial discriminator model;
    the initial discriminant model pre-training unit is configured to input the positive text samples and the first negative text samples into the initial discriminator model for pre-training, convert the result into a probability output according to a probability distribution function, and update the initial discrimination parameters of the initial discriminator by minimizing cross-entropy to obtain pre-trained discrimination parameters;
    the discriminator model obtaining unit is configured to update the parameters of the initial discriminator model according to the pre-trained discrimination parameters to obtain the discriminator model.
  9. The text generation device according to claim 6, wherein the generator model updating module is further configured to: obtain the text produced during generation of the test text as test sub-texts; generate M hypothetical texts for each test sub-text using a Monte Carlo search; input the M hypothetical texts into the discriminator model, obtain the mean reward of the M hypothetical texts as the reward value of the test sub-text, and input the test text into the discriminator model to obtain the reward value of the test text; and calculate the gradient of the generator model according to the reward values of the test sub-texts and the reward value of the test text, and update the parameters of the generator model according to the gradient to obtain the updated generator model.
  10. The text generation device according to claim 6, wherein the positive text sample obtaining module is further configured to: select N pieces of text data from the real text data set, where N is a positive integer; and convert the N pieces of text data into vector form using a word vector model, using the N pieces of text data converted into vector form as the positive text samples.
  11. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
    obtaining a real text data set, and obtaining positive text samples from the real text data set;
    establishing an initial generator model, inputting the positive text samples into the initial generator model for pre-training to obtain a generator model, and generating first negative text samples according to the generator model;
    establishing an initial discriminator model, and inputting the positive text samples and the first negative text samples into the initial discriminator model for pre-training to obtain a discriminator model;
    generating test text based on the generator model, inputting the test text into the discriminator model to obtain a reward value of the test text, calculating a gradient of the generator model according to the reward value, and updating the generator model according to the gradient;
    generating second negative text samples according to the updated generator model, inputting the second negative text samples and the positive text samples into the discriminator model, and updating the discriminator model by minimizing cross-entropy;
    alternately updating the generator model and the discriminator model, and if the output of the discriminator model converges, obtaining a text generation model according to the generator model at convergence;
    obtaining text to be recognized, inputting the text to be recognized into the text generation model, and generating target text based on the text generation model.
  12. The computer device according to claim 11, wherein obtaining positive text samples from the real text data set comprises:
    selecting N pieces of text data from the real text data set, where N is a positive integer;
    converting the N pieces of text data into vector form using a word vector model, and using the N pieces of text data converted into vector form as the positive text samples.
  13. The computer device according to claim 11, wherein establishing an initial generator model, inputting the positive text samples into the initial generator model for pre-training to obtain a generator model, and generating first negative text samples according to the generator model comprises:
    inputting initial generation parameters into a recurrent neural network to establish the initial generator model;
    inputting the positive text samples into the initial generator model for pre-training, and converting the result into a probability output according to a probability distribution function to obtain pre-trained parameters;
    updating the parameters of the initial generator model according to the pre-trained parameters to obtain the generator model.
  14. The computer device according to claim 11, wherein establishing an initial discriminator model, and inputting the positive text samples and the first negative text samples into the initial discriminator model for pre-training to obtain a discriminator model comprises:
    inputting initial discrimination parameters into a convolutional neural network to establish the initial discriminator model;
    inputting the positive text samples and the first negative text samples into the initial discriminator model for pre-training, converting the result into a probability output according to a probability distribution function, and updating the initial discrimination parameters of the initial discriminator by minimizing cross-entropy to obtain pre-trained discrimination parameters;
    updating the parameters of the initial discriminator model according to the pre-trained discrimination parameters to obtain the discriminator model.
  15. The computer device according to claim 11, wherein generating test text based on the generator model, inputting the test text into the discriminator model to obtain a reward value of the test text, calculating a gradient of the generator model according to the reward value, and updating the generator model according to the gradient comprises:
    obtaining the text produced during generation of the test text as test sub-texts;
    generating M hypothetical texts for each test sub-text using a Monte Carlo search;
    inputting the M hypothetical texts into the discriminator model, obtaining the mean reward of the M hypothetical texts as the reward value of the test sub-text, and inputting the test text into the discriminator model to obtain the reward value of the test text;
    calculating the gradient of the generator model according to the reward values of the test sub-texts and the reward value of the test text, and updating the parameters of the generator model according to the gradient to obtain the updated generator model.
  16. One or more readable storage media storing computer-readable instructions, wherein when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
    obtaining a real text data set, and obtaining positive text samples from the real text data set;
    establishing an initial generator model, inputting the positive text samples into the initial generator model for pre-training to obtain a generator model, and generating first negative text samples according to the generator model;
    establishing an initial discriminator model, and inputting the positive text samples and the first negative text samples into the initial discriminator model for pre-training to obtain a discriminator model;
    generating test text based on the generator model, inputting the test text into the discriminator model to obtain a reward value of the test text, calculating a gradient of the generator model according to the reward value, and updating the generator model according to the gradient;
    generating second negative text samples according to the updated generator model, inputting the second negative text samples and the positive text samples into the discriminator model, and updating the discriminator model by minimizing cross-entropy;
    alternately updating the generator model and the discriminator model, and if the output of the discriminator model converges, obtaining a text generation model according to the generator model at convergence;
    obtaining text to be recognized, inputting the text to be recognized into the text generation model, and generating target text based on the text generation model.
  17. The readable storage media according to claim 16, wherein obtaining positive text samples from the real text data set comprises:
    selecting N pieces of text data from the real text data set, where N is a positive integer;
    converting the N pieces of text data into vector form using a word vector model, and using the N pieces of text data converted into vector form as the positive text samples.
  18. The readable storage media according to claim 16, wherein establishing an initial generator model, inputting the positive text samples into the initial generator model for pre-training to obtain a generator model, and generating first negative text samples according to the generator model comprises:
    inputting initial generation parameters into a recurrent neural network to establish the initial generator model;
    inputting the positive text samples into the initial generator model for pre-training, and converting the result into a probability output according to a probability distribution function to obtain pre-trained parameters;
    updating the parameters of the initial generator model according to the pre-trained parameters to obtain the generator model.
  19. The readable storage media according to claim 16, wherein establishing an initial discriminator model, and inputting the positive text samples and the first negative text samples into the initial discriminator model for pre-training to obtain a discriminator model comprises:
    inputting initial discrimination parameters into a convolutional neural network to establish the initial discriminator model;
    inputting the positive text samples and the first negative text samples into the initial discriminator model for pre-training, converting the result into a probability output according to a probability distribution function, and updating the initial discrimination parameters of the initial discriminator by minimizing cross-entropy to obtain pre-trained discrimination parameters;
    updating the parameters of the initial discriminator model according to the pre-trained discrimination parameters to obtain the discriminator model.
  20. The readable storage media according to claim 16, wherein generating test text based on the generator model, inputting the test text into the discriminator model to obtain a reward value of the test text, calculating a gradient of the generator model according to the reward value, and updating the generator model according to the gradient comprises:
    obtaining the text produced during generation of the test text as test sub-texts;
    generating M hypothetical texts for each test sub-text using a Monte Carlo search;
    inputting the M hypothetical texts into the discriminator model, obtaining the mean reward of the M hypothetical texts as the reward value of the test sub-text, and inputting the test text into the discriminator model to obtain the reward value of the test text;
    calculating the gradient of the generator model according to the reward values of the test sub-texts and the reward value of the test text, and updating the parameters of the generator model according to the gradient to obtain the updated generator model.
PCT/CN2019/116941 2019-01-24 2019-11-11 Text generation method and device, computer apparatus, and medium WO2020151310A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910067379.0A CN109885667A (en) 2019-01-24 2019-01-24 Document creation method, device, computer equipment and medium
CN201910067379.0 2019-01-24

Publications (1)

Publication Number Publication Date
WO2020151310A1 true WO2020151310A1 (en) 2020-07-30

Family

ID=66926787

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116941 WO2020151310A1 (en) 2019-01-24 2019-11-11 Text generation method and device, computer apparatus, and medium

Country Status (2)

Country Link
CN (1) CN109885667A (en)
WO (1) WO2020151310A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109885667A (en) * 2019-01-24 2019-06-14 平安科技(深圳)有限公司 Document creation method, device, computer equipment and medium
CN112115257B (en) * 2019-06-20 2023-07-14 百度在线网络技术(北京)有限公司 Method and device for generating information evaluation model
CN111126503B (en) * 2019-12-27 2023-09-26 北京同邦卓益科技有限公司 Training sample generation method and device
CN111339749B (en) * 2020-03-02 2022-05-20 乐山师范学院 Unconditional text generating method, text generating device and storage medium
US11972604B2 (en) 2020-03-11 2024-04-30 Shenzhen Institutes Of Advanced Technology Image feature visualization method, image feature visualization apparatus, and electronic device
CN112036955B (en) * 2020-09-07 2021-09-24 贝壳找房(北京)科技有限公司 User identification method and device, computer readable storage medium and electronic equipment
CN112328750A (en) * 2020-11-26 2021-02-05 上海天旦网络科技发展有限公司 Method and system for training text discrimination model
CN115442324B (en) * 2021-06-04 2023-08-18 中国移动通信集团浙江有限公司 Message generation method, device, message management equipment and storage medium
CN114844767A (en) * 2022-04-27 2022-08-02 中国电子科技集团公司第五十四研究所 Alarm data generation method based on countermeasure generation network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150254555A1 (en) * 2014-03-04 2015-09-10 SignalSense, Inc. Classifying data with deep learning neural records incrementally refined through expert input
CN108829898A (en) * 2018-06-29 2018-11-16 无码科技(杭州)有限公司 HTML content page issuing time extracting method and system
CN109003678A (en) * 2018-06-12 2018-12-14 清华大学 A kind of generation method and system emulating text case history
CN109885667A (en) * 2019-01-24 2019-06-14 平安科技(深圳)有限公司 Document creation method, device, computer equipment and medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180336439A1 (en) * 2017-05-18 2018-11-22 Intel Corporation Novelty detection using discriminator of generative adversarial network
CN108334497A (en) * 2018-02-06 2018-07-27 北京航空航天大学 The method and apparatus for automatically generating text
US10152970B1 (en) * 2018-02-08 2018-12-11 Capital One Services, Llc Adversarial learning and generation of dialogue responses
CN108923922B (en) * 2018-07-26 2021-04-23 北京工商大学 Text steganography method based on generation of confrontation network
CN109242090B (en) * 2018-08-28 2020-06-26 电子科技大学 Video description and description consistency judgment method based on GAN network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150254555A1 (en) * 2014-03-04 2015-09-10 SignalSense, Inc. Classifying data with deep learning neural records incrementally refined through expert input
CN109003678A (en) * 2018-06-12 2018-12-14 清华大学 A kind of generation method and system emulating text case history
CN108829898A (en) * 2018-06-29 2018-11-16 无码科技(杭州)有限公司 HTML content page issuing time extracting method and system
CN109885667A (en) * 2019-01-24 2019-06-14 平安科技(深圳)有限公司 Document creation method, device, computer equipment and medium

Also Published As

Publication number Publication date
CN109885667A (en) 2019-06-14

Similar Documents

Publication Publication Date Title
WO2020151310A1 (en) Text generation method and device, computer apparatus, and medium
Goceri Analysis of deep networks with residual blocks and different activation functions: classification of skin diseases
Yu et al. Seqgan: Sequence generative adversarial nets with policy gradient
WO2020232877A1 (en) Question answer selection method and apparatus, computer device, and storage medium
US20220083868A1 (en) Neural network training method and apparatus, and electronic device
WO2021057884A1 (en) Sentence paraphrasing method, and method and apparatus for training sentence paraphrasing model
WO2021184902A1 (en) Image classification method and apparatus, training method and apparatus, device, and medium
US11776269B2 (en) Action classification in video clips using attention-based neural networks
CN113688244B (en) Text classification method, system, equipment and storage medium based on neural network
CN111061847A (en) Dialogue generation and corpus expansion method and device, computer equipment and storage medium
Wang et al. Text generation based on generative adversarial nets with latent variables
CN109523014B (en) News comment automatic generation method and system based on generative confrontation network model
CN109977394B (en) Text model training method, text analysis method, device, equipment and medium
WO2021139344A1 (en) Text generation method and apparatus based on artificial intelligence, computer device, and medium
CN111598213B (en) Network training method, data identification method, device, equipment and medium
CN112000788B (en) Data processing method, device and computer readable storage medium
CN116775843A (en) Question-answer pair evaluation data generation method, question-answer pair evaluation data generation device, computer equipment and storage medium
CN116992942B (en) Natural language model optimization method, device, natural language model, equipment and medium
CN110598210A (en) Entity recognition model training method, entity recognition device, entity recognition equipment and medium
CN117236421A (en) Large model training method based on federal knowledge distillation
CN114861671A (en) Model training method and device, computer equipment and storage medium
Zhang et al. Weight uncertainty in Boltzmann machine
Vargas et al. Relu-based activations: Analysis and experimental study for deep learning
WO2021151324A1 (en) Method and apparatus for medical data processing based on transfer learning, device, and medium
CN111797220A (en) Dialog generation method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19911305

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19911305

Country of ref document: EP

Kind code of ref document: A1