CN114118024B

CN114118024B - Conditional text generation method and generation system

Info

Publication number: CN114118024B
Application number: CN202111474679.4A
Authority: CN
Inventors: 岳希; 罗伟尔; 高燕; 唐聃; 何磊
Original assignee: Chengdu University of Information Technology
Current assignee: Chengdu University of Information Technology
Priority date: 2021-12-06
Filing date: 2021-12-06
Publication date: 2022-06-21
Anticipated expiration: 2041-12-06
Also published as: CN114118024A

Abstract

The invention discloses a conditional text generation method, which comprises the steps of collecting text data; preprocessing data; constructing an encoder and a decoder; respectively encoding the condition information and the text data through an encoder to obtain condition characteristics and text characteristics; performing feature fusion on the condition features and the text features to obtain fused features, and recording the fused features as fusion features; taking the fusion characteristics as the input of a decoder to obtain the output result of the decoder; calculating loss; training the network model based on the output result and loss of the decoder until the training condition is met or the maximum training frequency is reached, and outputting the trained network model; and inputting condition information and a prompt text into the trained network model to generate a text. One of the technical problems to be solved by the invention is that the existing conditional text generation technology can generate results while training a model, so that the problems of low efficiency and low fine granularity are caused, and the purpose of generating the conditional text more efficiently and smoothly is realized.

Description

Conditional text generation method and generation system

Technical Field

The invention relates to the field of natural language processing, in particular to a conditional text generation method and a conditional text generation system.

Background

Natural Language processing is a popular technical field in recent years, and a natural Language generation task generally adopts NNLM (Neural Network Language Models), and a GPT-2 model based on ARLM (Auto-regressive Decoding) is a common one. Because the model is generated by depending on probability, the generated randomness is strong, the content cannot be controlled, and the specific requirements are not met.

One of the technical means for solving the above problems in the prior art is to add an attribute discriminator when generating a text, depending on condition information, such as keywords, emotions, styles, and the like: firstly, training a generator and a discriminator, wherein the generator p (g) generates texts, and the discriminator judges an attribute type p (c | g) and then obtains p (g | c). Returning the gradient, updating the internal state of the language model, enabling the actual prediction to be closer to the desired attribute, finally obtaining the probability distribution of new output, and sampling to generate a new word. However, although this prior art overcomes the problem of too strong randomness of the generated text to some extent, its working principle is to generate results while training the model, resulting in very low efficiency, and the generation of the text cannot be controlled on a finer granularity.

Disclosure of Invention

The invention provides a conditional text generation method and a conditional text generation system, which solve the technical problems that the existing conditional text generation technology can generate results while training a model, so that the efficiency is low and the fine granularity is not high, and achieve the aim of generating conditional texts more efficiently and smoothly.

The invention is realized by the following technical scheme:

a conditional text generation method, comprising:

s1, collecting text data; data preprocessing, namely converting text data into a data set suitable for training;

s2, constructing an encoder and a decoder;

s3, coding the condition information and the text data through a coder respectively to obtain condition characteristics and text characteristics; performing feature fusion on the condition features and the text features to obtain fused features, and recording the fused features as fusion features;

s4, taking the fusion characteristics as the input of a decoder to obtain the output result of the decoder;

s5, calculating loss;

s6, training the network model based on the output result and loss of the decoder until the training condition is met or the maximum training times is reached, and outputting the trained network model;

and S7, inputting condition information and a prompt text into the trained network model to generate a text.

The invention provides a conditional text generation method, aiming at the problems that the efficiency is low and the fine granularity is low and the like caused by the fact that the existing conditional text generation technology can generate results while model training. Then, an encoder and a decoder are constructed, the condition information is encoded through the encoder, and the output of the encoder is used as condition characteristics; in the same way, the preprocessed text data is coded by a coder, and the output of the coder is used as the text characteristic; then, performing feature fusion on the condition features and the text features; and then the fused features are used as the input of a decoder to guide text generation. And then calculating a loss function, training the network model by using the output result of the decoder and the loss function, and inputting required condition information and a prompt text into the trained network model on the basis of the trained network model, so that the text generation can be completed.

It is particularly noted that although feature fusion belongs to the existing means in the aspect of machine learning, a way of fusing conditional features and text features in the field of conditional text generation belongs to one of the core inventions of the present application, and the applicant has paid a lot of creative work for this purpose, and the feature fusion way has a significant effect compared with the prior art: the condition information features and the source text features are combined, and the fusion of the condition features and the text features is realized by adopting a feature fusion means, so that compared with a conventional feature fusion method of vector addition or vector splicing, the defect that the fusion of the condition information features and each word in a text cannot be considered locally, and the fused features cannot express the meaning of original features due to the discreteness of the text features is overcome; in the method, the condition features and the text features are locally grasped based on the query vector by a technical means of feature fusion of the condition features and the text features, the condition information features can be locally grasped and fused with each word feature in the text, the condition information features can be fused into each word feature of the source text, and therefore the condition text is effectively generated, the text generation is controlled on a finer granularity, and the text generation effect is remarkably improved. In addition, the method trains the model first, and then uses the model after the model training is finished, so that the problem of low efficiency caused by synchronous training and generation when the text is generated by the attribute discriminator in the prior art is solved.

Further, in step S1, collecting text data published on the web by a crawler, and storing the text data in json format; the collected text data includes content and a subject. Wherein the json format is stored for use in subsequent training; meanwhile, the content and the theme of the text are collected, compared with the mode of only collecting the text content, a more reasonable training model can be obtained, and the fine granularity of the subsequently generated text is improved.

Further, in step S1, the data preprocessing includes:

s101, noise removal: removing useless symbols, redundant blanks, numbers, names of people and places from the crawled data by utilizing a regular mode;

s102, normalization: replacing special symbols and replacing rare vocabularies;

s103, extracting keywords: extracting keywords from the collected text data, and screening out texts containing specified keywords as a data set;

s104, word segmentation: and performing word segmentation on the data set by using a word segmentation tool, and converting the words after word segmentation into corresponding position numbers in a dictionary.

The data preprocessing process of the scheme at least comprises four steps of noise elimination, normalization, keyword extraction and word segmentation which are sequentially carried out, wherein the sequence relation of the four steps cannot be reversed or staggered, and the matching degree of a data set and model training can be fully ensured.

Further, in step S2, the GPT-2 model is trained using the collected text data, and the trained model is denoted as model M, which is used as an encoder and a decoder. The scheme takes the existing GPT-2 model as a pre-training language model, the GPT-2 has a super-large scale, and the model is a huge model based on a transformer and trained on a mass data set. The GPT-2 model after training is used as an encoder and a decoder of the network model to be trained, and compared with a conventional encoding and decoding mode, the effect of generating the text can be obviously improved.

Further, step S3 includes:

s301, inputting condition information into the model M to obtain condition characteristics f_c；

S302, inputting the text data containing the condition information into a model M to obtain a text characteristic f_p；

S303, mixing f_cAnd f_pPerforming feature fusion to obtain a fusion feature f_cp。

Wherein f is_cI.e. the output obtained after the condition information is input into the model M, and f_pI.e. the output obtained after the text data is input into the model M.

Further, step S303 includes:

s3031, determining the number n of condition information, and sequentially recording the condition characteristics of the condition information as f_c1、f_c2…f_cn(ii) a Wherein n is a positive integer;

s3032, key vector key of each condition characteristic_c1、key_c2…key_cnValue vector value_c1、value_c2…value_cnRespectively connected to text features f_pBefore the key vector key and the value vector value of, and the text feature f is kept_pThe query vector query is unchanged to obtain a connected key vector key_lastValue vector value_last：

key_last＝[key_c1；key_c2…key_cn；key]；

value_last＝[value_c1；value_c2…value_cn；value]；

S3033, calculating the output value scroe by the following formula:

in the formula, q represents a query vector query, k^TRepresentative key_lastV represents value_last，d_kRepresentative key_lastDimension (d);

s3034, inputting the scroe into a feedforward neural network to obtain a fusion characteristic f_cp。

The scheme further limits the characteristic fusion process, wherein in order to meet the working condition with a plurality of condition information, the number n of the condition information is firstly determined, and the condition characteristics corresponding to each condition information are respectively marked as f_c1、f_c2…f_cn(ii) a Then, the key and value vectors of each condition feature are respectively connected to the text feature f_pBefore the key and value vectors, and keeping the query vector query unchanged, still from f_p. For example, when there is only one condition information, f_cKey of [1,2,3 ]],f_pKey of [4, 5, 6 ]]Then key connected by the method_lastIs [1,2,3, 4, 5, 6 ]]；value_lastThe same connection method as above.

After the connection by the method, the key vector key after the connection can be obtained_lastValue vector value_lasThen substituted into the calculation formula of scan, where v equals value_last，k^TIs key_lastQ represents the query vector query, still using the text feature f_pThe query in (1). Finally, the result output by the formula is used as the input of a feedforward neural network, and the finally required fusion characteristic f in the application can be obtained_cpEnsuring the realization of local grasp of condition information characteristics and textThe fusion of each word characteristic in the text, the fusion of the condition information characteristic into each word characteristic of the source text, the effective generation of the condition text and the control of the text generation effect on finer granularity.

Further, in step S5, the method for calculating the loss includes:

s501, calculating mutual information loss L of condition information and text data_point：

L_point＝(e^x-1)^-1；

Wherein a represents text data, and c represents condition information; x is a mutual information calculation result between the text data and the condition information; p (a) represents the accumulation of the probability of occurrence of each word in the text data; p (c) represents the accumulation of the condition information and the occurrence probability of each word in the text data; p (a, c) represents the probability accumulation of each word in the text data with the condition information; e is a natural constant;

s502, setting the condition information as an empty set, and calculating the unconditional content loss L_null：

Wherein c represents condition information; x is the number of_iRepresenting a currently generated word; { x)₁，x₂...x_i-1Denotes generating x_iRequired prompt text;

denotes x_iA posterior probability of (d); k represents the maximum length of the generated text; t represents the length of the condition information;

s503, calculating the final Loss by the following formula:

wherein, when the mutual information loss participates in the training,

otherwise

When the unconditional content loss is engaged in training,

otherwise

As one of the important invention points of the application, the scheme also improves the loss function calculation mode of the network model in the conditional text generation process, and determines the loss function by combining the mutual information loss and the unconditional content loss. The mutual information loss is to calculate the relationship between condition information and text data based on a mutual information concept, and the calculation purpose in the application is to enable a generated text to contain condition information as much as possible and enable the generated text to be closer to the set condition after the hidden variable of the original text is fused with the hidden variable of the condition information; the unconditional content loss is a loss function when no condition information is contained, and the calculation purpose in the application is to reduce the influence of a feature fusion process on an original hidden variable as much as possible and enable the hidden variable to generate a smooth text after feature fusion. In the final Loss calculation formula, argmin is a mathematical symbol and represents a variable value when the target function in the bracket behind the argmin takes the minimum value;

it is decided whether or not mutual information loss participates in the training,

it is decided whether unconditional losses participate in the training.

The scheme innovatively adopts a method of combining mutual information loss and unconditional content loss, so that the fluency of the generated text can be improved, the association degree of the generated text and conditions can be ensured, and the fluency of the text and the text content can be better controlled.

A conditional text generation system, comprising:

the data acquisition module is used for acquiring text data through a crawler and storing the text data in a json format;

the preprocessing module is used for preprocessing the text data and converting the text data into a data set suitable for training;

the model training module is used for constructing a conditional text generation model and outputting a trained network model;

the text generation module predicts the probability of the next word according to the trained network model through the input prompt text and the condition information, then outputs the next word by combining top-k and top-p through softmax normalization until the text generation is finished;

wherein the model training module comprises:

a pre-training language model unit for constructing an encoder and a decoder;

the encoder is used for respectively inputting the condition information and the text data to obtain condition characteristics and text characteristics;

the feature fusion unit is used for performing feature fusion on the condition features and the text features to obtain fusion features;

the loss calculation unit is used for calculating the loss in the model training process;

and the decoder is used for taking the fusion features as input to obtain an output result so as to guide text generation.

Further, the feature fusion unit includes:

a front subunit for determining the number n of the condition information and sequentially recording the condition characteristics of each condition information as f_c1、f_c2…f_cn(ii) a Wherein n is a positive integer;

a connection subunit for connecting the key vectors key of each condition feature_c1、key_c2…key_cnValue vector value_c1、value_c2…value_cnRespectively connected to text features f_pBefore the key vector key and the value vector value of, and the text feature f is kept_pThe query vector query is unchanged to obtain a connected key vector key_lastValue vector value_last：

key_last＝[key_c1；key_c2…key_cn；key]；

value_last＝[value_c1；value_c2…value_cn；value]；

A first calculating subunit configured to calculate the output value scroe by the following formula:

wherein q represents the query vector query, k^TRepresentative key_lastV represents value_last，d_kRepresentative key_lastDimension of (d);

and the second calculating subunit is used for inputting the scroe into the feedforward neural network to obtain the fusion characteristic fcp.

Further, the loss calculation unit includes:

a mutual information loss subunit for calculating a mutual information loss L of the condition information and the text data by the following formula_point：

L_point＝(e^x-1)^-1；

Wherein a represents text data, and c represents condition information; x is a mutual information calculation result between the text data and the condition information; p (a) represents the accumulation of the probability of occurrence of each word in the text data; p (c) represents the accumulation of the condition information and the occurrence probability of each word in the text data; p (a, c) represents the probability accumulation of each word in the text data with the condition information;

an unconditional content loss subunit forThe condition information is set as an empty set, and the unconditional content loss L is calculated by the following formula_null：

Wherein c represents condition information; x is the number of_iRepresenting a currently generated word; { x₁，x₂...x_i-1Denotes generating x_iRequired prompt text;

denotes x_iThe posterior probability of (d); k represents the maximum length of the generated text;

a combining subunit, configured to combine the mutual information Loss and the unconditional content Loss by using the following formula to obtain a final Loss:

wherein, when the mutual information loss participates in the training,

otherwise

When the unconditional content loss is engaged in training,

otherwise

Compared with the prior art, the invention has the following advantages and beneficial effects:

1. the invention discloses a conditional text generation method and a conditional text generation system, which realize the fusion of conditional features and text features by combining the conditional information features and source text features and adopting a feature fusion means, and overcome the defects that the fusion of the conditional information features and each word in a text cannot be considered locally and the fused features cannot express the meaning of original features easily because of the discreteness of the text features compared with the conventional feature fusion manner of vector addition or vector splicing.

2. The conditional text generation method and the conditional text generation system can locally grasp the fusion of the conditional information features and each word feature in the text, can fuse the conditional information features into each word feature of the source text, further effectively generate the conditional text, control the text generation on finer granularity, and remarkably improve the text generation effect.

3. According to the conditional text generation method and the conditional text generation system, the model is trained firstly, and then used after the model training is finished, so that the problem of low efficiency caused by synchronous training and generation when the text is generated through the attribute discriminator in the prior art is solved.

4. According to the conditional text generation method and the conditional text generation system, the trained GPT-2 model is used as an encoder and a decoder of the network model to be trained, and compared with a conventional encoding and decoding mode, the effect of text generation can be obviously improved.

5. According to the conditional text generation method and the conditional text generation system, a method of combining mutual information loss and unconditional content loss is adopted, so that the fluency of the generated text can be improved, the association degree of the generated text and the conditions can be ensured, and the better control effect on the fluency of the text and the content of the text is realized.

Drawings

The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:

FIG. 1 is a schematic flow chart of an embodiment of the present invention;

FIG. 2 is a system diagram of an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention. In the description of the present application, it is to be understood that the terms "front", "back", "left", "right", "upper", "lower", "vertical", "horizontal", "high", "low", "inner", "outer", etc. indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience in describing the present invention and simplifying the description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus should not be construed as limiting the scope of the present application.

Example 1:

a conditional text generating method as shown in fig. 1 mainly includes the following steps:

data acquisition: pupil composition 162432 was crawled by a crawler mechanism and stored in json format.

(II) data preprocessing: the data is converted into a data set suitable for training. The method mainly comprises the steps of noise elimination, normalization, keyword extraction and word segmentation. The detailed steps are as follows:

1) and removing useless symbols, redundant blanks, numbers, names of people and places from the crawled data by utilizing a regular mode, and performing special symbol replacement and rare vocabulary replacement.

2) Keyword extraction is carried out on all articles by utilizing jieba extraction, and the composition 23120 with keywords containing 'mom' is screened out to be used as a data set.

3) And utilizing a word segmentation tool to segment words of the data set respectively, and finally converting the words after word segmentation into corresponding position numbers in a dictionary.

(III) model training: the method is used for constructing the controllable text generation model and mainly comprises the following steps:

1) the crawled 162432 compositions were used to train the GPT2 model (the trained model was denoted as M) as an encoder and decoder.

2) Inputting the keyword ' mom ' into M by taking the keyword ' mom ' as condition information to obtain condition characteristic f of the keyword ' mom_c(ii) a Then inputting a composition containing a keyword 'mom' into M to obtain a text characteristic f of the composition_p。

3) Will the characteristic f_cAnd f_pAnd (3) carrying out feature fusion by using an encoder:

(1) will be characteristic f_cAnd f_pThrough a self-attention mechanism, f is_cKey of_cAnd value_cAre respectively connected to f_pBefore key and value of query is unchanged; if there are a plurality of condition information, connecting its key and value to the key and value of p, respectively. The ligation results are shown below: key_last＝[key_c；key]；value_last＝[value_c；value]。

(2) Connected key_last，value_lastThe values were calculated by the auto-attention mechanism, as follows:

wherein q represents the query vector query, k^TRepresentative key_lastV represents value_last，d_kRepresentative key_lastOf (c) is calculated.

Taking the result scroe calculated and output by the formula as the input of the feedforward neural network, and taking the finally obtained output as f_cAnd f_pFused feature f_cp。

4) Will be characteristic f_cpAs input for M, for directing the generation of text.

5) And calculating loss by adopting a method of combining two losses, including mutual information loss and unconditional content loss. Respectively as follows:

(1) mutual information loss L_point. The relationship between the text and the condition information is calculated by using the mutual information, and then the relationship is mapped to be a monotonous function which is larger than 0. The specific formula is as follows.

L_point＝(e^x-1)^-1

Wherein a represents text data, and c represents condition information; x is a mutual information calculation result between the text data and the condition information; p (a) represents the accumulation of the probability of occurrence of each word in the text data; p (c) represents the accumulation of the condition information and the occurrence probability of each word in the text data; p (a, c) represents a probability accumulation of each word in the text data with the condition information;

(2) unconditional content loss L_null. Will be in the condition of

denotes x_iA posterior probability of (d); k represents the maximum length of the generated text;

(3) the above two losses are trained in combination by the following formula.

In this example to

6) And training the network until the training condition is met or the maximum training times are finished.

7) And after the training is finished, outputting a training model.

(IV) text generation: and inputting a prompt text and condition information to generate a text, predicting the probability of the next word by using a model stored in training, normalizing by softmax, and outputting the next word by combining top-k and top-p.

In this embodiment, the cue word is given as "afternoon", the condition information is "mom" as described above, and 100 words are generated with the word count restriction.

The samples were generated as follows:

sample 1:

in the afternoon, i are watching tv, suddenly, mom comes out from the kitchen, saying to me: "son, mother want to go out and you read at home. "I say: "is mom, did i see tv and go for a while? "mom says: "you go to the meeting Bar first! "after saying, go.

Sample 2:

in the afternoon, the mother cooks in the kitchen, writes in the study, suddenly, a burst of sound is transmitted from the kitchen, and the mother plays a look before cooking.

The mom holds the dish, and at a glance, the hands of the mom are full of oil, so that the mom says: "you are a small greedy cat, i have fried you a pan, you have tasted a bar soon! "I happy: "good. "

Sample 3:

in the afternoon, I go to the park with mother to take a walk.

Walking half way, i found a stall for selling flowers, the stall owner was a grandma, she wore a gray garment, leaned on a crutch, carried a big bag with her hands, filled with various flowers, and put a basin of fresh flowers on the stall for selling flowers.

It can be seen from the above 3 samples that, the embodiment not only can effectively generate the conditional text, but also the generated conditional text has higher fluency, so that the text generation can be controlled on the fine granularity, and the effect of the conditional text generation is obviously improved.

Example 2:

a conditional text generation system, as shown in fig. 2, comprising:

wherein the model training module comprises:

a pre-training language model unit for constructing an encoder and a decoder;

and the decoder is used for taking the fusion characteristics as input to obtain an output result so as to guide text generation.

Wherein the feature fusion unit includes:

a connection subunit for connecting the key vectors key of each condition feature_c1、key_c2…key_cnValue vector value_c1、value_c2…value_cnRespectively connected to text features f_pBefore the key vector key and the value vector value of, and the text feature f is kept_pThe query vector query is unchanged, and after connection is obtainedKey vector key of_lastValue vector value_last：

key_last＝[key_c1；key_c2…key_cn；key]；

value_last＝[value_c1；value_c2…value_cn；value]；

wherein q represents the query vector query, k^TRepresentative key_lastV represents value_last，d_kRepresentative key_lastDimension (d);

a second calculating subunit, configured to input the scroe into the feedforward neural network to obtain a fusion feature f_cp。

Wherein the loss calculating unit includes:

L_point＝(e^x-1)^-1；

Wherein a represents text data, and c represents condition information; x is a mutual information calculation result between the text data and the condition information; p (a) represents the accumulation of the probability of occurrence of each word in the text data; p (c) represents the accumulation of the conditional information and the occurrence probability of each word in the text data; p (a, c) represents the probability accumulation of each word in the text data with the condition information;

an unconditional content loss subunit for setting the condition information as an empty set and calculating an unconditional content loss L by the following formula_null：

wherein, when the mutual information loss participates in the training,

otherwise

When the unconditional content loss is engaged in training,

otherwise

The present embodiment compares the effect of the conditional text generated by the present system with the effect of the text generated by several existing text generation models, and the comparison result is shown in the following table:

model (model)	Topic relevance (%)
		GPT-2	25.8
PPLM	50.7
		CTRL	87.6
This example	91.4

It can be seen that, compared with the existing models such as GPT-2, PPLM, CTRL, etc., the model trained by the present embodiment can significantly improve the correlation between the generated text and the condition.

Example 3:

a conditional text generation apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the conditional text generation method as recited in embodiment 1 when executing the computer program.

The processor may be a Central Processing Unit (CPU), or may be other general-purpose processors, such as a digital signal processor (digital signal processor), an Application Specific Integrated Circuit (Application Specific Integrated Circuit), an off-the-shelf programmable gate array (field programmable gate array) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory can be used for storing the computer programs and/or modules, and the processor can realize various functions of the information inquiry device in the invention by operating or executing the data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a smart memory card, a secure digital card, a flash memory card, at least one magnetic disk storage device, a flash memory device, or other volatile solid state storage device.

Example 4:

a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the conditional text generation method as recited in embodiment 1.

Further, the conditional text generating apparatus as in embodiment 3, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the method of implementing the embodiments of the present invention may also be stored in a computer readable storage medium through a computer program, and when the computer program is executed by a processor, the computer program may implement the steps of the above-described method embodiments. Wherein the computer program comprises computer program code, an object code form, an executable file or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying said computer program code, a recording medium, a usb-disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, a point carrier signal, a telecommunications signal, a software distribution medium, etc. It should be noted that the computer readable medium may contain content that is appropriately increased or decreased as required by legislation and patent practice in the jurisdiction.

In addition, a computer storage medium may include a propagated data signal with computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Computer program code required for the operation of various portions of this specification may be written in any one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C + +, C #, VB.NET, Python, and the like, a conventional programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, a dynamic programming language such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any network, such as a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service.

Claims

1. A conditional text generation method, comprising:

s2, constructing an encoder and a decoder;

s5, calculating loss;

s6, training the network model based on the output result and loss of the decoder until the training condition is satisfied or the maximum training times is reached, and outputting the trained network model;

s7, inputting condition information and a prompt text into the trained network model to generate a text;

in step S2, training a GPT-2 model by using the collected text data, marking the trained model as a model M, and taking the model M as an encoder and a decoder;

step S3 includes:

S303, mixing f_cAnd f_pPerforming feature fusion to obtain a fusion feature f_cp；

Step S303 includes:

s3032, key vector key of each condition characteristic_c1Key vector key_c2… Key vector key_cnSum value vector value_c1Value vector value_c2… value vector value_cnRespectively connected to text features f_pBetween the key vector key and the value vector valueForward and hold text feature f_pThe query vector query is unchanged to obtain a connected key vector key_lastSum value vector value_last：

key_last＝[key_c1；key_c2…key_cn；key]；

value_last＝[value_c1；value_c2…value_cn；value]；

S3033, calculating the output value score by the following formula:

s3034, inputting the score into a feedforward neural network to obtain a fusion characteristic f_cp；

In step S5, the method of calculating the loss includes:

L_point＝(e^x-1)^-1；

Wherein c represents condition information; x is a radical of a fluorine atom_iRepresenting a currently generated word; { x₁,x₂…x_i-1Denotes generating x_iRequired prompt text;

s503, calculating the final Loss through the following formula:

wherein, when the mutual information loss participates in the training,

otherwise

When the unconditional content loss is engaged in training,

otherwise

2. The conditional text generation method according to claim 1, wherein in step S1, text data published on the web is collected by a crawler and stored in json format; the collected text data includes content and a subject.

3. The method for generating conditional text according to claim 1, wherein in step S1, the data preprocessing comprises:

s101, noise elimination: removing useless symbols, redundant blanks, numbers, names of people and places from the crawled data by utilizing a regular mode;

s102, normalization: replacing special symbols and replacing rare vocabularies;

4. A conditional text generation system, comprising:

wherein the model training module comprises:

a pre-training language model unit for constructing an encoder and a decoder;

the characteristic fusion unit is used for carrying out characteristic fusion on the condition characteristic and the text characteristic to obtain a fusion characteristic;

the decoder is used for taking the fusion characteristics as input to obtain an output result so as to guide text generation;

the feature fusion unit includes:

a connection subunit for connecting the key vectors key of each condition feature_c1Key vector key_c2… Key vector key_cnSum value vector value_c1Value vector value_c2… value vector value_cnRespectively connected to text features f_pBefore the key vector key and value vector value of, and holds the text feature f_pThe query vector query is unchanged to obtain a connected key vector key_lastSum value vector value_last：

key_last＝[key_c1；key_c2…key_cn；key]；

value_last＝[value_c1；value_c2…value_cn；value]；

A first calculating subunit for calculating the output value score by the following formula:

a second calculating subunit, configured to input the score into the feedforward neural network to obtain a fusion feature f_cp；

The loss calculation unit includes:

L_point＝(e^x-1)^-1；

Wherein a represents text data, and c represents condition information; x is a mutual information calculation result between the text data and the condition information; p (a) represents the accumulation of the probability of occurrence of each word in the text data; p (c) represents the accumulation of the conditional information and the occurrence probability of each word in the text data; p (a, c) represents a probability accumulation of each word in the text data with the condition information;

Wherein c represents condition information; x is the number of_iRepresenting a currently generated word; { x₁,x₂…x_i-1Denotes generating x_iRequired prompt text;

wherein, when the mutual information loss participates in the training,

otherwise

When the unconditional content loss is engaged in training,

otherwise