CN114118024B - Conditional text generation method and generation system - Google Patents
- Publication number
- CN114118024B (application CN202111474679.4A)
- Authority
- CN
- China
- Prior art keywords
- text
- key
- value
- condition information
- loss
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a conditional text generation method comprising: collecting text data; preprocessing the data; constructing an encoder and a decoder; encoding the condition information and the text data separately with the encoder to obtain condition features and text features; fusing the condition features with the text features and recording the result as fusion features; feeding the fusion features to the decoder to obtain the decoder output; calculating the loss; training the network model on the decoder output and the loss until the training condition is met or the maximum number of training iterations is reached, and outputting the trained network model; and inputting condition information and a prompt text into the trained network model to generate text. The invention addresses the technical problem that existing conditional text generation techniques generate results while the model is still being trained, which causes low efficiency and coarse granularity of control, and achieves the aim of generating conditional text more efficiently and fluently.
Description
Technical Field
The invention relates to the field of natural language processing, in particular to a conditional text generation method and a conditional text generation system.
Background
Natural language processing has been a popular technical field in recent years. Natural language generation tasks generally adopt neural network language models (NNLM), of which the GPT-2 model, based on auto-regressive language modeling (ARLM), is a common example. Because such a model generates text by sampling from a probability distribution, its output is highly random, the content cannot be controlled, and specific requirements are not met.
One prior-art technique for solving the above problems is to add an attribute discriminator that conditions text generation on condition information such as keywords, emotion, or style: a generator and a discriminator are first trained, where the generator produces text with probability p(g) and the discriminator judges the attribute class p(c|g), from which p(g|c) is obtained. The gradient is back-propagated to update the internal state of the language model so that the actual prediction moves closer to the desired attribute; finally a new output distribution is obtained and a new word is sampled from it. Although this prior art overcomes to some extent the problem of overly random generated text, it generates results while training the model, which makes it very inefficient, and generation cannot be controlled at a finer granularity.
Disclosure of Invention
The invention provides a conditional text generation method and a conditional text generation system that solve the technical problems of low efficiency and coarse granularity caused by existing conditional text generation techniques generating results while the model is being trained, and achieve the aim of generating conditional text more efficiently and fluently.
The invention is realized by the following technical scheme:
a conditional text generation method, comprising:
s1, collecting text data; data preprocessing, namely converting text data into a data set suitable for training;
s2, constructing an encoder and a decoder;
s3, coding the condition information and the text data through a coder respectively to obtain condition characteristics and text characteristics; performing feature fusion on the condition features and the text features to obtain fused features, and recording the fused features as fusion features;
s4, taking the fusion characteristics as the input of a decoder to obtain the output result of the decoder;
s5, calculating loss;
s6, training the network model based on the output result and loss of the decoder until the training condition is met or the maximum training times is reached, and outputting the trained network model;
and S7, inputting condition information and a prompt text into the trained network model to generate a text.
The invention provides a conditional text generation method aimed at the low efficiency and coarse granularity caused by existing techniques that generate results during model training. An encoder and a decoder are constructed; the condition information is encoded by the encoder and the encoder output is taken as the condition features. In the same way, the preprocessed text data is encoded by the encoder and the output is taken as the text features. The condition features and the text features are then fused, and the fused features serve as the decoder input to guide text generation. A loss function is then computed, the network model is trained with the decoder output and the loss function, and on the basis of the trained network model the required condition information and a prompt text are input to complete text generation.
It is particularly noted that, although feature fusion is an existing technique in machine learning, the way condition features and text features are fused in the field of conditional text generation is one of the core inventions of this application, for which the applicant has expended considerable creative work, and this fusion method has a significant effect compared with the prior art. Conventional feature fusion by vector addition or vector concatenation cannot locally account for the fusion of the condition features with each word in the text, and, owing to the discreteness of text features, the fused features may fail to express the meaning of the original features. In the present method, the condition features and text features are fused through the query vector, so that the condition features are locally matched and fused with the feature of each word in the text; the condition information features are thereby merged into each word feature of the source text, conditional text is generated effectively, generation is controlled at a finer granularity, and the generation effect is markedly improved. In addition, the method first trains the model and only then uses the trained model, which eliminates the inefficiency of the prior-art attribute-discriminator approach, in which training and generation proceed simultaneously.
Further, in step S1, text data published on the web is collected by a crawler and stored in json format; the collected text data includes both content and subject. Storing in json format facilitates subsequent training; collecting both the content and the subject of each text, compared with collecting text content alone, yields a more reasonably trained model and improves the fine granularity of subsequently generated text.
Further, in step S1, the data preprocessing includes:
s101, noise removal: removing useless symbols, redundant blanks, numbers, names of people and places from the crawled data by utilizing a regular mode;
s102, normalization: replacing special symbols and replacing rare vocabularies;
s103, extracting keywords: extracting keywords from the collected text data, and screening out texts containing specified keywords as a data set;
s104, word segmentation: and performing word segmentation on the data set by using a word segmentation tool, and converting the words after word segmentation into corresponding position numbers in a dictionary.
The data preprocessing process of the scheme at least comprises four steps of noise elimination, normalization, keyword extraction and word segmentation which are sequentially carried out, wherein the sequence relation of the four steps cannot be reversed or staggered, and the matching degree of a data set and model training can be fully ensured.
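The preprocessing pipeline of steps S101-S104 can be sketched in Python as follows. This is an illustrative sketch only: the regular expressions, the replacement table, and the whitespace tokenizer are simplified stand-ins (a real implementation would use a segmenter such as jieba, and name/place removal would need an NER tool), but the four stages and their order match the scheme.

```python
import re

def remove_noise(text):
    # S101: strip useless symbols and collapse redundant whitespace
    # (an illustrative regex; names and places need an NER tool in practice)
    text = re.sub(r"[#*@^~]+", "", text)
    return re.sub(r"\s+", " ", text).strip()

def normalize(text):
    # S102: replace special symbols and rare vocabulary via a lookup table
    table = {"&": "and"}   # illustrative replacement table
    for src, dst in table.items():
        text = text.replace(src, dst)
    return text

def filter_by_keyword(corpus, keyword):
    # S103: keep only texts containing the specified keyword
    return [t for t in corpus if keyword in t]

def tokenize_and_index(text, vocab):
    # S104: segment, then map each token to its position number in the dictionary
    tokens = text.split()   # stand-in for a real word segmentation tool
    return [vocab.setdefault(tok, len(vocab)) for tok in tokens]
```

Running the stages in the stated order matters: keyword filtering (S103) operates on cleaned text, and indexing (S104) assumes segmentation has already happened.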
Further, in step S2, the GPT-2 model is trained using the collected text data, and the trained model, denoted model M, serves as both encoder and decoder. This scheme takes the existing GPT-2 model as the pre-trained language model: GPT-2 is a very large Transformer-based model trained on a massive corpus. Using the trained GPT-2 model as the encoder and decoder of the network model to be trained markedly improves the text generation effect compared with conventional encoding and decoding.
Further, step S3 includes:
S301, inputting the condition information into model M to obtain the condition feature f_c;
S302, inputting the text data containing the condition information into model M to obtain the text feature f_p;
S303, fusing f_c and f_p to obtain the fusion feature f_cp.
Here f_c is the output obtained after the condition information is input into model M, and f_p is the output obtained after the text data is input into model M.
Further, step S303 includes:
S3031, determining the number n of condition information items, and recording their condition features in turn as f_c1, f_c2, ..., f_cn, where n is a positive integer;
S3032, connecting the key vectors key_c1, key_c2, ..., key_cn and the value vectors value_c1, value_c2, ..., value_cn of each condition feature in front of the key vector key and value vector value of the text feature f_p, while keeping the query vector query of f_p unchanged, to obtain the connected key vector key_last and value vector value_last:
key_last = [key_c1; key_c2; ...; key_cn; key];
value_last = [value_c1; value_c2; ...; value_cn; value];
S3033, calculating the output value score by the formula
score = softmax(q·k^T / √d_k)·v,
where q is the query vector query, k^T is the transpose of key_last, v is value_last, and d_k is the dimension of key_last;
S3034, inputting score into a feed-forward neural network to obtain the fusion feature f_cp.
This scheme further specifies the feature fusion process. To handle cases with several condition information items, the number n of items is first determined, and the condition feature of each item is recorded as f_c1, f_c2, ..., f_cn. The key and value vectors of each condition feature are then connected in front of the key and value vectors of the text feature f_p, while the query vector query remains unchanged and is still taken from f_p. For example, with a single condition information item, if the key of f_c is [1, 2, 3] and the key of f_p is [4, 5, 6], then the connected key_last is [1, 2, 3, 4, 5, 6]; value_last is connected in the same way.
After this connection, the connected key vector key_last and value vector value_last are substituted into the score formula, where v equals value_last, k^T is the transpose of key_last, and q is the query vector query, still taken from the text feature f_p. Finally, the output of the formula serves as the input of a feed-forward neural network, yielding the fusion feature f_cp required in this application. This ensures that the condition information features are locally fused with the feature of each word in the text and merged into each word feature of the source text, so that conditional text is generated effectively and text generation is controlled at a finer granularity.
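The connection-then-attention step (S3032-S3033) can be sketched in NumPy as follows, assuming single-head attention and omitting the feed-forward network of S3034; shapes and parameter names are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax along the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse(query, key, value, cond_keys, cond_values):
    """Prepend each condition's key/value to the text key/value (S3032),
    then attend with the unchanged text query (S3033)."""
    key_last = np.concatenate(cond_keys + [key], axis=0)      # [key_c1; ...; key]
    value_last = np.concatenate(cond_values + [value], axis=0)
    d_k = key_last.shape[-1]
    attn = softmax(query @ key_last.T / np.sqrt(d_k))
    return attn @ value_last   # the 'score' fed to the feed-forward network
```

Because the query still comes from the text feature alone, each text word attends over both the condition features and the other text words, which is what lets the condition information be fused into every word feature locally.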
Further, in step S5, the method for calculating the loss includes:
S501, calculating the mutual information loss L_point between the condition information and the text data:
x = log( p(a, c) / ( p(a) · p(c) ) );
L_point = (e^x − 1)^(−1);
where a denotes the text data and c the condition information; x is the mutual information between the text data and the condition information; p(a) is the accumulated occurrence probability of each word in the text data; p(c) is the accumulated occurrence probability of the condition information; p(a, c) is the accumulated joint probability of each word in the text data with the condition information; and e is the natural constant.
S502, setting the condition information to the empty set and calculating the unconditional content loss L_null:
L_null = −Σ_{i=t}^{K} log p(x_i | x_1, x_2, ..., x_{i−1}, c = ∅);
where c denotes the condition information; x_i is the currently generated word; {x_1, x_2, ..., x_{i−1}} is the prompt text required to generate x_i; p(x_i | x_1, ..., x_{i−1}, c) is the posterior probability of x_i; K is the maximum length of the generated text; and t is the length of the condition information.
As one of the important inventive points of this application, the scheme also improves the way the loss function of the network model is calculated during conditional text generation, determining the loss by combining the mutual information loss with the unconditional content loss. The mutual information loss computes the relationship between the condition information and the text data from the concept of mutual information; its purpose is to make the generated text contain the condition information as far as possible, so that after the hidden variable of the original text is fused with that of the condition information, the generated text comes closer to the set condition. The unconditional content loss is the loss when no condition information is present; its purpose is to minimise the influence of the feature fusion process on the original hidden variable, so that the fused hidden variable still yields fluent text. In the final Loss formula, argmin is the mathematical symbol for the value of the variable at which the bracketed objective attains its minimum; one indicator term decides whether the mutual information loss participates in training, and another decides whether the unconditional loss participates.
The scheme innovatively adopts a method of combining mutual information loss and unconditional content loss, so that the fluency of the generated text can be improved, the association degree of the generated text and conditions can be ensured, and the fluency of the text and the text content can be better controlled.
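The two losses and their combination can be sketched as follows. Note the assumptions made explicit in the comments: the patent names x only as "the mutual information calculation result", so pointwise mutual information log(p(a,c)/(p(a)·p(c))) is assumed here, and the unconditional loss is sketched as a standard negative log-likelihood.

```python
import math

def mutual_info_loss(p_a, p_c, p_ac):
    # L_point = (e^x - 1)^(-1), with x assumed to be the pointwise mutual
    # information log(p(a,c) / (p(a) * p(c))) between text and condition
    x = math.log(p_ac / (p_a * p_c))
    return 1.0 / (math.exp(x) - 1.0)

def unconditional_loss(token_log_probs):
    # L_null sketched as the mean negative log-likelihood of the generated
    # words with the condition set to the empty set (assumed form)
    return -sum(token_log_probs) / len(token_log_probs)

def combined_loss(l_point, l_null, has_condition):
    # indicator-weighted combination: the indicator decides whether the
    # mutual information loss participates in training
    return (1.0 if has_condition else 0.0) * l_point + l_null
```

When text and condition are independent (p(a,c) = p(a)·p(c)), x = 0 and L_point diverges, so minimising L_point pushes the generated text toward high association with the condition, as the scheme intends.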
A conditional text generation system, comprising:
the data acquisition module is used for acquiring text data through a crawler and storing the text data in a json format;
the preprocessing module is used for preprocessing the text data and converting the text data into a data set suitable for training;
the model training module is used for constructing a conditional text generation model and outputting a trained network model;
the text generation module, which, given the input prompt text and condition information, predicts the probability of the next word with the trained network model, normalises it with softmax, and outputs the next word by combining top-k and top-p sampling, until text generation is finished;
wherein the model training module comprises:
a pre-training language model unit for constructing an encoder and a decoder;
the encoder is used for respectively inputting the condition information and the text data to obtain condition characteristics and text characteristics;
the feature fusion unit is used for performing feature fusion on the condition features and the text features to obtain fusion features;
the loss calculation unit is used for calculating the loss in the model training process;
and the decoder is used for taking the fusion features as input to obtain an output result so as to guide text generation.
Further, the feature fusion unit includes:
a front subunit for determining the number n of condition information items and recording their condition features in turn as f_c1, f_c2, ..., f_cn, where n is a positive integer;
a connection subunit for connecting the key vectors key_c1, key_c2, ..., key_cn and the value vectors value_c1, value_c2, ..., value_cn of each condition feature in front of the key vector key and value vector value of the text feature f_p, while keeping the query vector query of f_p unchanged, to obtain the connected key vector key_last and value vector value_last:
key_last = [key_c1; key_c2; ...; key_cn; key];
value_last = [value_c1; value_c2; ...; value_cn; value];
a first calculating subunit for computing the output value score by the formula score = softmax(q·k^T / √d_k)·v, where q is the query vector query, k^T is the transpose of key_last, v is value_last, and d_k is the dimension of key_last;
a second calculating subunit for inputting score into the feed-forward neural network to obtain the fusion feature f_cp.
Further, the loss calculation unit includes:
a mutual information loss subunit for calculating the mutual information loss L_point between the condition information and the text data by the formulas
x = log( p(a, c) / ( p(a) · p(c) ) );
L_point = (e^x − 1)^(−1);
where a denotes the text data and c the condition information; x is the mutual information between the text data and the condition information; p(a) is the accumulated occurrence probability of each word in the text data; p(c) is the accumulated occurrence probability of the condition information; p(a, c) is the accumulated joint probability of each word in the text data with the condition information;
an unconditional content loss subunit for setting the condition information to the empty set and calculating the unconditional content loss L_null:
L_null = −Σ_{i=t}^{K} log p(x_i | x_1, x_2, ..., x_{i−1}, c = ∅);
where c denotes the condition information; x_i is the currently generated word; {x_1, x_2, ..., x_{i−1}} is the prompt text required to generate x_i; p(x_i | x_1, ..., x_{i−1}, c) is the posterior probability of x_i; K is the maximum length of the generated text;
a combining subunit for combining the mutual information loss and the unconditional content loss to obtain the final Loss.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The disclosed conditional text generation method and system combine the condition information features with the source text features and fuse them by a feature fusion technique, overcoming the drawbacks of conventional fusion by vector addition or vector concatenation, which cannot locally account for the fusion of the condition features with each word in the text and, because of the discreteness of text features, tends to yield fused features that fail to express the meaning of the original features.
2. The conditional text generation method and the conditional text generation system can locally grasp the fusion of the conditional information features and each word feature in the text, can fuse the conditional information features into each word feature of the source text, further effectively generate the conditional text, control the text generation on finer granularity, and remarkably improve the text generation effect.
3. According to the conditional text generation method and the conditional text generation system, the model is trained firstly, and then used after the model training is finished, so that the problem of low efficiency caused by synchronous training and generation when the text is generated through the attribute discriminator in the prior art is solved.
4. According to the conditional text generation method and the conditional text generation system, the trained GPT-2 model is used as an encoder and a decoder of the network model to be trained, and compared with a conventional encoding and decoding mode, the effect of text generation can be obviously improved.
5. According to the conditional text generation method and the conditional text generation system, a method of combining mutual information loss and unconditional content loss is adopted, so that the fluency of the generated text can be improved, the association degree of the generated text and the conditions can be ensured, and the better control effect on the fluency of the text and the content of the text is realized.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a schematic flow chart of an embodiment of the present invention;
FIG. 2 is a system diagram of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention. In the description of the present application, it is to be understood that the terms "front", "back", "left", "right", "upper", "lower", "vertical", "horizontal", "high", "low", "inner", "outer", etc. indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience in describing the present invention and simplifying the description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus should not be construed as limiting the scope of the present application.
Example 1:
a conditional text generating method as shown in fig. 1 mainly includes the following steps:
data acquisition: pupil composition 162432 was crawled by a crawler mechanism and stored in json format.
(II) data preprocessing: the data is converted into a data set suitable for training. The method mainly comprises the steps of noise elimination, normalization, keyword extraction and word segmentation. The detailed steps are as follows:
1) Useless symbols, redundant blanks, numbers, and names of people and places are removed from the crawled data with regular expressions, and special symbols and rare vocabulary are replaced.
2) Keywords are extracted from all articles with jieba, and the 23,120 compositions whose keywords contain "mom" are screened out as the data set.
3) A word segmentation tool segments the data set, and the segmented words are finally converted into their position numbers in the dictionary.
(III) model training: the method is used for constructing the controllable text generation model and mainly comprises the following steps:
1) The 162,432 crawled compositions are used to train the GPT-2 model (the trained model is denoted M), which serves as encoder and decoder.
2) The keyword "mom" is taken as the condition information and input into M to obtain the condition feature f_c of the keyword "mom"; a composition containing the keyword "mom" is then input into M to obtain the text feature f_p of that composition.
3) The features f_c and f_p are fused with the encoder:
(1) f_c and f_p are passed through the self-attention mechanism: key_c and value_c of f_c are connected in front of the key and value of f_p, while the query remains unchanged; if there are several condition information items, their keys and values are each connected in front of the key and value of f_p. The connection result is: key_last = [key_c; key]; value_last = [value_c; value].
(2) The connected key_last and value_last are fed through the self-attention mechanism:
score = softmax(q·k^T / √d_k)·v,
where q is the query vector query, k^T is the transpose of key_last, v is value_last, and d_k is the dimension of key_last.
The score output by this formula is taken as the input of the feed-forward neural network, and the final output is the fused feature f_cp of f_c and f_p.
4) The feature f_cp serves as the input of M to guide text generation.
5) The loss is calculated by combining two losses, the mutual information loss and the unconditional content loss:
(1) Mutual information loss L_point. The relationship between the text and the condition information is computed with mutual information and then mapped to a monotone function greater than 0:
x = log( p(a, c) / ( p(a) · p(c) ) );
L_point = (e^x − 1)^(−1);
where a denotes the text data and c the condition information; x is the mutual information between the text data and the condition information; p(a) is the accumulated occurrence probability of each word in the text data; p(c) is the accumulated occurrence probability of the condition information; p(a, c) is the accumulated joint probability of each word in the text data with the condition information.
(2) Unconditional content loss L_null:
L_null = −Σ_{i=t}^{K} log p(x_i | x_1, x_2, ..., x_{i−1}, c = ∅);
where c denotes the condition information; x_i is the currently generated word; {x_1, x_2, ..., x_{i−1}} is the prompt text required to generate x_i; p(x_i | x_1, ..., x_{i−1}, c) is the posterior probability of x_i; K is the maximum length of the generated text.
(3) The two losses above are combined for training.
6) The network is trained until the training condition is met or the maximum number of training iterations is reached.
7) After training finishes, the trained model is output.
(IV) Text generation: a prompt text and condition information are input to generate text; the model saved during training predicts the probability of the next word, which is normalised by softmax, and the next word is output by combining top-k and top-p sampling.
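The top-k/top-p filtering used at this step can be sketched as follows; the function name and default parameter values are assumptions for illustration, as the patent does not specify them.

```python
import numpy as np

def top_k_top_p_filter(probs, k=50, p=0.9):
    """Keep only the top-k tokens, then the smallest prefix of them whose
    cumulative probability reaches p, and renormalise (nucleus sampling)."""
    order = np.argsort(probs)[::-1]       # token indices, most probable first
    keep = order[:k]                      # top-k cut
    cum = np.cumsum(probs[keep])
    cutoff = np.searchsorted(cum, p) + 1  # smallest prefix with mass >= p
    keep = keep[:cutoff]                  # top-p (nucleus) cut
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()      # renormalised sampling distribution
```

The next word is then sampled from the filtered distribution, which is what keeps generation fluent while excluding the unreliable low-probability tail.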
In this embodiment, the prompt word is "afternoon", the condition information is "mom" as above, and the generated text is limited to 100 words.
The samples were generated as follows:
sample 1:
in the afternoon, i are watching tv, suddenly, mom comes out from the kitchen, saying to me: "son, mother want to go out and you read at home. "I say: "is mom, did i see tv and go for a while? "mom says: "you go to the meeting Bar first! "after saying, go.
Sample 2:
in the afternoon, the mother cooks in the kitchen, writes in the study, suddenly, a burst of sound is transmitted from the kitchen, and the mother plays a look before cooking.
The mom holds the dish, and at a glance, the hands of the mom are full of oil, so that the mom says: "you are a small greedy cat, i have fried you a pan, you have tasted a bar soon! "I happy: "good. "
Sample 3:
in the afternoon, I go to the park with mother to take a walk.
Walking half way, i found a stall for selling flowers, the stall owner was a grandma, she wore a gray garment, leaned on a crutch, carried a big bag with her hands, filled with various flowers, and put a basin of fresh flowers on the stall for selling flowers.
From the three samples above it can be seen that this embodiment not only generates conditional text effectively, but the generated text is also highly fluent; generation can thus be controlled at a fine granularity, and the effect of conditional text generation is markedly improved.
Example 2:
a conditional text generation system, as shown in fig. 2, comprising:
the data acquisition module is used for acquiring text data through a crawler and storing the text data in a json format;
the preprocessing module is used for preprocessing the text data and converting the text data into a data set suitable for training;
the model training module is used for constructing a conditional text generation model and outputting a trained network model;
the text generation module, which, given the input prompt text and condition information, predicts the probability of the next word with the trained network model, normalises it with softmax, and outputs the next word by combining top-k and top-p sampling, until text generation is finished;
wherein the model training module comprises:
a pre-training language model unit for constructing an encoder and a decoder;
the encoder is used for respectively inputting the condition information and the text data to obtain condition characteristics and text characteristics;
the feature fusion unit is used for performing feature fusion on the condition features and the text features to obtain fusion features;
the loss calculation unit is used for calculating the loss in the model training process;
and the decoder is used for taking the fusion characteristics as input to obtain an output result so as to guide text generation.
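The decoding step performed by the text generation module above, namely softmax normalization of the model's next-word scores followed by top-k and top-p (nucleus) filtering, can be sketched as follows. This is an illustrative NumPy implementation, not the patented code, and the default values of k and p are assumptions, since the embodiment does not specify them:

```python
import numpy as np

def sample_next_word(logits, k=50, p=0.9, rng=None):
    """Softmax-normalize the logits, keep only the k most probable words,
    then keep the smallest prefix of those whose cumulative probability
    reaches p, and sample the next word from the renormalized result."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())  # stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]        # words sorted by probability, descending
    keep = order[:k]                       # top-k filter
    cum = np.cumsum(probs[keep])
    cutoff = np.searchsorted(cum, p) + 1   # smallest prefix with mass >= p
    keep = keep[:cutoff]                   # top-p (nucleus) filter
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    filtered /= filtered.sum()             # renormalize over the kept words
    return int(rng.choice(len(probs), p=filtered))
```

In practice this function would be called once per generated position, appending each sampled word to the prompt until an end marker or the maximum length is reached.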
Wherein the feature fusion unit includes:
a front subunit for determining the number n of the condition information and sequentially recording the condition features of each condition information as f_c1, f_c2, …, f_cn; wherein n is a positive integer;
a connection subunit for connecting the key vectors key_c1, key_c2, …, key_cn and the value vectors value_c1, value_c2, …, value_cn of each condition feature in front of the key vector key and the value vector value of the text feature f_p, while keeping the query vector query of the text feature f_p unchanged, to obtain the connected key vector key_last and value vector value_last:
key_last = [key_c1; key_c2; …; key_cn; key];
value_last = [value_c1; value_c2; …; value_cn; value];
a first calculating subunit for calculating the output value score by the following formula: score = softmax(q · k^T / √d_k) · v; wherein q represents the query vector query, k^T represents the transpose of key_last, v represents value_last, and d_k represents the dimension of key_last;
a second calculating subunit for inputting the score into the feedforward neural network to obtain the fusion feature f_cp.
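The fusion performed by these subunits, prepending the condition keys and values to the text keys and values and then applying scaled dot-product attention, can be sketched as follows. This is a minimal NumPy illustration with assumed array shapes, not the patented implementation itself; the trailing feedforward network is omitted:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def fuse(query, key, value, cond_keys, cond_values):
    """Prepend each condition feature's keys/values to the text keys/values
    (key_last = [key_c1; ...; key_cn; key], likewise for value_last),
    keep the text query unchanged, and apply scaled dot-product attention."""
    key_last = np.concatenate(cond_keys + [key], axis=0)
    value_last = np.concatenate(cond_values + [value], axis=0)
    d_k = key_last.shape[-1]
    score = softmax(query @ key_last.T / np.sqrt(d_k)) @ value_last
    return score  # fed to a feedforward network to obtain the fusion feature f_cp
```

Each array is (sequence length, feature dimension); with two condition features of lengths 3 and 2 and a text feature of length 7, the attended key/value length becomes 12 while the query length is unchanged.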
Wherein the loss calculating unit includes:
a mutual information loss subunit for calculating the mutual information loss L_point of the condition information and the text data by the following formula:
L_point = (e^x − 1)^(−1), where x = log( p(a, c) / (p(a) · p(c)) );
wherein a represents the text data and c represents the condition information; x is the mutual information calculated between the text data and the condition information; p(a) represents the accumulated occurrence probability of each word in the text data; p(c) represents the accumulated occurrence probability of the condition information together with each word in the text data; p(a, c) represents the accumulated joint probability of each word in the text data with the condition information;
an unconditional content loss subunit for setting the condition information to the empty set and calculating the unconditional content loss L_null by the following formula:
L_null = − Σ_{i=1}^{K} log p(x_i | x_1, x_2, …, x_{i−1}, c = ∅);
wherein c represents the condition information; x_i represents the currently generated word; {x_1, x_2, …, x_{i−1}} denotes the prompt text required to generate x_i; p(x_i | x_1, …, x_{i−1}, c) denotes the posterior probability of x_i; K represents the maximum length of the generated text;
a combining subunit for combining the mutual information loss L_point and the unconditional content loss L_null to obtain the final loss Loss.
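The two losses can be sketched as follows. The pointwise-mutual-information form of x and the negative log-likelihood form of L_null are inferred from the definitions above and should be read as assumptions, since the patent presents the exact formulas as figures; the combination formula is likewise not reproduced here:

```python
import math

def mutual_info_loss(p_a, p_c, p_joint):
    """L_point = (e^x - 1)^(-1), with x taken as the pointwise mutual
    information log(p(a, c) / (p(a) * p(c))) -- an assumed form.
    Note the value is undefined when a and c are independent (x = 0)."""
    x = math.log(p_joint / (p_a * p_c))
    return 1.0 / (math.exp(x) - 1.0)

def unconditional_loss(token_probs):
    """L_null: negative log-likelihood of the generated words with the
    condition set to the empty set (assumed form), where the maximum
    generated length K is len(token_probs)."""
    return -sum(math.log(p) for p in token_probs)
```

The stronger the association between condition and text (larger x), the smaller L_point becomes, which is consistent with the stated goal of raising the relevance between the generated text and the condition.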
This embodiment compares the conditional text generated by the present system with the text generated by several existing text generation models; the comparison results are shown in the following table:
Model | Topic relevance (%) |
GPT-2 | 25.8 |
PPLM | 50.7 |
CTRL | 87.6 |
This example | 91.4 |
It can be seen that, compared with existing models such as GPT-2, PPLM and CTRL, the model trained in this embodiment significantly improves the relevance between the generated text and the given condition.
Example 3:
a conditional text generation apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the conditional text generation method as recited in embodiment 1 when executing the computer program.
The processor may be a central processing unit (CPU) or another general-purpose processor, such as a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory can be used for storing the computer programs and/or modules, and the processor realizes the various functions of the apparatus by running or executing the programs and data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required for at least one function (such as a sound playing function or an image playing function). Further, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a smart memory card, a secure digital card, a flash memory card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
Example 4:
a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the conditional text generation method as recited in embodiment 1.
Further, the conditional text generation apparatus of embodiment 3, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the method of the above embodiments may also be implemented by a computer program stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the above method embodiments are realized. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying said computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the jurisdiction.
In addition, a computer storage medium may include a propagated data signal with computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Computer program code required for the operation of various portions of this specification may be written in any one or more programming languages, including an object-oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, or Python, a conventional procedural programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, or ABAP, a dynamic programming language such as Python, Ruby, or Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), in a cloud computing environment, or as a service.
Claims (4)
1. A conditional text generation method, comprising:
s1, collecting text data; data preprocessing, namely converting text data into a data set suitable for training;
s2, constructing an encoder and a decoder;
s3, coding the condition information and the text data through a coder respectively to obtain condition characteristics and text characteristics; performing feature fusion on the condition features and the text features to obtain fused features, and recording the fused features as fusion features;
s4, taking the fusion characteristics as the input of a decoder to obtain the output result of the decoder;
s5, calculating loss;
s6, training the network model based on the output result and loss of the decoder until the training condition is satisfied or the maximum training times is reached, and outputting the trained network model;
s7, inputting condition information and a prompt text into the trained network model to generate a text;
in step S2, training a GPT-2 model by using the collected text data, marking the trained model as a model M, and taking the model M as an encoder and a decoder;
step S3 includes:
S301, inputting the condition information into the model M to obtain the condition feature f_c;
S302, inputting the text data containing the condition information into the model M to obtain the text feature f_p;
S303, performing feature fusion on f_c and f_p to obtain the fusion feature f_cp;
Step S303 includes:
S3031, determining the number n of the condition information, and sequentially recording the condition features of the condition information as f_c1, f_c2, …, f_cn; wherein n is a positive integer;
S3032, connecting the key vectors key_c1, key_c2, …, key_cn and the value vectors value_c1, value_c2, …, value_cn of each condition feature in front of the key vector key and the value vector value of the text feature f_p, while keeping the query vector query of the text feature f_p unchanged, to obtain the connected key vector key_last and value vector value_last:
key_last = [key_c1; key_c2; …; key_cn; key];
value_last = [value_c1; value_c2; …; value_cn; value];
S3033, calculating the output value score by the following formula: score = softmax(q · k^T / √d_k) · v; wherein q represents the query vector query, k^T represents the transpose of key_last, v represents value_last, and d_k represents the dimension of key_last;
S3034, inputting the score into a feedforward neural network to obtain the fusion feature f_cp;
In step S5, the method of calculating the loss includes:
S501, calculating the mutual information loss L_point of the condition information and the text data:
L_point = (e^x − 1)^(−1), where x = log( p(a, c) / (p(a) · p(c)) );
wherein a represents the text data and c represents the condition information; x is the mutual information calculated between the text data and the condition information; p(a) represents the accumulated occurrence probability of each word in the text data; p(c) represents the accumulated occurrence probability of the condition information together with each word in the text data; p(a, c) represents the accumulated joint probability of each word in the text data with the condition information;
S502, setting the condition information to the empty set and calculating the unconditional content loss L_null:
L_null = − Σ_{i=1}^{K} log p(x_i | x_1, x_2, …, x_{i−1}, c = ∅);
wherein c represents the condition information; x_i represents the currently generated word; {x_1, x_2, …, x_{i−1}} denotes the prompt text required to generate x_i; p(x_i | x_1, …, x_{i−1}, c) denotes the posterior probability of x_i; K represents the maximum length of the generated text.
2. The conditional text generation method according to claim 1, wherein in step S1, text data published on the web is collected by a crawler and stored in json format; the collected text data includes content and a subject.
3. The method for generating conditional text according to claim 1, wherein in step S1, the data preprocessing comprises:
S101, noise elimination: removing useless symbols, redundant whitespace, numbers, and names of people and places from the crawled data using regular expressions;
s102, normalization: replacing special symbols and replacing rare vocabularies;
s103, extracting keywords: extracting keywords from the collected text data, and screening out texts containing specified keywords as a data set;
s104, word segmentation: and performing word segmentation on the data set by using a word segmentation tool, and converting the words after word segmentation into corresponding position numbers in a dictionary.
4. A conditional text generation system, comprising:
the data acquisition module is used for acquiring text data through a crawler and storing the text data in a json format;
the preprocessing module is used for preprocessing the text data and converting the text data into a data set suitable for training;
the model training module is used for constructing a conditional text generation model and outputting a trained network model;
the text generation module predicts the probability of the next word with the trained network model from the input prompt text and the condition information, normalizes with softmax, and outputs the next word by combining top-k and top-p sampling, until text generation is finished;
wherein the model training module comprises:
a pre-training language model unit for constructing an encoder and a decoder;
the encoder is used for respectively inputting the condition information and the text data to obtain condition characteristics and text characteristics;
the characteristic fusion unit is used for carrying out characteristic fusion on the condition characteristic and the text characteristic to obtain a fusion characteristic;
the loss calculation unit is used for calculating the loss in the model training process;
the decoder is used for taking the fusion characteristics as input to obtain an output result so as to guide text generation;
the feature fusion unit includes:
a front subunit for determining the number n of the condition information and sequentially recording the condition features of each condition information as f_c1, f_c2, …, f_cn; wherein n is a positive integer;
a connection subunit for connecting the key vectors key_c1, key_c2, …, key_cn and the value vectors value_c1, value_c2, …, value_cn of each condition feature in front of the key vector key and the value vector value of the text feature f_p, while keeping the query vector query of the text feature f_p unchanged, to obtain the connected key vector key_last and value vector value_last:
key_last = [key_c1; key_c2; …; key_cn; key];
value_last = [value_c1; value_c2; …; value_cn; value];
a first calculating subunit for calculating the output value score by the following formula: score = softmax(q · k^T / √d_k) · v; wherein q represents the query vector query, k^T represents the transpose of key_last, v represents value_last, and d_k represents the dimension of key_last;
a second calculating subunit for inputting the score into the feedforward neural network to obtain the fusion feature f_cp;
The loss calculation unit includes:
a mutual information loss subunit for calculating the mutual information loss L_point of the condition information and the text data by the following formula:
L_point = (e^x − 1)^(−1), where x = log( p(a, c) / (p(a) · p(c)) );
wherein a represents the text data and c represents the condition information; x is the mutual information calculated between the text data and the condition information; p(a) represents the accumulated occurrence probability of each word in the text data; p(c) represents the accumulated occurrence probability of the condition information together with each word in the text data; p(a, c) represents the accumulated joint probability of each word in the text data with the condition information;
an unconditional content loss subunit for setting the condition information to the empty set and calculating the unconditional content loss L_null by the following formula:
L_null = − Σ_{i=1}^{K} log p(x_i | x_1, x_2, …, x_{i−1}, c = ∅);
wherein c represents the condition information; x_i represents the currently generated word; {x_1, x_2, …, x_{i−1}} denotes the prompt text required to generate x_i; p(x_i | x_1, …, x_{i−1}, c) denotes the posterior probability of x_i; K represents the maximum length of the generated text;
a combining subunit for combining the mutual information loss L_point and the unconditional content loss L_null to obtain the final loss Loss.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111474679.4A CN114118024B (en) | 2021-12-06 | 2021-12-06 | Conditional text generation method and generation system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114118024A CN114118024A (en) | 2022-03-01 |
CN114118024B true CN114118024B (en) | 2022-06-21 |
Family
ID=80366638
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108334497A (en) * | 2018-02-06 | 2018-07-27 | 北京航空航天大学 | The method and apparatus for automatically generating text |
CN109344391A (en) * | 2018-08-23 | 2019-02-15 | 昆明理工大学 | Multiple features fusion Chinese newsletter archive abstraction generating method neural network based |
CN110458216A (en) * | 2019-07-31 | 2019-11-15 | 中山大学 | The image Style Transfer method of confrontation network is generated based on condition |
CN112231582A (en) * | 2020-11-10 | 2021-01-15 | 南京大学 | Website recommendation method and equipment based on variational self-coding data fusion |
CN112417092A (en) * | 2020-11-11 | 2021-02-26 | 南京邮电大学 | Intelligent text automatic generation system based on deep learning and implementation method thereof |
CN112417134A (en) * | 2020-10-30 | 2021-02-26 | 同济大学 | Automatic abstract generation system and method based on voice text deep fusion features |
CN113554549A (en) * | 2021-07-27 | 2021-10-26 | 深圳思谋信息科技有限公司 | Text image generation method and device, computer equipment and storage medium |
CN113609284A (en) * | 2021-08-02 | 2021-11-05 | 河南大学 | Method and device for automatically generating text abstract fused with multivariate semantics |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112765345A (en) * | 2021-01-22 | 2021-05-07 | 重庆邮电大学 | Text abstract automatic generation method and system fusing pre-training model |
Non-Patent Citations (2)
Title |
---|
Shuai Zhao et al. A Topical Keywords Fusion Based on Transformer For Text Summarization. 2020 13th International Conference on Intelligent Computation Technology and Automation (ICICTA). 2021. * |
Qian Shengjie. Research and Application of Conditional Text Generation Based on Deep Learning. China Master's Theses Full-text Database, Information Science and Technology. 2021. * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||