CN114118024B - Conditional text generation method and generation system - Google Patents

Conditional text generation method and generation system

Info

Publication number
CN114118024B
CN114118024B (application CN202111474679.4A)
Authority
CN
China
Prior art keywords
text
key
value
condition information
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111474679.4A
Other languages
Chinese (zh)
Other versions
CN114118024A
Inventor
岳希
罗伟尔
高燕
唐聃
何磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology filed Critical Chengdu University of Information Technology
Priority to CN202111474679.4A
Publication of CN114118024A
Application granted
Publication of CN114118024B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/151Transformation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a conditional text generation method, which comprises the steps of collecting text data; preprocessing the data; constructing an encoder and a decoder; encoding the condition information and the text data through the encoder respectively to obtain condition features and text features; performing feature fusion on the condition features and the text features and recording the result as the fusion features; taking the fusion features as the input of the decoder to obtain the output result of the decoder; calculating the loss; training the network model based on the output result and the loss of the decoder until the training condition is met or the maximum number of training iterations is reached, and outputting the trained network model; and inputting condition information and a prompt text into the trained network model to generate a text. The invention addresses the problem that existing conditional text generation techniques generate results while the model is still being trained, which leads to low efficiency and coarse control granularity, and thereby achieves more efficient and fluent conditional text generation.

Description

Conditional text generation method and generation system
Technical Field
The invention relates to the field of natural language processing, in particular to a conditional text generation method and a conditional text generation system.
Background
Natural language processing has been a popular technical field in recent years. Natural language generation tasks generally adopt neural network language models (NNLM), of which the GPT-2 model based on autoregressive language modeling (ARLM) is a common example. Because such a model generates text purely according to probability, the generated output is highly random, its content cannot be controlled, and specific requirements are not met.
One prior-art approach to this problem is to add an attribute discriminator that depends on condition information such as keywords, emotion or style when generating text. A generator and a discriminator are trained: the generator models p(g) and generates text, the discriminator judges the attribute class p(c|g), and p(g|c) is then obtained. The gradient is propagated back to update the internal state of the language model so that the actual prediction moves closer to the desired attribute, a new output distribution is finally obtained, and a new word is sampled from it. Although this prior art alleviates the excessive randomness of the generated text to some extent, it generates results while the model is being trained, which makes it very inefficient, and text generation cannot be controlled at a finer granularity.
Disclosure of Invention
The invention provides a conditional text generation method and a conditional text generation system, which solve the technical problem that the existing conditional text generation technology generates results while the model is still being trained, leading to low efficiency and coarse control granularity, and achieve the aim of generating conditional text more efficiently and fluently.
The invention is realized by the following technical scheme:
a conditional text generation method, comprising:
S1, collecting text data; performing data preprocessing, namely converting the text data into a data set suitable for training;
S2, constructing an encoder and a decoder;
S3, encoding the condition information and the text data through the encoder respectively to obtain condition features and text features; performing feature fusion on the condition features and the text features, and recording the fused result as the fusion features;
S4, taking the fusion features as the input of the decoder to obtain the output result of the decoder;
S5, calculating the loss;
S6, training the network model based on the output result and the loss of the decoder until the training condition is met or the maximum number of training iterations is reached, and outputting the trained network model;
S7, inputting condition information and a prompt text into the trained network model to generate a text.
The invention provides a conditional text generation method that addresses the low efficiency and coarse control granularity caused by the fact that existing conditional text generation techniques generate results while the model is still being trained. First, text data are collected and preprocessed into a data set suitable for training. An encoder and a decoder are then constructed; the condition information is encoded by the encoder and the encoder output is taken as the condition feature; in the same way, the preprocessed text data are encoded by the encoder and the output is taken as the text feature. The condition feature and the text feature are then fused, and the fused feature is used as the input of the decoder to guide text generation. A loss function is then calculated, the network model is trained with the decoder output and the loss function, and on the basis of the trained network model, text generation is completed simply by inputting the required condition information and a prompt text.
It is particularly noted that, although feature fusion is an existing technique in machine learning, the way conditional features and text features are fused in the field of conditional text generation is one of the core contributions of this application, for which the applicant has devoted considerable creative work, and this fusion approach has a significant effect compared with the prior art. The condition information features and the source text features are combined by means of feature fusion. Compared with conventional feature fusion by vector addition or vector concatenation, this overcomes the drawback that the fusion of the condition information features with each word in the text cannot be grasped locally, and that, because of the discreteness of text features, the fused features may fail to express the meaning of the original features. In this method, the condition features and the text features are fused on the basis of the query vector, so that the condition information features can be locally fused with every word feature in the text and merged into every word feature of the source text. Conditional text is thereby generated effectively, text generation is controlled at a finer granularity, and the generation effect is significantly improved. In addition, the method first trains the model and only then uses the trained model, which avoids the inefficiency of the prior-art attribute-discriminator approach, where training and generation proceed synchronously.
Further, in step S1, text data published on the web are collected by a crawler and stored in json format; the collected text data include both the content and the topic. Storing in json format facilitates subsequent training; collecting both the content and the topic of the text, compared with collecting the text content alone, yields a more reasonable training model and improves the fine granularity of the subsequently generated text.
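By way of illustration only, storing the crawled records in json might look like the minimal sketch below; the field names ("title", "topic", "content") and sample contents are assumptions for illustration and are not specified by the patent.

import json

# Illustrative record layout for the crawled corpus; the field names
# "title", "topic" and "content" are assumptions, not taken from the patent.
records = [
    {"title": "我的妈妈", "topic": "妈妈", "content": "今天下午，妈妈在厨房做饭……"},
    {"title": "公园散步", "topic": "妈妈", "content": "下午，我和妈妈去公园散步……"},
]

# ensure_ascii=False keeps the Chinese text readable in the stored file.
with open("corpus.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)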
Further, in step S1, the data preprocessing includes:
S101, noise removal: removing useless symbols, redundant whitespace, digits, person names and place names from the crawled data using regular expressions;
s102, normalization: replacing special symbols and replacing rare vocabularies;
s103, extracting keywords: extracting keywords from the collected text data, and screening out texts containing specified keywords as a data set;
S104, word segmentation: performing word segmentation on the data set with a word segmentation tool, and converting the segmented words into their corresponding position numbers in the dictionary.
The data preprocessing of this scheme comprises at least the four steps of noise removal, normalization, keyword extraction and word segmentation, carried out in that order; the order of the four steps cannot be reversed or interleaved, which fully ensures that the data set matches the model training.
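As an illustration of this pipeline, the following is a minimal Python sketch of the four preprocessing steps, assuming the jieba library for keyword extraction and word segmentation; the exact regular expressions, replacement table and dictionary used by the patent are not specified, so the ones below are placeholders.

import re
import jieba
import jieba.analyse

def remove_noise(text):
    # S101 noise removal: strip digits, useless symbols and redundant whitespace
    # (the patent also removes person and place names, which is omitted here).
    text = re.sub(r"[0-9]+", "", text)
    text = re.sub(r"[^\u4e00-\u9fa5，。！？、；：]", " ", text)
    return re.sub(r"\s+", "", text).strip()

def normalize(text):
    # S102 normalization: replace special symbols and rare vocabulary (illustrative table).
    for old, new in {"……": "。", "—": "，"}.items():
        text = text.replace(old, new)
    return text

def build_dataset(raw_texts, target_keyword, vocab):
    dataset = []
    for raw in raw_texts:
        text = normalize(remove_noise(raw))
        # S103 keyword extraction: keep only texts whose keywords contain the target.
        keywords = jieba.analyse.extract_tags(text, topK=10)
        if target_keyword not in keywords:
            continue
        # S104 word segmentation, then map each token to its position number in the dictionary.
        tokens = jieba.lcut(text)
        dataset.append([vocab.get(tok, vocab.get("<unk>", 0)) for tok in tokens])
    return dataset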
Further, in step S2, a GPT-2 model is trained with the collected text data; the trained model is denoted as model M and used as both the encoder and the decoder. This scheme takes the existing GPT-2 model as the pre-trained language model: GPT-2 is a very large transformer-based model trained on a massive data set. Using the trained GPT-2 model as the encoder and decoder of the network model to be trained significantly improves the text generation effect compared with conventional encoding and decoding approaches.
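For reference, loading a GPT-2 checkpoint to serve as model M could look like the sketch below, assuming the Hugging Face transformers library; the Chinese checkpoint name is purely illustrative, and in the patent M is obtained by fine-tuning GPT-2 on the collected compositions.

import torch
from transformers import AutoTokenizer, GPT2LMHeadModel

# Illustrative checkpoint name; the patent fine-tunes GPT-2 on the crawled corpus
# and denotes the resulting model as M.
MODEL_NAME = "uer/gpt2-chinese-cluecorpussmall"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model_m = GPT2LMHeadModel.from_pretrained(MODEL_NAME)
model_m.eval()

def encode(text):
    # Run model M once and return its last hidden states (used as the feature of
    # the input) together with the per-layer key/value caches.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model_m(**inputs, output_hidden_states=True, use_cache=True)
    return out.hidden_states[-1], out.past_key_values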
Further, step S3 includes:
S301, inputting the condition information into the model M to obtain the condition feature f_c;
S302, inputting the text data containing the condition information into the model M to obtain the text feature f_p;
S303, performing feature fusion on f_c and f_p to obtain the fusion feature f_cp.
Here f_c is the output obtained after the condition information is input into the model M, and f_p is the output obtained after the text data are input into the model M.
Further, step S303 includes:
S3031, determining the number n of pieces of condition information, and sequentially recording the condition features as f_c1, f_c2, ..., f_cn, where n is a positive integer;
S3032, concatenating the key vectors key_c1, key_c2, ..., key_cn and the value vectors value_c1, value_c2, ..., value_cn of the condition features in front of the key vector key and the value vector value of the text feature f_p, while keeping the query vector query of the text feature f_p unchanged, to obtain the concatenated key vector key_last and value vector value_last:
key_last = [key_c1; key_c2; ...; key_cn; key];
value_last = [value_c1; value_c2; ...; value_cn; value];
S3033, calculating the output value score by the following formula:
score = softmax(q · k^T / √d_k) · v
where q is the query vector query, k^T is the transpose of key_last, v is value_last, and d_k is the dimension of key_last;
S3034, inputting the score into a feed-forward neural network to obtain the fusion feature f_cp.
This scheme further specifies the feature fusion process. To cover the case of multiple pieces of condition information, the number n of condition items is determined first, and the condition feature corresponding to each item is recorded as f_c1, f_c2, ..., f_cn. The key and value vectors of each condition feature are then concatenated in front of the key and value vectors of the text feature f_p, while the query vector query is kept unchanged and still comes from f_p. For example, when there is only one piece of condition information, if the key of f_c is [1, 2, 3] and the key of f_p is [4, 5, 6], the concatenated key_last is [1, 2, 3, 4, 5, 6]; value_last is concatenated in the same way.
After this concatenation, the concatenated key vector key_last and value vector value_last are substituted into the score formula, where v equals value_last, k^T is the transpose of key_last, and q is the query vector query, still taken from the text feature f_p. Finally, the output of the formula is fed into a feed-forward neural network, yielding the fusion feature f_cp required by this application. This ensures that the condition information features are locally fused with every word feature in the text and merged into every word feature of the source text, so that conditional text is generated effectively and text generation is controlled at a finer granularity.
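A minimal sketch of this fusion step is given below for a single attention head, using PyTorch; the tensor shapes and the feed-forward network dimensions are assumptions for illustration, and in practice the key/value/query vectors would come from model M as described above.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse(query, text_key, text_value, cond_keys, cond_values, ffn):
    # query, text_key, text_value: (seq_len, d_k) tensors of the text feature f_p.
    # cond_keys, cond_values: lists of (cond_len_i, d_k) tensors, one per condition.
    # key_last = [key_c1; ...; key_cn; key], value_last likewise; the query is
    # kept unchanged and still comes from f_p.
    key_last = torch.cat(cond_keys + [text_key], dim=0)
    value_last = torch.cat(cond_values + [text_value], dim=0)

    d_k = key_last.size(-1)
    attn = F.softmax(query @ key_last.transpose(0, 1) / math.sqrt(d_k), dim=-1)
    score = attn @ value_last          # score = softmax(q·k^T/√d_k)·v
    return ffn(score)                  # fusion feature f_cp

# Example feed-forward network (dimensions are illustrative only).
ffn = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))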
Further, in step S5, the method for calculating the loss includes:
S501, calculating the mutual information loss L_point between the condition information and the text data:
L_point = (e^x − 1)^(−1)
x = log( p(a, c) / (p(a) · p(c)) )
where a represents the text data and c represents the condition information; x is the mutual information between the text data and the condition information; p(a) is the accumulated occurrence probability of each word in the text data; p(c) is the accumulated occurrence probability of each word in the condition information; p(a, c) is the accumulated joint probability of each word in the text data with the condition information; e is the natural constant;
S502, setting the condition information to the empty set and calculating the unconditional content loss L_null:
L_null = −(1/K) · Σ_i log P(x_i | x_1, x_2, ..., x_{i−1}, c = ∅)
where c represents the condition information; x_i is the currently generated word; {x_1, x_2, ..., x_{i−1}} is the prompt text required to generate x_i; P(x_i | x_1, x_2, ..., x_{i−1}) is the posterior probability of x_i; K is the maximum length of the generated text; T is the length of the condition information;
S503, calculating the final Loss by the following formula:
Loss = argmin( λ_point · L_point + λ_null · L_null )
where λ_point = 1 when the mutual information loss participates in the training and λ_point = 0 otherwise, and λ_null = 1 when the unconditional content loss participates in the training and λ_null = 0 otherwise.
As one of the key points of this application, the scheme also improves the way the loss function of the network model is calculated during conditional text generation, determining the loss by combining the mutual information loss with the unconditional content loss. The mutual information loss measures the relationship between the condition information and the text data based on the concept of mutual information; its purpose is to make the generated text contain the condition information as far as possible, so that after the hidden variable of the original text is fused with the hidden variable of the condition information, the generated text is closer to the set condition. The unconditional content loss is the loss function when no condition information is present; its purpose is to reduce, as far as possible, the influence of the feature fusion process on the original hidden variable, so that the fused hidden variable can still generate fluent text. In the final Loss formula, argmin denotes the value of the variables at which the objective function inside the brackets takes its minimum; λ_point decides whether the mutual information loss participates in the training, and λ_null decides whether the unconditional content loss participates.
The scheme innovatively adopts a method of combining mutual information loss and unconditional content loss, so that the fluency of the generated text can be improved, the association degree of the generated text and conditions can be ensured, and the fluency of the text and the text content can be better controlled.
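The sketch below illustrates one way such a combined loss could be computed; because the patent's formula images are not reproduced here, the pointwise-mutual-information form of x and the token-level log-likelihood used for L_null are reconstructions from the textual description rather than verbatim formulas.

import torch

def mutual_information_loss(log_p_joint, log_p_text, log_p_cond):
    # L_point = (e^x - 1)^(-1), with x the mutual information between the text a
    # and the condition c; the PMI-style form of x below is an assumption.
    x = log_p_joint - (log_p_text + log_p_cond)
    return 1.0 / (torch.exp(x) - 1.0)

def unconditional_content_loss(token_log_probs):
    # L_null: average negative log-likelihood of each generated word given only
    # the prompt text, with the condition information set to the empty set.
    return -token_log_probs.mean()

def total_loss(l_point, l_null, use_point=True, use_null=True):
    # The two switches decide whether each loss participates in the training.
    loss = torch.zeros(())
    if use_point:
        loss = loss + l_point
    if use_null:
        loss = loss + l_null
    return loss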
A conditional text generation system, comprising:
the data acquisition module is used for acquiring text data through a crawler and storing the text data in a json format;
the preprocessing module is used for preprocessing the text data and converting the text data into a data set suitable for training;
the model training module is used for constructing a conditional text generation model and outputting a trained network model;
the text generation module, which uses the trained network model together with the input prompt text and condition information to predict the probability of the next word, normalizes it with softmax, and outputs the next word by combining top-k and top-p sampling, until text generation is finished;
wherein the model training module comprises:
a pre-training language model unit for constructing an encoder and a decoder;
the encoder is used for respectively inputting the condition information and the text data to obtain condition characteristics and text characteristics;
the feature fusion unit is used for performing feature fusion on the condition features and the text features to obtain fusion features;
the loss calculation unit is used for calculating the loss in the model training process;
and the decoder is used for taking the fusion features as input to obtain an output result so as to guide text generation.
Further, the feature fusion unit includes:
a front subunit for determining the number n of pieces of condition information and sequentially recording the condition features as f_c1, f_c2, ..., f_cn, where n is a positive integer;
a connection subunit for concatenating the key vectors key_c1, key_c2, ..., key_cn and the value vectors value_c1, value_c2, ..., value_cn of the condition features in front of the key vector key and the value vector value of the text feature f_p, while keeping the query vector query of the text feature f_p unchanged, to obtain the concatenated key vector key_last and value vector value_last:
key_last = [key_c1; key_c2; ...; key_cn; key];
value_last = [value_c1; value_c2; ...; value_cn; value];
a first calculation subunit for calculating the output value score by the following formula:
score = softmax(q · k^T / √d_k) · v
where q is the query vector query, k^T is the transpose of key_last, v is value_last, and d_k is the dimension of key_last;
a second calculation subunit for inputting the score into the feed-forward neural network to obtain the fusion feature f_cp.
Further, the loss calculation unit includes:
a mutual information loss subunit for calculating the mutual information loss L_point between the condition information and the text data by the following formulas:
L_point = (e^x − 1)^(−1)
x = log( p(a, c) / (p(a) · p(c)) )
where a represents the text data and c represents the condition information; x is the mutual information between the text data and the condition information; p(a) is the accumulated occurrence probability of each word in the text data; p(c) is the accumulated occurrence probability of each word in the condition information; p(a, c) is the accumulated joint probability of each word in the text data with the condition information;
an unconditional content loss subunit for setting the condition information to the empty set and calculating the unconditional content loss L_null by the following formula:
L_null = −(1/K) · Σ_i log P(x_i | x_1, x_2, ..., x_{i−1}, c = ∅)
where c represents the condition information; x_i is the currently generated word; {x_1, x_2, ..., x_{i−1}} is the prompt text required to generate x_i; P(x_i | x_1, x_2, ..., x_{i−1}) is the posterior probability of x_i; K is the maximum length of the generated text;
a combination subunit for combining the mutual information loss and the unconditional content loss by the following formula to obtain the final Loss:
Loss = argmin( λ_point · L_point + λ_null · L_null )
where λ_point = 1 when the mutual information loss participates in the training and λ_point = 0 otherwise, and λ_null = 1 when the unconditional content loss participates in the training and λ_null = 0 otherwise.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The conditional text generation method and generation system of the invention combine the condition information features with the source text features and fuse them by means of feature fusion. Compared with conventional feature fusion by vector addition or vector concatenation, this overcomes the drawback that the fusion of the condition information features with each word in the text cannot be grasped locally, and that, because of the discreteness of text features, the fused features may fail to express the meaning of the original features.
2. The conditional text generation method and the conditional text generation system can locally grasp the fusion of the conditional information features and each word feature in the text, can fuse the conditional information features into each word feature of the source text, further effectively generate the conditional text, control the text generation on finer granularity, and remarkably improve the text generation effect.
3. According to the conditional text generation method and the conditional text generation system, the model is trained firstly, and then used after the model training is finished, so that the problem of low efficiency caused by synchronous training and generation when the text is generated through the attribute discriminator in the prior art is solved.
4. According to the conditional text generation method and the conditional text generation system, the trained GPT-2 model is used as an encoder and a decoder of the network model to be trained, and compared with a conventional encoding and decoding mode, the effect of text generation can be obviously improved.
5. According to the conditional text generation method and the conditional text generation system, a method of combining mutual information loss and unconditional content loss is adopted, so that the fluency of the generated text can be improved, the association degree of the generated text and the conditions can be ensured, and the better control effect on the fluency of the text and the content of the text is realized.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a schematic flow chart of an embodiment of the present invention;
FIG. 2 is a system diagram of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention. In the description of the present application, it is to be understood that the terms "front", "back", "left", "right", "upper", "lower", "vertical", "horizontal", "high", "low", "inner", "outer", etc. indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience in describing the present invention and simplifying the description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus should not be construed as limiting the scope of the present application.
Example 1:
A conditional text generation method, as shown in FIG. 1, mainly includes the following steps:
(I) Data acquisition: 162,432 primary school compositions were crawled by a crawler and stored in json format.
(II) data preprocessing: the data is converted into a data set suitable for training. The method mainly comprises the steps of noise elimination, normalization, keyword extraction and word segmentation. The detailed steps are as follows:
1) Useless symbols, redundant whitespace, digits, person names and place names are removed from the crawled data using regular expressions, and special symbols and rare vocabulary are replaced.
2) Keywords are extracted from all articles with jieba, and the 23,120 compositions whose keywords contain 'mom' are screened out as the data set.
3) The data set is segmented with a word segmentation tool, and the segmented words are finally converted into their corresponding position numbers in the dictionary.
(III) model training: the method is used for constructing the controllable text generation model and mainly comprises the following steps:
1) The 162,432 crawled compositions are used to train the GPT-2 model; the trained model is denoted M and serves as both the encoder and the decoder.
2) The keyword 'mom' is taken as the condition information and input into M to obtain its condition feature f_c; a composition containing the keyword 'mom' is then input into M to obtain the text feature f_p of the composition.
3) The features f_c and f_p are fused by the encoder:
(1) f_c and f_p are passed through the self-attention mechanism: key_c and value_c of f_c are concatenated in front of the key and value of f_p, while the query remains unchanged; if there are several pieces of condition information, their keys and values are each concatenated in front of the key and value of f_p. The concatenation results are: key_last = [key_c; key]; value_last = [value_c; value].
(2) The concatenated key_last and value_last are used in the self-attention calculation:
score = softmax(q · k^T / √d_k) · v
where q is the query vector query, k^T is the transpose of key_last, v is value_last, and d_k is the dimension of key_last.
The score output by the formula is taken as the input of the feed-forward neural network, and the final output is the fused feature f_cp of f_c and f_p.
4) The feature f_cp is taken as the input of M to guide text generation.
5) The loss is calculated by combining two losses, the mutual information loss and the unconditional content loss, as follows:
(1) Mutual information loss L_point. The relationship between the text and the condition information is calculated with mutual information and then mapped to a monotonic function greater than 0:
L_point = (e^x − 1)^(−1)
x = log( p(a, c) / (p(a) · p(c)) )
where a represents the text data and c represents the condition information; x is the mutual information between the text data and the condition information; p(a) is the accumulated occurrence probability of each word in the text data; p(c) is the accumulated occurrence probability of each word in the condition information; p(a, c) is the accumulated joint probability of each word in the text data with the condition information.
(2) Unconditional content loss L_null. With the condition information set to the empty set, the loss is
L_null = −(1/K) · Σ_i log P(x_i | x_1, x_2, ..., x_{i−1}, c = ∅)
where c represents the condition information; x_i is the currently generated word; {x_1, x_2, ..., x_{i−1}} is the prompt text required to generate x_i; P(x_i | x_1, x_2, ..., x_{i−1}) is the posterior probability of x_i; K is the maximum length of the generated text.
(3) The two losses are combined for training by the formula
Loss = argmin( λ_point · L_point + λ_null · L_null )
and in this example both the mutual information loss and the unconditional content loss participate in the training.
6) The network is trained until the training condition is met or the maximum number of training iterations is reached.
7) After training is finished, the trained model is output.
(IV) Text generation: a prompt text and condition information are input to generate text; the model saved during training predicts the probability of the next word, which is normalized by softmax, and the next word is output by combining top-k and top-p sampling.
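A minimal sketch of this decoding loop is shown below, again assuming a Hugging Face GPT-2 style model; the top_k and top_p values are illustrative, and conditioning through the fused features is omitted for brevity (here the condition is simply carried in the prompt).

import torch
import torch.nn.functional as F

def sample_next_token(logits, top_k=50, top_p=0.9):
    # Softmax-normalise the model's prediction, then restrict sampling
    # with top-k followed by top-p (nucleus) filtering.
    top_logits, top_idx = torch.topk(logits, top_k)
    probs = F.softmax(top_logits, dim=-1)
    sorted_probs, order = torch.sort(probs, descending=True)
    keep = torch.cumsum(sorted_probs, dim=-1) <= top_p
    keep[0] = True                        # always keep the most likely token
    filtered = sorted_probs * keep
    filtered = filtered / filtered.sum()
    choice = torch.multinomial(filtered, 1)
    return top_idx[order[choice]]

def generate(model, tokenizer, prompt, max_words=100):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_words):
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        next_id = sample_next_token(logits)
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
    return tokenizer.decode(ids[0])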
In this embodiment, the prompt text is 'afternoon', the condition information is 'mom' as described above, and generation is limited to 100 words.
The samples were generated as follows:
sample 1:
In the afternoon, I was watching TV when Mom suddenly came out of the kitchen and said to me: "Son, Mom has to go out; you stay at home and read." I said: "Mom, can I watch TV a little longer before I go?" Mom said: "You go ahead first!" After saying that, she left.
Sample 2:
In the afternoon, Mom was cooking in the kitchen while I was writing in the study. Suddenly a noise came from the kitchen, and I went over to watch Mom cook.
Mom was holding a dish, and at a glance I could see her hands were covered in oil. Mom said: "You little greedy cat, I've fried a dish for you, come and taste it!" I said happily: "Great."
Sample 3:
In the afternoon, I went for a walk in the park with Mom.
Halfway there, I came across a flower stall. The stall owner was an old grandma; she wore gray clothes, leaned on a crutch, and carried a big bag full of all kinds of flowers, and a pot of fresh flowers was placed on the stall.
It can be seen from the above three samples that this embodiment not only generates conditional text effectively, but also produces text with high fluency, so that text generation can be controlled at a fine granularity and the effect of conditional text generation is significantly improved.
Example 2:
a conditional text generation system, as shown in fig. 2, comprising:
the data acquisition module is used for acquiring text data through a crawler and storing the text data in a json format;
the preprocessing module is used for preprocessing the text data and converting the text data into a data set suitable for training;
the model training module is used for constructing a conditional text generation model and outputting a trained network model;
the text generation module predicts the probability of the next word according to the trained network model through the input prompt text and the condition information, then outputs the next word by combining top-k and top-p through softmax normalization until the text generation is finished;
wherein the model training module comprises:
a pre-training language model unit for constructing an encoder and a decoder;
the encoder is used for respectively inputting the condition information and the text data to obtain condition characteristics and text characteristics;
the feature fusion unit is used for performing feature fusion on the condition features and the text features to obtain fusion features;
the loss calculation unit is used for calculating the loss in the model training process;
and the decoder is used for taking the fusion characteristics as input to obtain an output result so as to guide text generation.
Wherein the feature fusion unit includes:
a front subunit for determining the number n of pieces of condition information and sequentially recording the condition features as f_c1, f_c2, ..., f_cn, where n is a positive integer;
a connection subunit for concatenating the key vectors key_c1, key_c2, ..., key_cn and the value vectors value_c1, value_c2, ..., value_cn of the condition features in front of the key vector key and the value vector value of the text feature f_p, while keeping the query vector query of the text feature f_p unchanged, to obtain the concatenated key vector key_last and value vector value_last:
key_last = [key_c1; key_c2; ...; key_cn; key];
value_last = [value_c1; value_c2; ...; value_cn; value];
a first calculation subunit for calculating the output value score by the following formula:
score = softmax(q · k^T / √d_k) · v
where q is the query vector query, k^T is the transpose of key_last, v is value_last, and d_k is the dimension of key_last;
a second calculation subunit for inputting the score into the feed-forward neural network to obtain the fusion feature f_cp.
Wherein the loss calculating unit includes:
a mutual information loss subunit for calculating the mutual information loss L_point between the condition information and the text data by the following formulas:
L_point = (e^x − 1)^(−1)
x = log( p(a, c) / (p(a) · p(c)) )
where a represents the text data and c represents the condition information; x is the mutual information between the text data and the condition information; p(a) is the accumulated occurrence probability of each word in the text data; p(c) is the accumulated occurrence probability of each word in the condition information; p(a, c) is the accumulated joint probability of each word in the text data with the condition information;
an unconditional content loss subunit for setting the condition information to the empty set and calculating the unconditional content loss L_null by the following formula:
L_null = −(1/K) · Σ_i log P(x_i | x_1, x_2, ..., x_{i−1}, c = ∅)
where c represents the condition information; x_i is the currently generated word; {x_1, x_2, ..., x_{i−1}} is the prompt text required to generate x_i; P(x_i | x_1, x_2, ..., x_{i−1}) is the posterior probability of x_i; K is the maximum length of the generated text;
a combination subunit for combining the mutual information loss and the unconditional content loss by the following formula to obtain the final Loss:
Loss = argmin( λ_point · L_point + λ_null · L_null )
where λ_point = 1 when the mutual information loss participates in the training and λ_point = 0 otherwise, and λ_null = 1 when the unconditional content loss participates in the training and λ_null = 0 otherwise.
The present embodiment compares the effect of the conditional text generated by the present system with the effect of the text generated by several existing text generation models, and the comparison result is shown in the following table:
Model            Topic relevance (%)
GPT-2            25.8
PPLM             50.7
CTRL             87.6
This embodiment  91.4
It can be seen that, compared with the existing models such as GPT-2, PPLM, CTRL, etc., the model trained by the present embodiment can significantly improve the correlation between the generated text and the condition.
Example 3:
a conditional text generation apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the conditional text generation method as recited in embodiment 1 when executing the computer program.
The processor may be a central processing unit (CPU), or another general-purpose processor such as a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or any other conventional processor.
The memory can be used for storing the computer programs and/or modules, and the processor can realize various functions of the information inquiry device in the invention by operating or executing the data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a smart memory card, a secure digital card, a flash memory card, at least one magnetic disk storage device, a flash memory device, or other volatile solid state storage device.
Example 4:
a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the conditional text generation method as recited in embodiment 1.
Further, if the conditional text generation apparatus of embodiment 3 is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method of the embodiments of the present invention may also be completed by a computer program stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the above method embodiments can be implemented. The computer program comprises computer program code, which may be in source code form, object code form, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electric carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the jurisdiction.
In addition, a computer storage medium may include a propagated data signal with computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Computer program code required for the operation of various portions of this specification may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET and Python, conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP and ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), in a cloud computing environment, or as a service.

Claims (4)

1. A conditional text generation method, comprising:
S1, collecting text data; performing data preprocessing, namely converting the text data into a data set suitable for training;
S2, constructing an encoder and a decoder;
S3, encoding the condition information and the text data through the encoder respectively to obtain condition features and text features; performing feature fusion on the condition features and the text features, and recording the fused result as the fusion features;
S4, taking the fusion features as the input of the decoder to obtain the output result of the decoder;
S5, calculating the loss;
S6, training the network model based on the output result and the loss of the decoder until the training condition is satisfied or the maximum number of training iterations is reached, and outputting the trained network model;
S7, inputting condition information and a prompt text into the trained network model to generate a text;
in step S2, training a GPT-2 model by using the collected text data, marking the trained model as a model M, and taking the model M as an encoder and a decoder;
step S3 includes:
S301, inputting the condition information into the model M to obtain the condition feature f_c;
S302, inputting the text data containing the condition information into the model M to obtain the text feature f_p;
S303, performing feature fusion on f_c and f_p to obtain the fusion feature f_cp;
step S303 includes:
S3031, determining the number n of pieces of condition information, and sequentially recording the condition features as f_c1, f_c2, ..., f_cn, where n is a positive integer;
S3032, concatenating the key vectors key_c1, key_c2, ..., key_cn and the value vectors value_c1, value_c2, ..., value_cn of the condition features in front of the key vector key and the value vector value of the text feature f_p, while keeping the query vector query of the text feature f_p unchanged, to obtain the concatenated key vector key_last and value vector value_last:
key_last = [key_c1; key_c2; ...; key_cn; key];
value_last = [value_c1; value_c2; ...; value_cn; value];
S3033, calculating the output value score by the following formula:
score = softmax(q · k^T / √d_k) · v
where q is the query vector query, k^T is the transpose of key_last, v is value_last, and d_k is the dimension of key_last;
S3034, inputting the score into a feed-forward neural network to obtain the fusion feature f_cp;
In step S5, the method of calculating the loss includes:
S501, calculating the mutual information loss L_point between the condition information and the text data:
L_point = (e^x − 1)^(−1)
x = log( p(a, c) / (p(a) · p(c)) )
where a represents the text data and c represents the condition information; x is the mutual information between the text data and the condition information; p(a) is the accumulated occurrence probability of each word in the text data; p(c) is the accumulated occurrence probability of each word in the condition information; p(a, c) is the accumulated joint probability of each word in the text data with the condition information;
S502, setting the condition information to the empty set and calculating the unconditional content loss L_null:
L_null = −(1/K) · Σ_i log P(x_i | x_1, x_2, ..., x_{i−1}, c = ∅)
where c represents the condition information; x_i is the currently generated word; {x_1, x_2, ..., x_{i−1}} is the prompt text required to generate x_i; P(x_i | x_1, x_2, ..., x_{i−1}) is the posterior probability of x_i; K is the maximum length of the generated text;
S503, calculating the final Loss by the following formula:
Loss = argmin( λ_point · L_point + λ_null · L_null )
where λ_point = 1 when the mutual information loss participates in the training and λ_point = 0 otherwise, and λ_null = 1 when the unconditional content loss participates in the training and λ_null = 0 otherwise.
2. The conditional text generation method according to claim 1, wherein in step S1, text data published on the web is collected by a crawler and stored in json format; the collected text data includes content and a subject.
3. The method for generating conditional text according to claim 1, wherein in step S1, the data preprocessing comprises:
S101, noise removal: removing useless symbols, redundant whitespace, digits, person names and place names from the crawled data using regular expressions;
s102, normalization: replacing special symbols and replacing rare vocabularies;
s103, extracting keywords: extracting keywords from the collected text data, and screening out texts containing specified keywords as a data set;
S104, word segmentation: performing word segmentation on the data set with a word segmentation tool, and converting the segmented words into their corresponding position numbers in the dictionary.
4. A conditional text generation system, comprising:
the data acquisition module is used for acquiring text data through a crawler and storing the text data in a json format;
the preprocessing module is used for preprocessing the text data and converting the text data into a data set suitable for training;
the model training module is used for constructing a conditional text generation model and outputting a trained network model;
the text generation module predicts the probability of the next word according to the trained network model through the input prompt text and the condition information, then outputs the next word by combining top-k and top-p through softmax normalization until the text generation is finished;
wherein the model training module comprises:
a pre-training language model unit for constructing an encoder and a decoder;
the encoder is used for respectively inputting the condition information and the text data to obtain condition characteristics and text characteristics;
the characteristic fusion unit is used for carrying out characteristic fusion on the condition characteristic and the text characteristic to obtain a fusion characteristic;
the loss calculation unit is used for calculating the loss in the model training process;
the decoder is used for taking the fusion characteristics as input to obtain an output result so as to guide text generation;
the feature fusion unit includes:
a front subunit for determining the number n of pieces of condition information and sequentially recording the condition features as f_c1, f_c2, ..., f_cn, where n is a positive integer;
a connection subunit for concatenating the key vectors key_c1, key_c2, ..., key_cn and the value vectors value_c1, value_c2, ..., value_cn of the condition features in front of the key vector key and the value vector value of the text feature f_p, while keeping the query vector query of the text feature f_p unchanged, to obtain the concatenated key vector key_last and value vector value_last:
key_last = [key_c1; key_c2; ...; key_cn; key];
value_last = [value_c1; value_c2; ...; value_cn; value];
a first calculation subunit for calculating the output value score by the following formula:
score = softmax(q · k^T / √d_k) · v
where q is the query vector query, k^T is the transpose of key_last, v is value_last, and d_k is the dimension of key_last;
a second calculation subunit for inputting the score into the feed-forward neural network to obtain the fusion feature f_cp;
The loss calculation unit includes:
a mutual information loss subunit for calculating the mutual information loss L_point between the condition information and the text data by the following formulas:
L_point = (e^x − 1)^(−1)
x = log( p(a, c) / (p(a) · p(c)) )
where a represents the text data and c represents the condition information; x is the mutual information between the text data and the condition information; p(a) is the accumulated occurrence probability of each word in the text data; p(c) is the accumulated occurrence probability of each word in the condition information; p(a, c) is the accumulated joint probability of each word in the text data with the condition information;
an unconditional content loss subunit for setting the condition information to the empty set and calculating the unconditional content loss L_null by the following formula:
L_null = −(1/K) · Σ_i log P(x_i | x_1, x_2, ..., x_{i−1}, c = ∅)
where c represents the condition information; x_i is the currently generated word; {x_1, x_2, ..., x_{i−1}} is the prompt text required to generate x_i; P(x_i | x_1, x_2, ..., x_{i−1}) is the posterior probability of x_i; K is the maximum length of the generated text;
a combination subunit for combining the mutual information loss and the unconditional content loss by the following formula to obtain the final Loss:
Loss = argmin( λ_point · L_point + λ_null · L_null )
where λ_point = 1 when the mutual information loss participates in the training and λ_point = 0 otherwise, and λ_null = 1 when the unconditional content loss participates in the training and λ_null = 0 otherwise.
CN202111474679.4A 2021-12-06 2021-12-06 Conditional text generation method and generation system Active CN114118024B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111474679.4A CN114118024B (en) 2021-12-06 2021-12-06 Conditional text generation method and generation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111474679.4A CN114118024B (en) 2021-12-06 2021-12-06 Conditional text generation method and generation system

Publications (2)

Publication Number Publication Date
CN114118024A CN114118024A (en) 2022-03-01
CN114118024B true CN114118024B (en) 2022-06-21

Family

ID=80366638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111474679.4A Active CN114118024B (en) 2021-12-06 2021-12-06 Conditional text generation method and generation system

Country Status (1)

Country Link
CN (1) CN114118024B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112765345A (en) * 2021-01-22 2021-05-07 重庆邮电大学 Text abstract automatic generation method and system fusing pre-training model

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334497A (en) * 2018-02-06 2018-07-27 北京航空航天大学 The method and apparatus for automatically generating text
CN109344391A (en) * 2018-08-23 2019-02-15 昆明理工大学 Multiple features fusion Chinese newsletter archive abstraction generating method neural network based
CN110458216A (en) * 2019-07-31 2019-11-15 中山大学 The image Style Transfer method of confrontation network is generated based on condition
CN112417134A (en) * 2020-10-30 2021-02-26 同济大学 Automatic abstract generation system and method based on voice text deep fusion features
CN112231582A (en) * 2020-11-10 2021-01-15 南京大学 Website recommendation method and equipment based on variational self-coding data fusion
CN112417092A (en) * 2020-11-11 2021-02-26 南京邮电大学 Intelligent text automatic generation system based on deep learning and implementation method thereof
CN113554549A (en) * 2021-07-27 2021-10-26 深圳思谋信息科技有限公司 Text image generation method and device, computer equipment and storage medium
CN113609284A (en) * 2021-08-02 2021-11-05 河南大学 Method and device for automatically generating text abstract fused with multivariate semantics

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Shuai Zhao et al., "A Topical Keywords Fusion Based on Transformer For Text Summarization", 2020 13th International Conference on Intelligent Computation Technology and Automation (ICICTA), 2021. *
钱胜杰, "Research and Application of Conditional Text Generation Based on Deep Learning" (基于深度学习的条件式文本生成的研究和应用), China Masters' Theses Full-text Database, Information Science & Technology, 2021. *

Also Published As

Publication number Publication date
CN114118024A (en) 2022-03-01

Similar Documents

Publication Publication Date Title
CN110427617B (en) Push information generation method and device
CN109582767B (en) Dialogue system processing method, device, equipment and readable storage medium
CN112487182B (en) Training method of text processing model, text processing method and device
WO2021104102A1 (en) Speech recognition error correction method, related devices, and readable storage medium
CN105095182B (en) A kind of return information recommendation method and device
CN109146610A (en) It is a kind of intelligently to insure recommended method, device and intelligence insurance robot device
CN110008409A (en) Based on the sequence of recommendation method, device and equipment from attention mechanism
CN110246488B (en) Voice conversion method and device of semi-optimized cycleGAN model
CN108959312A (en) A kind of method, apparatus and terminal that multi-document summary generates
US20230244938A1 (en) Using Chains of Thought to Prompt Machine-Learned Models Pre-Trained on Diversified Objectives
CN110209774A (en) Handle the method, apparatus and terminal device of session information
CN108899013A (en) Voice search method, device and speech recognition system
CN113934887B (en) No-proposal time sequence language positioning method based on semantic decoupling
CN109582952A (en) Poem generation method, device, computer equipment and medium
CN109543165A (en) Document creation method and device based on cyclic convolution attention model
CN112767917A (en) Speech recognition method, apparatus and storage medium
CN112463989A (en) Knowledge graph-based information acquisition method and system
WO2023235346A1 (en) Prompting machine-learned models using chains of thought
CN116051688A (en) Transition animation generation method and device, computer readable storage medium and terminal
CN114445832A (en) Character image recognition method and device based on global semantics and computer equipment
CN114118024B (en) Conditional text generation method and generation system
CN117216223A (en) Dialogue text generation method and device, storage medium and electronic equipment
CN110046239B (en) Dialogue method based on emotion editing
CN112329437A (en) Intelligent customer service voice quality inspection scoring method, equipment and storage medium
US11393454B1 (en) Goal-oriented dialog generation using dialog template, API, and entity data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant