CN116108157A - Method for training text generation model, text generation method and device - Google Patents

Method for training text generation model, text generation method and device

Info

Publication number
CN116108157A
Authority
CN
China
Prior art keywords
text
sample
training
output
diffusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310387160.5A
Other languages
Chinese (zh)
Other versions
CN116108157B (en)
Inventor
袁正
苑洪意
谭传奇
黄非
黄松芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Damo Institute Hangzhou Technology Co Ltd
Original Assignee
Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Damo Institute Hangzhou Technology Co Ltd filed Critical Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority to CN202310387160.5A priority Critical patent/CN116108157B/en
Publication of CN116108157A publication Critical patent/CN116108157A/en
Application granted granted Critical
Publication of CN116108157B publication Critical patent/CN116108157B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/338Presentation of query results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present application disclose a method for training a text generation model, a text generation method, and corresponding devices. The main technical scheme comprises the following steps: acquiring training data comprising a plurality of training samples, wherein each training sample is a sample pair formed by an input text sample and an output text sample; acquiring a feature representation of the output text sample in a sample pair, and applying a noising diffusion process to the feature representation to obtain a noised feature representation; and training the text generation model with the input text sample of the sample pair and the noised feature representation as input, wherein during training the text generation model simulates the inverse of the noising diffusion process based on the input text sample and the noised feature representation, with the output text sample as the target output. By introducing a diffusion probability generation mechanism into the field of text generation, the method and device can improve the quality of generated text.

Description

Method for training text generation model, text generation method and device
Technical Field
The present application relates to the fields of natural language processing and artificial intelligence, and in particular to a method for training a text generation model, a text generation method, and a text generation device.
Background
Text-to-text generation techniques transform and process input text to obtain new text. They mainly include text summarization, text rewriting, machine translation, automatic question answering, and the like. Text-to-text generation mostly uses a text generation model with an encoder-decoder architecture. Conventional text generation models are usually trained by maximum likelihood estimation, i.e., the model parameters are chosen so that the probability of the desired sample result occurring is maximized. However, the text generation quality achieved by maximum likelihood estimation leaves room for improvement.
Disclosure of Invention
In view of this, the present application provides a method for training a text generation model, a text generation method, and corresponding devices, so as to improve the quality of generated text.
The application provides the following scheme:
in a first aspect, a method of training a text generation model is provided, the method comprising:
acquiring training data comprising a plurality of training samples, wherein each training sample is a sample pair formed by an input text sample and an output text sample;
acquiring a feature representation of the output text sample in a sample pair, and applying a noising diffusion process to the feature representation to obtain a noised feature representation;
and training the text generation model with the input text sample of the sample pair and the noised feature representation as input, wherein during training the text generation model simulates the inverse of the noising diffusion process based on the input text sample and the noised feature representation, with the output text sample as the target output.
According to one implementation in an embodiment of the present application, the text generation model includes an encoder and a decoder;
the encoder obtains a feature representation of the input text sample, and the decoder performs the inverse diffusion processing using the feature representation of the input text sample and the noised feature representation to obtain the output text sample;
the training targets include: minimizing the difference between the distribution produced by the noising diffusion process and the distribution produced by the inverse diffusion process.
According to an implementation of the embodiments of the present application, the training targets further include: minimizing the difference between the distribution of the feature representation obtained at the last diffusion time step and a standard normal distribution; and/or minimizing the difference between the feature representation produced by the last inverse diffusion time step and the feature representation of the output text sample.
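Under stated assumptions, the combined training targets above can be sketched numerically. The function below is illustrative only: `mu_q` and `mu_theta` stand for the means of the noising and inverse diffusion distributions at one time step, `z_last` for the feature representation after the last diffusion step, and the simple squared-error terms are one possible way to measure the three differences; the patent does not fix these choices.

```python
import numpy as np

def diffusion_training_loss(z0, z0_hat, mu_q, mu_theta, z_last, beta_tilde):
    """Hypothetical combination of the three training targets described above.

    - l_step:  difference between the forward (noising) posterior mean and the
               mean produced by the model's inverse diffusion step
    - l_prior: difference between the last-step noisy representation and a
               standard normal (measured here against zero mean, unit scale)
    - l_recon: difference between the final back-diffused representation and
               the feature representation z0 of the output text sample
    """
    l_step = np.mean((mu_q - mu_theta) ** 2) / (2.0 * beta_tilde)
    l_prior = np.mean(z_last ** 2)          # encourages z_last ~ N(0, I)
    l_recon = np.mean((z0 - z0_hat) ** 2)
    return l_step + l_prior + l_recon
```

When the model's step means match the forward posterior, the last noisy representation is centered, and the reconstruction is exact, every term vanishes, which is one way to see that the three targets are compatible.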
According to an implementation of the embodiments of the present application, obtaining the feature representation of the output text sample in the sample pair includes:
performing word embedding on the output text sample using an embedding network to obtain the feature representation of the output text sample; or,
encoding the output text sample with the encoder of the text generation model to obtain the feature representation of the output text sample.
According to an implementation of the embodiments of the present application, in the noising diffusion process, the first diffusion time step adds noise to the feature representation of the output text sample, each subsequent time step adds noise to the feature representation obtained at the previous time step, and the feature representation obtained at each time step follows a normal distribution.
According to an implementation of the embodiments of the present application, in the inverse diffusion process, the feature representation obtained by the inverse diffusion at each time step is sampled from the posterior distribution conditioned on the feature representation obtained at the previous time step; alternatively, the feature representation obtained by the inverse diffusion at each time step is sampled from the prior distribution conditioned on a predicted ẑ_0, where ẑ_0 is a prediction of the feature representation that was input to the first diffusion step.
In a second aspect, there is provided a text generation method, the method comprising:
acquiring an input text;
inputting the input text and random noise into a text generation model, and performing inverse diffusion processing by the text generation model based on the input text and the random noise to obtain an output text;
wherein the text generation model is pre-trained by the method described in the first aspect.
According to one implementation in an embodiment of the present application, the text generation model includes an encoder and a decoder;
the encoder obtains a feature representation of the input text;
the decoder performs inverse diffusion processing using the feature representation of the input text and the random noise to predict the output text; wherein, in the inverse diffusion process, the feature representation obtained at each time step is sampled from the posterior distribution conditioned on the feature representation obtained at the previous time step; alternatively, it is sampled from the prior distribution conditioned on a predicted ẑ_0, where ẑ_0 is a prediction of the feature representation that was input to the first diffusion step.
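As an illustrative sketch of this inference procedure, the loop below starts from random noise and repeatedly predicts ẑ_0 and re-samples; the name `predict_z0` is an assumption standing in for the trained decoder, not the patent's actual interface.

```python
import numpy as np

def reverse_diffusion(x_feat, z_T, predict_z0, alpha_bar, rng):
    """Minimal sketch of the inverse diffusion loop described above.

    `predict_z0(z_t, t, x_feat)` stands in for the trained decoder: it returns
    a prediction z0_hat of the original feature representation. Each step then
    samples z_{t-1} from the closed-form prior conditioned on z0_hat:
    N(sqrt(alpha_bar[t-1]) * z0_hat, (1 - alpha_bar[t-1]) * I).
    `alpha_bar[i]` holds the cumulative product for forward step i+1.
    """
    z = z_T
    T = len(alpha_bar)
    for t in range(T, 0, -1):
        z0_hat = predict_z0(z, t, x_feat)
        if t == 1:
            z = z0_hat                      # final step returns the prediction
        else:
            mean = np.sqrt(alpha_bar[t - 2]) * z0_hat
            std = np.sqrt(1.0 - alpha_bar[t - 2])
            z = mean + std * rng.standard_normal(z0_hat.shape)
    return z
```

Note that only the decoder runs inside the loop; the encoder's feature representation `x_feat` is computed once and reused at every time step, which is the saving described under the technical effects below.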
In a third aspect, a summary generating method is provided, the method includes:
acquiring an input text;
inputting the input text and random noise into a text generation model, and performing inverse diffusion processing by the text generation model based on the input text and the random noise to obtain a summary of the input text;
wherein the text generation model is pre-trained by the method described in the first aspect.
In a fourth aspect, a machine translation method is provided, the method comprising:
acquiring a text in a first language;
inputting the text in the first language and random noise into a text generation model, and performing inverse diffusion processing by the text generation model based on the text in the first language and the random noise to obtain a text in a second language;
wherein the text generation model is pre-trained by the method described in the first aspect.
In a fifth aspect, there is provided an apparatus for training a text generation model, the apparatus comprising:
a sample acquisition unit configured to acquire training data comprising a plurality of training samples, each training sample being a sample pair of an input text sample and an output text sample;
a noising diffusion unit configured to acquire the feature representation of the output text sample in the sample pair and apply a noising diffusion process to it to obtain a noised feature representation;
and a model training unit configured to train the text generation model with the input text sample of the sample pair and the noised feature representation as input, wherein during training the text generation model simulates the inverse of the noising diffusion process based on the input text sample and the noised feature representation, with the output text sample as the target output.
In a sixth aspect, there is provided a text generating apparatus, the apparatus comprising:
a text acquisition unit configured to acquire an input text;
a text generation unit configured to input the input text and random noise into a text generation model, and to obtain the output text produced by the text generation model through inverse diffusion processing based on the input text and the random noise;
wherein the text generation model is pre-trained by the apparatus of the fifth aspect.
According to a seventh aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of any of the first aspects above.
According to an eighth aspect, there is provided an electronic device comprising:
one or more processors; and
a memory associated with the one or more processors, the memory for storing program instructions that, when read for execution by the one or more processors, perform the steps of the method of any of the first aspects above.
According to a specific embodiment provided by the application, the application discloses the following technical effects:
1) Instead of training the text generation model by maximum likelihood estimation, the present application provides a brand-new approach: a diffusion probability generation mechanism is introduced into the field of text generation, the text generation process is modeled as the inverse of a noising diffusion process, and the information loss caused by noise is counteracted, thereby achieving a better text generation effect.
2) In the actual prediction process, the input and the processing of the encoder are unchanged; that is, the encoder still performs only one feed-forward pass of the neural network and does not participate in the inverse diffusion process, which may require hundreds of time steps, so computing resources are greatly saved.
3) In the inverse diffusion process, sampling from the prior distribution conditioned on the predicted feature representation of the output text sample is closer to the training target, and feature representations of higher quality can be obtained.
Of course, not all of the above-described advantages need be achieved at the same time in practicing any one of the products of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of a system architecture to which embodiments of the present application are applicable;
FIG. 2 is a flowchart of a method for training a text generation model according to an embodiment of the present application;
fig. 3 is a schematic diagram of training principle of a text generation model according to an embodiment of the present application;
fig. 4 is a flowchart of a text generation method provided in an embodiment of the present application;
fig. 5 is a schematic diagram of prediction principle of a text generation model according to an embodiment of the present application;
FIG. 6 is a schematic block diagram of an apparatus for training a text generation model provided by an embodiment of the present application;
FIG. 7 is a schematic block diagram of a text generating device provided in an embodiment of the present application;
fig. 8 is a schematic block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application are within the scope of the protection of the present application.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein merely describes an association relationship between associated objects, meaning that three relationships may exist; e.g., "A and/or B" may represent: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting". Similarly, the phrase "if determined" or "if detected (stated condition or event)" may be interpreted as "when determined" or "in response to determining" or "when detected (stated condition or event)" or "in response to detecting (stated condition or event)", depending on the context.
To facilitate an understanding of the present application, a brief description of the system architecture to which it applies is first provided. As shown in FIG. 1, the exemplary system architecture to which embodiments of the present application may be applied includes a model training device and a text generation device.
After acquiring the training data in an offline stage, the model training device performs model training using the method provided by the embodiments of the present application to obtain the text generation model.
The text generation device uses the established text generation model to generate output text based on input text online. The text generation model is in effect a sequence-to-sequence (seq2seq) model, enabling prediction from one text sequence to another.
The model training device and the text generation device may each be set up as an independent server, may be set up in the same server or server group, or may be set up in independent or the same cloud server. A cloud server, also called a cloud computing server or cloud host, is a host product in a cloud computing service system that addresses the drawbacks of difficult management and weak service scalability in traditional physical host and VPS (Virtual Private Server) services. The model training device and the text generation device can also be arranged on a computer terminal with strong computing power.
In addition to generating text online, the text generation device may also generate text offline, for example, to generate summaries for a batch of texts.
It should be understood that the number of model training devices, text generation devices, and text generation models in fig. 1 are merely illustrative. There may be any number of model training means, text generating means, and text generating models, as desired for implementation.
Fig. 2 is a flowchart of a method for training a text generation model according to an embodiment of the present application, where the method flow may be performed by a model training apparatus in the system shown in fig. 1. As shown in fig. 2, the method may include:
Step 202: training data is obtained that includes a plurality of training samples, the training samples including pairs of input text samples and output text samples.
Step 204: and obtaining the characteristic representation of the output text sample in the sample pair, and carrying out noise adding diffusion treatment on the characteristic representation of the output text sample to obtain the characteristic representation after noise adding.
Step 206: and taking the input text sample and the characteristic representation after noise addition of the sample pair as an input training text generation model, and simulating inverse diffusion processing of noise addition diffusion based on the input text sample and the characteristic representation after noise addition in the training process by the text generation model so as to take the output text sample as a target to be output.
As can be seen from the above flow, the present application no longer trains the text generation model by maximum likelihood estimation, but provides a brand-new approach: a diffusion probability generation mechanism is introduced into the field of text generation, and the text generation process is modeled as the inverse of a noising diffusion process, counteracting the information loss caused by noise and thereby achieving a better text generation effect.
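The three steps can be sketched as one hypothetical training iteration. All callables here are stand-ins for model components, not the patent's actual interfaces:

```python
import numpy as np

def train_step(x_tokens, y_tokens, embed, encode, decode_denoise, diffuse, loss_fn):
    """One hypothetical training step following steps 202-206 above.

    embed/encode produce feature representations, diffuse applies the forward
    noising, and decode_denoise simulates the inverse diffusion conditioned on
    the encoder output. All six callables are illustrative placeholders.
    """
    z0 = embed(y_tokens)                        # representation of the output sample
    z_noisy = diffuse(z0)                       # noised representation (step 204)
    x_feat = encode(x_tokens)                   # encoder output for the input sample
    z0_hat = decode_denoise(x_feat, z_noisy)    # inverse diffusion (step 206)
    return loss_fn(z0, z0_hat)                  # target: recover the output sample
```

With toy stand-ins (e.g., `diffuse` adding a constant and `decode_denoise` removing it), the loss goes to zero, mirroring the training target of undoing the noising process.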
Each step of the above flow is described in detail below. Step 202, i.e., "acquiring training data comprising a plurality of training samples", is described first with reference to embodiments.
Although the present application does not adopt maximum likelihood estimation, the acquired training data is the same as the training data used by conventional maximum likelihood training. The training data comprises a plurality of training samples, each training sample being a sample pair formed by an input text sample x and an output text sample y.
For example, in a summary generation application scenario, some articles may be taken as input text samples, and summaries of these articles may be taken as output text samples.
For another example, in the field of text rewriting, sentences or paragraphs may be taken as input text samples, and text expressing them differently, i.e., rewritten text, taken as the output text samples.
For another example, in the field of machine translation, text in a first language may be used as the input text sample, and the corresponding text in a second language as the output text sample. For example, a Chinese sentence is taken as the input text sample, and the English sentence it translates to as the output text sample.
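A minimal sketch of what such sample pairs might look like across the three example tasks; the contents below are invented for illustration only.

```python
# Hypothetical training pairs for the three task types mentioned above:
# summarization, text rewriting, and machine translation.
training_samples = [
    {"task": "summarization",
     "input_text": "A long article about diffusion models for text ...",
     "output_text": "A short summary."},
    {"task": "rewrite",
     "input_text": "The cat sat on the mat.",
     "output_text": "A cat was sitting on the mat."},
    {"task": "translation",
     "input_text": "你好，世界",
     "output_text": "Hello, world"},
]

def as_pairs(samples):
    """Each training sample reduces to the (x, y) pair used in steps 202-206."""
    return [(s["input_text"], s["output_text"]) for s in samples]
```

The training procedure itself is task-agnostic: whatever the task, only the (x, y) pair enters steps 202 through 206.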
Step 204, i.e., "obtaining the feature representation of the output text sample in the sample pair, and applying a noising diffusion process to the feature representation to obtain the noised feature representation", is described below in connection with embodiments.
Denoising diffusion probability models have been applied in the field of image generation, where they surpass traditional generative adversarial models, but they remain unexplored in the field of natural language processing. A denoising diffusion probability model mainly comprises two processes: forward noising diffusion and backward denoising. The forward noising diffusion process gradually adds noise, time step by time step, on the basis of the output text sample.
In the field of natural language processing, the denoising diffusion probability model cannot be directly applied to natural language generation tasks because natural language is discrete. In the embodiments of the present application, a sequence y composed of Tokens (elements) is first mapped to a continuous feature representation z_0 ∈ R^(n×d), i.e., a feature representation composed of the word vectors of each Token, where n and d are the length of y and the dimension of the word vectors, respectively. Each Token of a text is an element constituting the text: the text is segmented into a sequence of characters or words, and the characters or words, as well as the start and separator symbols, in the text sequence are Tokens.
As one implementation, an embedding network can be used to perform word embedding on the output text sample y, obtaining its feature representation z_0.

As another implementation, the encoder in the text generation model can be used to encode the output text sample y, obtaining its feature representation z_0.
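The first option, word embedding, can be sketched as a simple table lookup; the table contents, vocabulary size, and vector dimension below are illustrative assumptions.

```python
import numpy as np

def embed_tokens(token_ids, embedding_table):
    # Look up one d-dimensional word vector per token id; the result is the
    # continuous feature representation z_0 of shape (n, d) described above.
    return embedding_table[np.asarray(token_ids)]

# Hypothetical vocabulary size and word-vector dimension.
vocab_size, d = 100, 8
rng = np.random.default_rng(7)
table = rng.standard_normal((vocab_size, d))
z0 = embed_tokens([2, 17, 5], table)   # a 3-token output sample y -> shape (3, d)
```

The second option replaces the lookup with a full encoder forward pass over y, but either way the result is a continuous z_0 on which the noising diffusion can operate.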
The noising diffusion process is a multi-time-step diffusion process applied to the feature representation of the output text sample. The diffusion at the first time step adds noise to the feature representation z_0 of the output text sample to obtain z_1. The diffusion at each subsequent time step adds noise to the feature representation obtained at the previous time step. The feature representation obtained at each time step follows a normal distribution; that is, the forward noising diffusion process can be seen as adding a Markov transition distribution at each step.

As shown in FIG. 3, the Markov transition distribution of the first diffusion time step can be defined as q(z_1 | z_0), for example:

q(z_1 | z_0) = N(z_1; √(1−β_1) z_0, β_1 I)    (1)

For the diffusion at the other time steps, taking time step t+1 as an example, the Markov transition distribution can be defined as q(z_{t+1} | z_t):

q(z_{t+1} | z_t) = N(z_{t+1}; √(1−β_{t+1}) z_t, β_{t+1} I)    (2)
where q(z_{t+1} | z_t) is a distribution over z_{t+1}: a normal distribution with mean √(1−β_{t+1}) z_t and variance β_{t+1} I. The parameters β_t used by the diffusion at each time step are preset, and I is the identity matrix. z_{t+1} and z_t are the feature representations obtained by the diffusion at time steps t+1 and t, respectively. After a preset number of diffusion time steps (e.g., T+1 time steps), z_{T+1} is obtained, and z_{T+1} should be as close to a standard normal distribution as possible. The more diffusion time steps, the closer z_{T+1} is to a normal distribution and the better the effect, but correspondingly the more computing resources are occupied and the longer the process takes; a relatively balanced value, for example 2000 time steps, therefore needs to be chosen empirically or experimentally.

Through the forward noising diffusion process, the discrete output text samples are incorporated into the continuous denoising diffusion probability model: noise is gradually added to z_0 until a sample z_{T+1} that follows the prior distribution is obtained. The prior distribution adopted in the embodiments of the present application is the normal distribution.
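The step-by-step forward noising of equations (1) and (2) can be sketched as follows; the linear β schedule is an assumption, since the patent only states that the β values are preset.

```python
import numpy as np

def forward_diffuse(z0, betas, rng):
    """Forward noising diffusion per equations (1) and (2):
    z_t = sqrt(1 - beta_t) * z_{t-1} + sqrt(beta_t) * eps,  eps ~ N(0, I).
    Returns the list [z_0, z_1, ..., z_T]."""
    zs = [z0]
    for beta in betas:
        eps = rng.standard_normal(z0.shape)
        zs.append(np.sqrt(1.0 - beta) * zs[-1] + np.sqrt(beta) * eps)
    return zs

rng = np.random.default_rng(0)
z0 = rng.standard_normal((4, 8))            # a toy (n, d) feature representation
betas = np.linspace(1e-4, 0.5, 2000)        # assumed schedule; 2000 steps as in the text
zs = forward_diffuse(z0, betas, rng)
```

With enough steps, the final representation is statistically indistinguishable from standard normal noise, which is exactly the property the text says must hold at the end of the forward process.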
Step 206, i.e., "taking the input text sample of the sample pair and the noised feature representation as input to train the text generation model, the text generation model simulating the inverse of the noising diffusion process based on the input text sample and the noised feature representation during training so as to produce the output text sample as the target", is described in detail below in connection with embodiments.
Training the text generation model is actually a process of simulating (i.e., learning) the inverse diffusion on the basis of the forward noising diffusion. The architecture of the text generation model adopted in the embodiments of the present application is an encoder-decoder structure. As shown in FIG. 3, the input text sample x is fed into the encoder, which encodes x to obtain the feature representation of the input text sample.
The encoder may be implemented based on a pre-trained language model, such as BERT (Bidirectional Encoder Representations from Transformers), XLNet (an autoregressive model that captures bidirectional context through a permutation language model), or a GPT (Generative Pre-Training) model, used as the initial encoder and further trained on this basis. BERT is a bidirectional pre-trained language model that uses the Transformer encoder as its model structure and can make good use of context information for feature learning. XLNet is a BERT-like, more generalized autoregressive pre-training model. GPT uses the Transformer decoder structure, retaining only the masked multi-head attention in the decoder.
The Transformer network is a model that uses a self-attention mechanism to encode each Token of the input into a feature representation. In addition, besides the Transformer-based encoder-decoder architecture, encoder-decoder architectures based on other networks, such as RNNs (Recurrent Neural Networks), may be employed.
The decoder performs the inverse diffusion processing using the feature representation of the input text sample and the noised feature representation to obtain the output text sample.

For the text generation task, each time step can be viewed as removing noise, conditioned on the input text sample, from the feature representation obtained by the inverse diffusion at the previous time step. For the first inverse diffusion time step, noise is removed from the noised feature representation z_{T+1}.
Each time step's denoising (i.e., back-diffusion processing) can be considered as simulating the inverse of the noise-adding process, i.e., simulating the posterior distribution of the forward noise-adding diffusion process, expressed as $q(z_{t-1} \mid z_t, z_0)$, which follows the form of the Gaussian distribution family. Conditioned on $z_t$ and $z_0$, $z_{t-1}$ can be expressed as:

$$q(z_{t-1} \mid z_t, z_0) = \mathcal{N}\big(z_{t-1};\ \tilde{\mu}_t(z_t, z_0),\ \tilde{\beta}_t I\big) \tag{3}$$

which is a Gaussian distribution over $z_{t-1}$ with mean $\tilde{\mu}_t(z_t, z_0)$ and variance $\tilde{\beta}_t$, where:

$$\tilde{\mu}_t(z_t, z_0) = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\, z_0 + \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\, z_t \tag{4}$$

$$\tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t \tag{5}$$

$$\alpha_t = 1 - \beta_t \tag{6}$$

$$\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s \tag{7}$$

where $\beta_t$ is a preset parameter, $p_\theta(z_{t-1} \mid z_t)$ is the processing function that the text generation model needs to simulate (it can also be regarded as the denoising function learned by the model), and $\theta$ refers to the model parameters.
As can be seen from the above description, as one implementation, in the back-diffusion process the feature representation obtained by the back-diffusion processing of each time step is obtained by sampling from the posterior distribution of the feature representation obtained by the back-diffusion processing of the previous time step; i.e., $z_{t-1}$ is sampled from $q(z_{t-1} \mid z_t, z_0)$.
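A minimal numerical sketch of sampling from this posterior, assuming the standard Gaussian-diffusion parameterization of eqs. (3) to (7) and a hypothetical linear noise schedule (the schedule values and dimensions are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100
betas = np.linspace(1e-4, 0.02, T)       # preset parameters beta_t (hypothetical schedule)
alphas = 1.0 - betas                     # eq. (6): alpha_t = 1 - beta_t
alpha_bars = np.cumprod(alphas)          # eq. (7): cumulative product of alpha_s

def posterior_sample(z_t, z_0, t):
    """Sample z_{t-1} from the Gaussian posterior q(z_{t-1} | z_t, z_0), eqs. (3)-(5)."""
    a_bar_prev = alpha_bars[t - 1] if t > 0 else 1.0
    mean = (np.sqrt(a_bar_prev) * betas[t] / (1 - alpha_bars[t]) * z_0
            + np.sqrt(alphas[t]) * (1 - a_bar_prev) / (1 - alpha_bars[t]) * z_t)
    var = (1 - a_bar_prev) / (1 - alpha_bars[t]) * betas[t]
    return mean + np.sqrt(var) * rng.normal(size=z_t.shape)

z_0 = rng.normal(size=(4,))              # feature representation of the output text sample
z_t = rng.normal(size=(4,))              # noisy feature representation at time step t
z_prev = posterior_sample(z_t, z_0, t=50)
```

In the trained model, the true $z_0$ is of course unknown at generation time; the decoder's learned denoising function supplies the quantity that plays its role.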
As another implementation, each time step in the noise-adding diffusion process can be expressed as a prior distribution conditioned on $z_0$:

$$q(z_t \mid z_0) = \mathcal{N}\big(z_t;\ \sqrt{\bar{\alpha}_t}\,z_0,\ (1-\bar{\alpha}_t)\,I\big) \tag{8}$$

Thus, in the back-diffusion process, the feature representation $z_{t-1}$ obtained by the back-diffusion processing of each time step can be obtained by sampling from the prior distribution conditioned on the predicted $\hat{z}_0$, where $\hat{z}_0$ is the prediction of the feature representation subjected to the first diffusion processing (i.e., the feature representation of the output text sample). That is, each back-diffusion step first predicts a $\hat{z}_0$, and then samples $z_{t-1}$ from the prior distribution $q(z_{t-1} \mid \hat{z}_0)$. The initial $\hat{z}_0$ is inaccurate, but as the time steps proceed the prediction of $\hat{z}_0$ becomes more and more accurate; in the final back-diffusion step, which yields $z_0$, the goal is to make $\hat{z}_0$ consistent with $z_0$. Such a sampling method is closer to the training target and can yield a higher-quality feature representation.
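The alternative, prediction-based sampling route can be sketched as follows; the $\hat{z}_0$-prediction function here is a hypothetical placeholder standing in for the trained decoder, and the schedule values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100
alphas = 1.0 - np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(alphas)

def predict_z0(z_t, t):
    """Stand-in for the model's denoising function: in the real system the
    decoder predicts z0_hat from z_t conditioned on the encoded input text."""
    return z_t / np.sqrt(alpha_bars[t])   # hypothetical placeholder prediction

def prior_sample(z0_hat, t_prev):
    """Sample z_{t-1} from the prior q(z_{t-1} | z0_hat) of eq. (8):
    N(sqrt(alpha_bar_{t-1}) * z0_hat, (1 - alpha_bar_{t-1}) I)."""
    return (np.sqrt(alpha_bars[t_prev]) * z0_hat
            + np.sqrt(1 - alpha_bars[t_prev]) * rng.normal(size=z0_hat.shape))

z_t = rng.normal(size=(4,))
z0_hat = predict_z0(z_t, t=50)            # each back-diffusion step first predicts z0_hat
z_prev = prior_sample(z0_hat, t_prev=49)  # then samples z_{t-1} from the prior
```

As the description notes, early predictions of $\hat{z}_0$ are rough, but each later step conditions on a less noisy $z_t$, so the prediction improves as the loop approaches $t=0$.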
Because the denoising process is the inverse of the noise-adding diffusion, ideally the text generation model completely learns this inverse process so as to finally predict $z_0$. Therefore, the training targets adopted for training the text generation model in the embodiment of the application mainly include: minimizing the difference between the distribution generated by the noise-adding diffusion process and the distribution generated by the back-diffusion process.
Further, in the noise-adding diffusion process, it is desirable that the characteristic representation obtained by the diffusion process of the last time step is the same as random noise, so the training target may further include: the difference between the distribution of the characteristic representation obtained by the diffusion processing of the last time step and the normal distribution is minimized.
Further, in the back-diffusion (i.e., denoising) process, it is desirable that the inverse of the noise-adding diffusion be completely simulated, with the feature representation obtained by the back diffusion of the last time step being fully consistent with the feature representation $z_0$ of the output text sample. Thus, the training objectives described above may also include minimizing the difference between the feature representation obtained by the back diffusion of the last time step and the feature representation of the output text sample.
In this embodiment of the present disclosure, a loss function may be constructed according to the training targets, and in each iteration the model parameters may be updated with the value of the loss function, e.g., by gradient descent, until a preset training end condition is satisfied. The training end condition may include, for example, the value of the loss function being less than or equal to a preset loss-function threshold, the number of iterations reaching a preset threshold, etc.
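The iterate-until-converged procedure just described can be sketched generically; the quadratic loss below is a toy stand-in for the actual diffusion training loss, and all names and thresholds are illustrative:

```python
import numpy as np

def train(loss_fn, grad_fn, params, lr=0.1, loss_threshold=1e-4, max_iters=1000):
    """Generic training loop: update parameters by gradient descent until the
    loss falls below a preset threshold or the iteration budget is exhausted."""
    for _ in range(max_iters):            # training end condition 2: iteration cap
        loss = loss_fn(params)
        if loss <= loss_threshold:        # training end condition 1: loss threshold
            break
        params = params - lr * grad_fn(params)
    return params, loss_fn(params)

# toy quadratic stand-in for the diffusion loss
target = np.array([1.0, -2.0])
loss_fn = lambda p: float(np.sum((p - target) ** 2))
grad_fn = lambda p: 2 * (p - target)
params, final_loss = train(loss_fn, grad_fn, np.zeros(2))
```

In practice the parameter update would be carried out by an autodiff framework over the encoder-decoder network rather than a hand-written gradient.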
As one of the realizations, a loss function $\mathcal{L}$ may be constructed as follows:

$$\mathcal{L} = \mathbb{E}_{q}\Big[\, D_{KL}\big(q(z_T \mid z_0)\,\|\,p(z_T)\big) + \sum_{t=2}^{T} D_{KL}\big(q(z_{t-1} \mid z_t, z_0)\,\|\,p_\theta(z_{t-1} \mid z_t)\big) - \log p_\theta(z_0 \mid z_1) \,\Big]$$

where $\mathbb{E}_q[\cdot]$ denotes taking the expectation, i.e., the expectation of the bracketed content under the constraint of the distribution $q$. $p_\theta(z_{t-1} \mid z_t)$ refers to the distribution of $z_{t-1}$ obtained by the model's back diffusion from $z_t$, which should conform to the distribution $q(z_{t-1} \mid z_t, z_0)$. The term $D_{KL}\big(q(z_{t-1} \mid z_t, z_0)\,\|\,p_\theta(z_{t-1} \mid z_t)\big)$ reflects the difference between the distribution produced by the back-diffusion process and the distribution produced by the noise-adding diffusion process. The term $D_{KL}\big(q(z_T \mid z_0)\,\|\,p(z_T)\big)$ concerns the distribution of the feature representation obtained by the diffusion processing of the last time step, and thus embodies the difference between that distribution and the normal distribution. $p_\theta(z_0 \mid z_1)$ indicates the probability of predicting $z_0$ on the premise of obtaining $z_1$; thus $-\log p_\theta(z_0 \mid z_1)$ embodies the difference between the feature representation obtained by the back diffusion of the last time step and the feature representation $z_0$ of the output text sample.
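Because all the distributions involved are Gaussian, each KL term of such a loss reduces to a closed form; a sketch with hypothetical means and variances for a 2-dimensional feature (the numbers are illustrative, not outputs of any trained model):

```python
import numpy as np

def kl_gauss(mu_q, var_q, mu_p, var_p):
    """Closed-form KL divergence between two diagonal Gaussians, summed over dims."""
    return 0.5 * np.sum(np.log(var_p / var_q)
                        + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

# hypothetical per-step distributions
mu_post, var_post = np.array([0.1, -0.2]), np.array([0.5, 0.5])    # posterior of noising
mu_model, var_model = np.array([0.15, -0.1]), np.array([0.5, 0.5]) # model's back diffusion

term_prior = kl_gauss(np.array([0.05, 0.0]), np.array([0.99, 0.99]),
                      np.zeros(2), np.ones(2))   # last noising step vs. standard normal
term_steps = kl_gauss(mu_post, var_post, mu_model, var_model)
# reconstruction term: -log-likelihood of z_0 under a unit-variance Gaussian decoder,
# dropping additive constants
term_recon = 0.5 * np.sum((np.array([1.0, -2.0]) - np.array([0.9, -1.9])) ** 2)

loss = term_prior + term_steps + term_recon
```

Summing the per-step KL terms over all time steps (here only one representative step is shown) gives the full variational objective.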
Based on the text generation model obtained through training, a specific text generation task can be executed by using the text generation model. Fig. 4 is a flowchart of a text generation method according to an embodiment of the present application, where the method may be performed by the text generation device in the system shown in fig. 1. As shown in fig. 4, the method may include the steps of:
step 402: input text is obtained.
Step 404: and inputting the input text and random noise into a text generation model, and performing inverse diffusion processing by the text generation model based on the input text and random noise to obtain an output text. The text generation model is obtained by training in advance by adopting a method shown in fig. 2.
The structure of the text generation model trained in advance in the embodiment of the present application is shown in fig. 5, and includes an encoder and a decoder.
The encoder obtains a characteristic representation of the input text.
The decoder performs the back-diffusion processing using the feature representation of the input text and random noise to predict the output text. In the back-diffusion process, the feature representation obtained by the back-diffusion processing of each time step is obtained by sampling from the posterior distribution of the feature representation obtained by the back-diffusion processing of the previous time step; alternatively, it is obtained by sampling from the prior distribution conditioned on the predicted $\hat{z}_0$, where $\hat{z}_0$ is the prediction of the feature representation subjected to the first diffusion processing.
That is, during actual prediction the input and processing of the encoder are unchanged: the encoder still only needs to perform the feed-forward computation of the neural network once and does not need to participate in the back-diffusion process, which may require hundreds of time steps; this can greatly save computing resources.
The input of the decoder is not only the output of the encoder; random noise is also fed into the decoder. The decoder performs the denoising processing step by step conditioned on the feature representation of the input text, obtains the feature representation $z_0$ at the last time step, and then maps $z_0$ to the output text.
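Putting the prediction procedure together as a toy numerical sketch (the encoder and per-step denoising functions are hypothetical stand-ins; a real system would run the trained networks):

```python
import numpy as np

rng = np.random.default_rng(3)
T, d = 50, 4
alphas = 1.0 - np.linspace(1e-4, 0.02, T)

def encode(input_text):
    """Stand-in encoder: runs exactly once per generation (one feed-forward pass)."""
    return rng.normal(size=(d,))

def denoise_step(z_t, t, cond):
    """Stand-in for one decoder back-diffusion step conditioned on the input text."""
    eps_hat = 0.1 * cond                  # hypothetical noise prediction
    return (z_t - np.sqrt(1 - alphas[t]) * eps_hat) / np.sqrt(alphas[t])

cond = encode("input text")               # encoder output, computed once
z = rng.normal(size=(d,))                 # start back diffusion from pure random noise
for t in reversed(range(T)):              # the decoder iterates over all T time steps
    z = denoise_step(z, t, cond)
# z is now z_0; a final mapping (e.g. to the nearest word embeddings) yields the text
```

Note that only the loop body, not `encode`, runs T times, which is exactly the computational saving the description points out.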
As one of the realizable ways, the text generation method may be executed by a cloud server; that is, the text generation function is integrated in the cloud. A cloud server, also called a cloud computing server or cloud host, is a host product in a cloud computing service system, intended to remedy the large management difficulty and weak service expansibility of traditional physical hosts and Virtual Private Server (VPS) services.
When the user wishes to generate an output text for the input text, the input text may be uploaded to the cloud server through the user terminal.
The above-mentioned user terminal may be, but is not limited to, such as: a cell phone, tablet, notebook, PDA (Personal Digital Assistant ), wearable device, PC (Personal Computer, personal computer), etc.
The cloud server acquires the input text from the user terminal, then performs the back-diffusion processing using the input text and random noise with the text generation model obtained by pre-training to obtain an output text, and returns the output text to the user terminal.
The method provided by the embodiment of the application can be applied to various application scenes, and only a few of the methods are described herein:
application scenario 1: summary generation scenario
In this scenario, when training a text generation model, some articles may be used as input text samples, and summaries of the articles may be used as output text samples, thereby forming sample pairs. For example, some news text may be taken as an input text sample and a summary of the news text as an output text sample. For another example, some papers may be taken as input text samples and summaries of papers may be taken as output text samples. The news text and the abstract thereof, the paper and the abstract thereof and the like are easy to obtain on the network, so that a large number of training samples can be obtained as training data.
Then, obtaining the characteristic representation of the output text sample in the sample pair, and carrying out noise adding diffusion treatment on the characteristic representation of the output text sample to obtain the characteristic representation after noise adding; and in the training process of the text generation model, simulating inverse diffusion processing of noise adding diffusion based on the input text sample and the characteristic representation after noise adding so as to obtain an output text sample. The specific training process may be referred to in the method embodiment for the relevant descriptions of fig. 2 and 3, which are not repeated here.
When the abstract is actually generated, an input text is obtained, the input text and random noise are input into a text generation model which is trained in advance, and the text generation model carries out inverse diffusion processing based on the input text and the random noise to obtain the abstract of the input text.
By the above method, an accurate summary can be automatically generated for the input text; for example, when news texts and paper texts are released online, their summaries can be automatically generated and released together. The text generating device may also be provided to the user as a tool: the user uploads a document as the input text and obtains an automatically generated summary.
Application scenario 2: machine translation scenario
In this scenario, when training the text generation model, some bilingual corpus may be used as a sample pair, where the bilingual corpus includes text in a first language as an input text sample, and text in a second language as an output text sample. For example, some chinese text and corresponding english text may be formed into sample pairs as training samples.
Then, obtaining the characteristic representation of the output text sample in the sample pair, and carrying out noise adding diffusion treatment on the characteristic representation of the output text sample to obtain the characteristic representation after noise adding; and in the training process of the text generation model, simulating inverse diffusion processing of noise adding diffusion based on the input text sample and the characteristic representation after noise adding so as to obtain an output text sample. The specific training process may be referred to in the method embodiment for the relevant descriptions of fig. 2 and 3, which are not repeated here.
When the machine translation is actually carried out, a text adopting a first language is obtained, the text adopting the first language and random noise are input into a text generation model which is obtained by training in advance, and the text generation model carries out back diffusion processing based on the text adopting the first language and the random noise, so that the text adopting a second language is obtained.
In this way, text in the first language can be automatically translated into text in the second language. For example, text may be automatically translated into another language for viewing by users in different countries or regions as the text is published on-line. For another example, the text generating device may be provided to the user as a tool, and the user uploads the document to be translated as the input text, and the tool may be used to obtain text in the specified language obtained by automatic translation.
But also to other application scenarios, not explicitly recited herein.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
According to an embodiment of another aspect, an apparatus for training a text generation model is provided. FIG. 6 illustrates a schematic block diagram of an apparatus for training text generation models, i.e., a model training apparatus in the architecture shown in FIG. 1, according to one embodiment. As shown in fig. 6, the apparatus 600 may include: a sample acquisition unit 601, a noise adding diffusion unit 602, and a model training unit 603. Wherein the main functions of each constituent unit are as follows:
The sample acquiring unit 601 is configured to acquire training data including a plurality of training samples, the training samples including sample pairs of input text samples and output text samples.
And the noise adding and diffusing unit 602 is configured to acquire the feature representation of the output text sample in the sample pair, and perform noise-adding diffusion processing on the feature representation of the output text sample to obtain a noised feature representation.
The model training unit 603 is configured to take the input text sample and the feature representation after noise addition of the sample pair as an input training text generation model, and the text generation model simulates inverse diffusion processing of noise addition diffusion based on the input text sample and the feature representation after noise addition in the training process so as to take the output text sample as a target to be output.
Wherein the text generation model includes an encoder and a decoder. The encoder acquires the characteristic representation of the input text sample of the input text generation model, and the decoder performs inverse diffusion processing by using the characteristic representation of the input text sample and the characteristic representation after noise addition to obtain an output text sample.
When the model training unit 603 trains the text to generate a model, the training targets used include: the difference between the distribution produced by the noisy diffusion process and the distribution produced by the inverse diffusion process is minimized.
Further, when the model training unit 603 trains the text generation model, the training target used may further include: minimizing the difference between the distribution of the characteristic representation obtained by the diffusion treatment of the last time step and the normal distribution; and/or minimizing the difference between the feature representation resulting from the last time step back-diffusion and the feature representation of the output text sample.
As one of the possible manners, the noise adding and diffusing unit 602 may perform word embedding processing on the output text sample by using an embedding network, so as to obtain a feature representation of the output text sample.
As one of the possible ways, the noise adding and diffusing unit 602 may perform encoding processing on the output text sample by using an encoder in the text generation model, so as to obtain a feature representation of the output text sample.
As one of the realizable ways, in the noise-adding diffusion unit 602, in the noise-adding diffusion process, the diffusion process of the first time step adds noise to the feature representation of the output text sample, and the diffusion process of each subsequent time step adds noise to the feature representation obtained by the diffusion process of the last time step, and the feature representation obtained by the diffusion process of each time step conforms to the normal distribution.
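The step-by-step noising just described can be sketched as follows; with all noise draws set to zero, the iterated mean matches the closed-form mean $\sqrt{\bar{\alpha}_T}\,z_0$ of $q(z_T \mid z_0)$ (schedule values and dimensions are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def diffuse_one_step(z_prev, t, eps):
    """One noise-adding step: z_t = sqrt(1 - beta_t) * z_{t-1} + sqrt(beta_t) * eps,
    i.e. a sample from N(sqrt(1 - beta_t) z_{t-1}, beta_t I)."""
    return np.sqrt(alphas[t]) * z_prev + np.sqrt(betas[t]) * eps

z_0 = np.array([1.0, -1.0, 0.5, 2.0])    # feature representation of the output text sample
z = z_0.copy()
for t in range(T):                        # first step noises z_0, later steps noise z_{t-1}
    z = diffuse_one_step(z, t, rng.normal(size=z_0.shape))

# deterministic check of the closed-form mean: run the chain with zero noise
z_mean = z_0.copy()
for t in range(T):
    z_mean = diffuse_one_step(z_mean, t, np.zeros_like(z_0))
```

Each step's output remains Gaussian given $z_0$, which is why every intermediate representation conforms to a normal distribution as stated above.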
As one of the realizations, in the back-diffusion process, the feature representation obtained by the back-diffusion processing of each time step is obtained by sampling from the posterior distribution of the feature representation obtained by the back-diffusion processing of the previous time step.
As another implementation, in the back-diffusion processing of the text generation model, the feature representation obtained by the back-diffusion processing of each time step is obtained by sampling from the prior distribution conditioned on the predicted $\hat{z}_0$, where $\hat{z}_0$ is the prediction of the feature representation subjected to the first diffusion processing.
According to an embodiment of another aspect, a text generating apparatus is provided. Fig. 7 shows a schematic block diagram of a text generating apparatus according to an embodiment. As shown in fig. 7, the apparatus 700 may include: a text acquisition unit 701 and a text generation unit 702. Wherein the main functions of each constituent unit are as follows:
the text acquisition unit 701 is configured to acquire an input text.
The text generation unit 702 is configured to input the input text and random noise into a text generation model, and acquire an output text obtained by performing a back diffusion process on the text generation model based on the input text and random noise. The text generation model is trained in advance by the model training device shown in fig. 6.
Wherein the text generation model includes an encoder and a decoder. The encoder obtains the feature representation of the input text. The decoder performs the back-diffusion processing using the feature representation of the input text and random noise to predict the output text; in the back-diffusion process, the feature representation obtained by the back-diffusion processing of each time step is obtained by sampling from the posterior distribution of the feature representation obtained by the back-diffusion processing of the previous time step; alternatively, it is obtained by sampling from the prior distribution conditioned on the predicted $\hat{z}_0$, where $\hat{z}_0$ is the prediction of the feature representation subjected to the first diffusion processing.
In this specification, each embodiment is described in a progressive manner; identical and similar parts of the embodiments are mutually referable, and each embodiment mainly describes its differences from the others. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to the description of the method embodiments for relevant points. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.
In addition, the embodiment of the application further provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method of any one of the foregoing method embodiments.
And an electronic device comprising:
one or more processors; and
a memory associated with the one or more processors for storing program instructions that, when read for execution by the one or more processors, perform the steps of the method of any of the preceding method embodiments.
The present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of any of the preceding method embodiments.
Fig. 8 illustrates an architecture of an electronic device, which may include, inter alia, a processor 810, a video display adapter 811, a disk drive 812, an input/output interface 813, a network interface 814, and a memory 820. The processor 810, video display adapter 811, disk drive 812, input/output interface 813, network interface 814, and memory 820 may be communicatively coupled via a communication bus 830.
The processor 810 may be implemented by a general-purpose CPU, a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc., for executing relevant programs to implement the technical solutions provided herein.
The Memory 820 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory ), static storage device, dynamic storage device, or the like. The memory 820 may store an operating system 821 for controlling the operation of the electronic device 800, and a Basic Input Output System (BIOS) 822 for controlling the low-level operation of the electronic device 800. In addition, a web browser 823, a data storage management system 824, a model training device/text generation device 825, and the like may also be stored. The model training device/text generating device 825 may be an application program that specifically implements the operations of the foregoing steps in the embodiments of the present application. In general, when implemented in software or firmware, the relevant program code is stored in memory 820 and executed by processor 810.
The input/output interface 813 is used to connect with an input/output module to realize information input and output. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
Network interface 814 is used to connect communication modules (not shown) to enable communication interactions of the present device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
Bus 830 includes a path for transferring information between components of the device (e.g., processor 810, video display adapter 811, disk drive 812, input/output interface 813, network interface 814, and memory 820).
It is noted that although the above-described devices illustrate only the processor 810, video display adapter 811, disk drive 812, input/output interface 813, network interface 814, memory 820, bus 830, etc., the device may include other components necessary to achieve proper operation in an implementation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the present application, and not all the components shown in the drawings.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general purpose hardware platform. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a computer program product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and include several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present application.
The foregoing has outlined the detailed description of the preferred embodiment of the present application, and the detailed description of the principles and embodiments of the present application has been provided herein by way of example only to facilitate the understanding of the method and core concepts of the present application; also, as will occur to those of ordinary skill in the art, many modifications are possible in view of the teachings of the present application, both in the detailed description and the scope of its applications. In view of the foregoing, this description should not be construed as limiting the application.

Claims (14)

1. A method of training a text generation model, the method comprising:
acquiring training data comprising a plurality of training samples, wherein the training samples comprise sample pairs formed by input text samples and output text samples;
acquiring a characteristic representation of an output text sample in a sample pair, and carrying out noise adding diffusion treatment on the characteristic representation of the output text sample to obtain a noisy characteristic representation;
and taking the input text sample of the sample pair and the characteristic representation after noise addition as input training text generation models, wherein the text generation models simulate inverse diffusion processing of the noise addition diffusion based on the input text sample and the characteristic representation after noise addition in the training process so as to take the output text sample as a target to be output.
2. The method of claim 1, wherein the text generation model comprises an encoder and a decoder;
the encoder acquires a feature representation of an input text sample input into the text generation model, and the decoder performs the inverse diffusion processing by using the feature representation of the input text sample and the noised feature representation to obtain the output text sample;
The training targets include: minimizing the difference between the distribution produced by the noisy diffusion process and the distribution produced by the inverse diffusion process.
3. The method of claim 2, wherein the training goal further comprises: minimizing the difference between the distribution of the characteristic representation obtained by the diffusion treatment of the last time step and the normal distribution; and/or minimizing the difference between the feature representation resulting from the back-diffusion of the last time step and the feature representation of the output text sample.
4. The method of claim 2, wherein obtaining a characteristic representation of the output text sample in the sample pair comprises:
performing word-embedding processing on the output text sample by using an embedding network to obtain the feature representation of the output text sample; or
and encoding the output text sample by using an encoder in the text generation model to obtain the characteristic representation of the output text sample.
5. The method of claim 1, wherein in the noise-added diffusion process, the first time step of diffusion process adds noise to the feature representation of the output text sample, and each subsequent time step of diffusion process adds noise to the feature representation obtained by the previous time step of diffusion process, and the feature representation obtained by the diffusion process of each time step conforms to a normal distribution.
6. The method according to claim 1 or 5, wherein in the back-diffusion process, the feature representation obtained by the back-diffusion processing of each time step is obtained by sampling from the posterior distribution of the feature representation obtained by the back-diffusion processing of the previous time step; alternatively, the feature representation obtained by the back-diffusion processing of each time step is obtained by sampling from the prior distribution conditioned on the predicted $\hat{z}_0$, where $\hat{z}_0$
7. A method of text generation, the method comprising:
acquiring an input text;
inputting the input text and random noise into a text generation model, and performing inverse diffusion processing by the text generation model based on the input text and the random noise to obtain an output text;
wherein the text generation model is pre-trained using the method of any one of claims 1 to 6.
8. The method of claim 7, wherein the text generation model comprises an encoder and a decoder;
the encoder obtains a feature representation of the input text;
the decoder performs the back-diffusion process using the feature representation of the input text and the random noise to predict the output text; wherein in the back-diffusion process, the feature representation obtained at each time step is sampled from the posterior distribution of the feature representation obtained at the previous time step; alternatively, the feature representation obtained at each time step is sampled from the prior distribution of a predicted representation ẑ₀, where ẑ₀ is the prediction of the feature representation before the first noising step of the diffusion process.
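Under standard DDPM assumptions, claim 8's posterior-sampling alternative amounts to predicting ẑ₀ with the decoder and then sampling z_{t-1} from the posterior q(z_{t-1} | z_t, ẑ₀). A minimal sketch of one reverse step (the `predict_z0` callable stands in for the decoder and is a hypothetical placeholder; the coefficient formulas are the usual DDPM posterior, an assumption rather than text from the patent):

```python
import math
import random

def p_sample_step(z_t, t, betas, alpha_bar, predict_z0, rng):
    """One back-diffusion step: predict the clean representation z0_hat from
    z_t, then sample z_{t-1} from the posterior q(z_{t-1} | z_t, z0_hat)."""
    z0_hat = predict_z0(z_t, t)  # decoder's prediction of the clean representation
    a_bar_t = alpha_bar[t]
    a_bar_prev = alpha_bar[t - 1] if t > 0 else 1.0
    beta_t = betas[t]
    # Standard DDPM posterior mean coefficients and variance.
    coef_z0 = math.sqrt(a_bar_prev) * beta_t / (1.0 - a_bar_t)
    coef_zt = math.sqrt(1.0 - beta_t) * (1.0 - a_bar_prev) / (1.0 - a_bar_t)
    var = (1.0 - a_bar_prev) / (1.0 - a_bar_t) * beta_t
    std = math.sqrt(var) if t > 0 else 0.0
    return [coef_z0 * z0 + coef_zt * zt + std * rng.gauss(0.0, 1.0)
            for z0, zt in zip(z0_hat, z_t)]
```

At t = 0 the posterior collapses onto the prediction itself (the standard deviation becomes zero), so the final step returns ẑ₀ deterministically; the output text is then decoded from that representation.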
9. A digest generation method, the method comprising:
acquiring an input text;
inputting the input text and random noise into a text generation model, and performing inverse diffusion processing by the text generation model based on the input text and the random noise to obtain a summary of the input text;
wherein the text generation model is pre-trained using the method of any one of claims 1 to 6.
10. A machine translation method, the method comprising:
acquiring a text in a first language;
inputting the text in the first language and random noise into a text generation model, and performing inverse diffusion processing by the text generation model based on the text in the first language and the random noise to obtain a text in a second language;
wherein the text generation model is pre-trained using the method of any one of claims 1 to 6.
11. An apparatus for training a text generation model, the apparatus comprising:
a sample acquisition unit configured to acquire training data including a plurality of training samples including a sample pair of an input text sample and an output text sample;
a noise-adding diffusion unit configured to acquire the feature representation of the output text sample in the sample pair, and perform noise-adding diffusion processing on that feature representation to obtain a noise-added feature representation;
and a model training unit configured to take the input text sample of the sample pair and the noise-added feature representation as inputs to train the text generation model, wherein during training the text generation model simulates the back-diffusion of the noise-adding diffusion based on the input text sample and the noise-added feature representation, with the output text sample as the target output.
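The three units of claim 11 can be read as one training step: noise the output sample's representation to a random time step, then train the model to recover it conditioned on the input sample. A minimal sketch under DDPM-style assumptions (the `model` callable, the MSE objective, and all names are illustrative placeholders, not the patent's actual training objective):

```python
import math
import random

def training_step(input_repr, output_repr, alpha_bar, model, rng):
    """One step of the claimed procedure: sample a time step t, noise the
    output text sample's feature representation to step t, and score the
    model's reconstruction of the clean representation with an MSE loss."""
    t = rng.randrange(len(alpha_bar))
    a = alpha_bar[t]
    z_t = [math.sqrt(a) * z + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
           for z in output_repr]
    # The model simulates the back-diffusion: conditioned on the input
    # sample, it predicts the clean output representation from z_t and t.
    z0_hat = model(input_repr, z_t, t)
    return sum((p - z) ** 2 for p, z in zip(z0_hat, output_repr)) / len(output_repr)
```

A model that already outputs the target representation gets zero loss; with a real network, gradient descent on this loss drives the model toward claim 11's target output.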
12. A text generation apparatus, the apparatus comprising:
a text acquisition unit configured to acquire an input text;
a text generation unit configured to input the input text and random noise into a text generation model, and obtain the output text produced by the text generation model through inverse diffusion processing based on the input text and the random noise;
wherein the text generation model is pre-trained by the apparatus of claim 11.
13. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method of any of claims 1 to 10.
14. An electronic device, comprising:
one or more processors; and
a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform the steps of the method of any of claims 1 to 10.
CN202310387160.5A 2023-04-11 2023-04-11 Method for training text generation model, text generation method and device Active CN116108157B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310387160.5A CN116108157B (en) 2023-04-11 2023-04-11 Method for training text generation model, text generation method and device

Publications (2)

Publication Number Publication Date
CN116108157A true CN116108157A (en) 2023-05-12
CN116108157B CN116108157B (en) 2023-09-12

Family

ID=86264079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310387160.5A Active CN116108157B (en) 2023-04-11 2023-04-11 Method for training text generation model, text generation method and device

Country Status (1)

Country Link
CN (1) CN116108157B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590531A (en) * 2017-08-14 2018-01-16 华南理工大学 A kind of WGAN methods based on text generation
CN114022582A (en) * 2021-09-22 2022-02-08 浙江工业大学 Text image generation method
CN114841142A (en) * 2022-04-22 2022-08-02 北京字跳网络技术有限公司 Text generation method and device, electronic equipment and storage medium
CN115641485A (en) * 2022-11-02 2023-01-24 阿里巴巴(中国)有限公司 Generative model training method and device
CN115641834A (en) * 2022-09-09 2023-01-24 平安科技(深圳)有限公司 Voice synthesis method and device, electronic equipment and storage medium
CN115687565A (en) * 2022-09-23 2023-02-03 阿里巴巴达摩院(杭州)科技有限公司 Text generation method and device
US20230109379A1 (en) * 2021-10-05 2023-04-06 Nvidia Corporation Diffusion-based generative modeling for synthetic data generation systems and applications


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
HONGYI YUAN, ZHENG YUAN, CHUANQI TAN, FEI HUANG, SONGFANG HUANG: "SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers", ARXIV, pages 1-9 *
HONGYI YUAN: "SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers", ARXIV, pages 1-9 *
LING YANG: "Diffusion Models: A Comprehensive Survey of Methods and Applications", ARXIV, pages 1-49 *
YIFAN LI, KUN ZHOU, WAYNE XIN ZHAO, JI-RONG WEN: "Diffusion Models for Non-autoregressive Text Generation: A Survey", ARXIV, pages 1-10 *
ZOMI酱: "Diffusion Models: Generative Diffusion Models", pages 1-10, Retrieved from the Internet <URL:https://blog.csdn.net/m0_37046057/article/details/126151446> *
卡卡猡特: "DDPM Explained (Part 1) | Mathematical Foundations, the Diffusion and Reverse Diffusion Processes, and Training/Inference Methods", pages 1-9, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com//p/530602853> *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116962657A (en) * 2023-09-21 2023-10-27 中国科学院深圳先进技术研究院 Color video generation method, device, electronic equipment and storage medium
CN116962657B (en) * 2023-09-21 2024-02-27 中国科学院深圳先进技术研究院 Color video generation method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN116108157B (en) 2023-09-12

Similar Documents

Publication Publication Date Title
US11734375B2 (en) Automatic navigation of interactive web documents
WO2022007823A1 (en) Text data processing method and device
CN112487182A (en) Training method of text processing model, and text processing method and device
WO2018085577A1 (en) Implicit bridging of machine learning tasks
US11397892B2 (en) Method of and system for training machine learning algorithm to generate text summary
CN111738025B (en) Artificial intelligence based translation method and device, electronic equipment and storage medium
CN110234018B (en) Multimedia content description generation method, training method, device, equipment and medium
US11709893B2 (en) Search method, electronic device and storage medium
EP3732629A1 (en) Training sequence generation neural networks using quality scores
CN116108157B (en) Method for training text generation model, text generation method and device
CN110929532B (en) Data processing method, device, equipment and storage medium
CN116050425A (en) Method for establishing pre-training language model, text prediction method and device
CN117173269A (en) Face image generation method and device, electronic equipment and storage medium
CN114792097B (en) Method and device for determining prompt vector of pre-training model and electronic equipment
CN116662496A (en) Information extraction method, and method and device for training question-answering processing model
CN115718830A (en) Method for training information extraction model, information extraction method and corresponding device
CN112464654B (en) Keyword generation method and device, electronic equipment and computer readable medium
CN115565186A (en) Method and device for training character recognition model, electronic equipment and storage medium
CN115270719A (en) Text abstract generating method, training method and device based on multi-mode information
CN110442706B (en) Text abstract generation method, system, equipment and storage medium
CN113919372A (en) Machine translation quality evaluation method, device and storage medium
CN116629336A (en) Method for training generation model, resource generation method and device
CN116913278B (en) Voice processing method, device, equipment and storage medium
CN115982343B (en) Abstract generation method, and method and device for training abstract generation model
CN111988673B (en) Method and related equipment for generating video description sentences

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant