CN112765317A - Method and device for generating image by introducing text of class information - Google Patents

Method and device for generating image by introducing text of class information

Info

Publication number
CN112765317A
Authority
CN
China
Prior art keywords
image
text
class information
class
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110071013.8A
Other languages
Chinese (zh)
Inventor
周德宇 (Zhou Deyu)
孙凯 (Sun Kai)
胡名起 (Hu Mingqi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202110071013.8A
Publication of CN112765317A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method and a device for generating images from text with introduced class information. The method comprises a training stage and a testing stage. The training stage is based on a generative adversarial network and uses natural language texts describing images, the class labels of the texts, and the corresponding real images to train a generator and a discriminator; in the testing stage, a text and its class label are fed to the generator to produce the corresponding image. The advantages of the invention are as follows: from the text information encoding and the class information encoding, text-level semantic image features and class-level image features are generated by separate transcoding; the two levels of image features are then fused and decoded to generate the image. Introducing the corresponding class information into the image generation process strengthens the correlation between the generated image and the text, while the multi-stage generation process during training produces progressively higher-resolution images and reduces the difficulty of directly generating high-resolution images.

Description

Method and device for generating image by introducing text of class information
Technical Field
The invention relates to the technical field of deep-learning generative models, in particular to a method and a device for generating images from text with introduced class information.
Background
Text-to-image generation is an important problem with wide applications, such as computer-aided medicine and news photo generation.
Research on text-to-image methods is mainly based on two generative models: the Conditional Variational Auto-Encoder (CVAE) and the Conditional Generative Adversarial Network (CGAN). Pictures generated by CVAE methods often suffer from blurring, and current mainstream methods are all based on CGAN models.
Because GAN training is unstable, it is very difficult to directly generate high-resolution images from text descriptions, so the stacked generative adversarial network (StackGAN) proposed the strategy of first generating a low-resolution image from the text and then progressively generating higher-resolution images from it; this strategy is widely adopted in later work.
The text-to-image problem can be divided into two sub-problems: 1) how to capture a text semantic representation — generally, a text encoder extracts the visually relevant information of the text description into a semantic embedding; 2) how the generator uses the semantic representation from 1) to generate a vivid image that fits the text, such that a human could mistake it for a real related picture.
Conventional text-to-image networks use only the text information and ignore the class information of the text itself. Yet class information also helps generation: objects of the same class often share certain similarities, so introducing the class information of the text can help overcome the one-sidedness of a single text description and, in addition, strengthen the correlation between the generated image and the text. For evaluating text-to-image methods, the Inception Score (IS) is widely used. The IS evaluates the quality of generated images by the correlation between the generated image distribution and the real image distribution; a higher IS indicates that the generated images contain clearer, easier-to-identify entities.
Disclosure of Invention
Aiming at the defects of existing image generation methods, the invention provides a method and a device for generating images from text with introduced class information, which introduce the class information to which the text belongs into the image generation process, use the class information to constrain the relatedness of pictures generated from texts of the same class, and address the problem that a single text description is not comprehensive.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
a method and a device for generating an image by a text with introduced class information are characterized in that: the method comprises the following steps:
step 1, encoding a natural language text for describing an image to obtain text semantic embedded representation;
step 2, encoding the class label of the text to obtain class information semantic embedded representation;
step 3, mixing the text semantic embedded representation obtained in the step 1 with random noise, reading them with a recurrent neural network, and outputting the latent code of the text;
step 4, mixing the semantic embedded representation of the class information obtained in the step 2 with noise, and obtaining object hidden codes of the class information through variational inference;
step 5, respectively decoding the text hidden codes and the class information hidden codes obtained in the step 3 and the step 4 to obtain image characteristics of a text level and image characteristics of a class level;
step 6, decoding the image characteristics of the fusion text level and the image characteristics of the class level obtained in the step 5 to generate an image;
step 7, performing adversarial training with the generated image obtained in the step 6 and the corresponding real image;
step 8, respectively up-sampling the image characteristics obtained in the step 5 to obtain image characteristics with different dimensions, and repeating the steps 6-7 to gradually generate images with higher resolution;
and 9, inputting the text and the class labels thereof in the testing stage, repeating the steps 1-6, and generating a high-resolution image in the image generator through multiple stages.
Further, in step 1, the method for encoding the natural language text describing the image is as follows: segment the natural language text to obtain a word sequence p = (w_1, w_2, …, w_d) of length d, where each word w_i (i = 1…d) is represented by a pre-trained word vector, and encode the text using the obtained word vectors.
further, in the step 2, if each text-image data only belongs to one class, the class information is encoded in a one-bit efficient coding (one-hot) manner; if the text-image data term is multiple classes, the class information is encoded using a multi-bit-efficient coding (multi-hot) approach.
Further, in step 3, the recurrent neural network adopts a long short-term memory network.
Further, in the step 3, the text semantic embedded representation and the noise are mixed by direct concatenation; the noise is Gaussian noise z ~ N(0, I), and the result of mixing the text semantic embedding s with z is z_s = (s, z).
Furthermore, the mixing of the class information semantic embedding and noise in the step 4 is done by variational inference: a variational encoder infers the latent attribute distribution q(z_c | c, z) of the class information given the noise z and the class information c, and the class information semantic embedded representation z_c is sampled from this distribution.
Further, in the step 5, an upsampling operation is adopted to decode the text latent code and the class information latent code to obtain the image features.
Further, in the step 6, the image features h_c generated from the text and the image features h_r generated from the class information are fused by element-wise multiplication, and the fused image features can be denoted h_1 = h_c ⊙ h_r; a convolutional neural network then decodes the fused image features to generate an image.
Further, the adversarial training method in step 7 is as follows: latent image representations of the generated image and of the real image are obtained through a convolutional neural network; the corresponding text and class information are input at the same time, and scores are output for the realism of the image, the match between image and text, and the match between image and class information.
Further, in the step 8, a staged image generation method is adopted to generate higher-resolution pictures step by step. Taking two-stage image generation as an example, the first stage generates a low-resolution picture from the fused image features; the second stage upsamples the text image features h_c and the class-information image features h_r obtained in the first stage to higher-dimensional image and text features, and then generates a higher-resolution picture.
Further, in the two-stage image generation network, the resolution of the image generated in the first stage is 64 × 64 and that of the second stage is 128 × 128; the model may be stacked further.
Further, the input of the test stage in step 9 is text and its class labels, and the high-resolution image is generated in stages through the generator model obtained in the training stage.
A method and a device for generating an image by introducing a text of class information are characterized in that the device comprises:
the text encoder is used for encoding the text describing the image to obtain text semantic embedded representation;
the class information encoder is used for encoding class information of a text describing an image to obtain semantic embedded representation of the class information;
the generator comprises a recurrent neural network transcoder, a variational inference transcoder, an image feature fusion device and an image decoder; the recurrent neural network transcoder reads the text semantic embedding and the transcoder's previous hidden state and outputs the corresponding text image features; the variational inference transcoder reads the input class information semantic embedding and outputs the corresponding class-information image features; the image feature fusion device fuses the text image features and class-information image features produced by the two transcoders; the image decoder decodes the input fused image features to generate an image;
the discriminator comprises an image semantic discriminator, a text semantic discriminator and a class information discriminator; the image semantic discriminator judges the correlation between the generated image and the real image, the text semantic discriminator judges the correlation between the generated image and the corresponding text, and the class information discriminator judges the correlation between the generated image and the class information.
Compared with the prior art, the invention has the following beneficial effects:
the invention introduces extra class information in the process of generating the image by the text, such as class labels of birds and different object labels contained in the picture, the class information obtains hidden codes by a variation inference method, entity information behind the class labels can be fully mined, image features generated by the text and image features generated by the class information are fused in an image space, and the training difficulty is reduced.
Drawings
FIG. 1 is a flow chart of a generator method of the present invention.
FIG. 2 is a flow chart of a method of the discriminator of the present invention.
Detailed Description
The invention will be further elucidated with reference to the drawings and specific embodiments, it being understood that these examples are intended to illustrate the invention only and are not intended to limit its scope. Various equivalent modifications of the invention, falling within the scope of the appended claims of this application, will occur to persons skilled in the art upon reading this disclosure. A method for generating images from text with introduced class information, based on a conditional generative adversarial network, comprises the following steps, as shown in FIGS. 1-2:
step 1, constructing a text encoding unit comprising a text encoder and a recurrent neural network transcoder, as described in S1 in fig. 1. A natural language text is input into a text coder and an embedded representation of the text is output. The natural language text is an English text, a word sequence with the length d is obtained after stop words are removed, and each word is represented by a pre-trained word vector.
For example, for the input natural language text "This little bird is mostly blue with black primaries and secondaries", removing stop words gives the word sequence [little, bird, mostly, blue, black, primaries, secondaries], so d = 7. We set the maximum value of d to 18; shorter sequences are padded and longer ones are truncated.
The goal of the recurrent neural network transcoder is to extract high-level semantic features from the natural language text; this role is served by a pre-trained bidirectional long short-term memory network (Bi-LSTM). Given the input text sequence, the hidden state h_i output for each word serves as that word's word-level feature, and the average of the output hidden states over all time steps serves as the semantic embedding of the text, i.e. s = (1/d) Σ_{i=1..d} h_i.
This is only one preferred form of the text encoding unit; other reasonable encoding schemes may also be used.
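The mean pooling of word-level hidden states into a sentence embedding can be sketched as follows (a minimal NumPy sketch; the hidden states are random stand-ins for Bi-LSTM outputs, and the dimensions are illustrative, not taken from the patent):

```python
import numpy as np

# d word-level hidden states h_1 ... h_d, here random stand-ins for Bi-LSTM outputs
d, hidden = 7, 256                      # sequence length and hidden size (illustrative)
H = np.random.default_rng(0).standard_normal((d, hidden))

# Sentence embedding: time-step average s = (1/d) * sum_i h_i
s = H.mean(axis=0)
```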
Step 2, construct a class information encoding unit comprising a class information encoder and a variational inference encoder, as described in S2 in FIG. 1. The class information encoder takes the class information corresponding to the text as input and outputs an embedded representation of the class information. The class information of a text has two cases, single-class and multi-class. If each text has only one class label, we encode the class information in one-hot form: for example, "This little bird is mostly blue with black primaries and secondaries" has a single class label indicating the species of bird the text describes; with 20 different bird classes in the dataset, the class information is encoded as a 20-dimensional one-hot vector [1, 0, …, 0, 0]. If the text has several class attributes, it is encoded in multi-hot form: for example, the encoding [1, 0, 0, 1, 0] indicates that the text carries the class labels of the first and fourth classes.
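The one-hot and multi-hot encodings described above can be sketched as follows (a minimal NumPy sketch; the function name and the use of integer class indices are illustrative assumptions, not the patent's implementation):

```python
import numpy as np

def encode_class_labels(label_indices, num_classes):
    """Encode class labels as a one-hot vector (one label) or a
    multi-hot vector (several labels)."""
    c = np.zeros(num_classes, dtype=np.float32)
    c[list(label_indices)] = 1.0
    return c

# A single bird class out of 20 classes -> 20-dimensional one-hot vector
single = encode_class_labels([0], 20)
# Membership in the first and fourth of 5 classes -> multi-hot vector
multi = encode_class_labels([0, 3], 5)
```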
After the class information is encoded as a class vector c, it is converted into a class information embedding by variational inference. Conditioning on the class vector c and the noise data z ~ N(0, I), the encoder performs posterior inference of the latent variable z_c. We assume the posterior distribution q(z_c | c, z) of the latent variable is a multivariate diagonal Gaussian whose mean and variance are learned by the encoder; here we use a three-layer linear neural network for inference, although more complex encoding schemes based on the distribution of classes in the data could also be used.
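Drawing the class-information embedding z_c from the inferred diagonal Gaussian is commonly done with the reparameterization trick; a minimal NumPy sketch (the three-layer encoder that would produce the mean and log-variance is omitted, and all names and dimensions are illustrative):

```python
import numpy as np

def sample_z_c(mu, logvar, rng):
    """Reparameterized sample z_c = mu + sigma * eps, eps ~ N(0, I),
    from the diagonal Gaussian posterior q(z_c | c, z)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

rng = np.random.default_rng(0)
mu = np.zeros(128)        # mean predicted by the encoder (illustrative)
logvar = np.zeros(128)    # log-variance predicted by the encoder (illustrative)
z_c = sample_z_c(mu, logvar, rng)
```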
Step 3, construct the text information and class information fusion module, as described in S3 in FIG. 1. The module fuses the image features generated from the text with the image features generated from the class information; it consists of upsampling networks that extract image features, followed by the combination of the text image features and the class-information image features. For the text information, the upsampling network takes the joint data z_s of the text semantic embedding s and the noise and upsamples it to image features h_s of the corresponding dimension; for the class information, the upsampling network takes z_c, sampled from the posterior distribution q(z_c | c, z) predicted by the class information encoder, and produces class-information image features h_c of the same dimension. Finally, a point-wise multiplication of the text image features and the class-information image features yields the fused image features h.
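The point-wise fusion of the two feature maps can be sketched on dummy data (a NumPy sketch; the feature-map shapes are illustrative, not taken from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)
h_s = rng.standard_normal((64, 4, 4))   # text image features, (C, H, W), dims illustrative
h_c = rng.standard_normal((64, 4, 4))   # class-information image features, same shape

# Element-wise (Hadamard) product: fused image features h
h = h_s * h_c
```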
Step 4, construct the conditional generative adversarial network, which consists of a generator and a discriminator, as described in S4 in FIG. 1 and FIG. 2. The generator is composed of convolutional neural networks; the discriminator comprises an image discriminator, a text semantic discriminator and a class information discriminator. The generator decodes with scale-invariant convolutional layers, converting the fused image features into the final generated image; the image discriminator scores the realism of the generated image, the text semantic discriminator evaluates the relation between the generated image and the original text, and the class information discriminator scores how well the generated image matches the class information.
Step 5, respectively inputting the natural language text describing the image and the corresponding class information into a text encoder and a class information encoder to obtain text semantic embedded representation and class information embedded representation;
step 6, the generated text semantic embedded representation and class information embedded representation are passed to the text information and class information fusion module to obtain image features fusing the two kinds of information;
step 7, inputting the fused image features into the image generator to generate a lower-resolution picture, the set resolution being 64 × 64; the corresponding real pictures, natural language texts and class information are input into the discriminator for adversarial training. The loss functions of the discriminator and the generator during adversarial training are respectively:
L_D = −(E_{x~P}[log D(x)_r] + E_{x~Q}[log(1 − D(x)_r)])
    − (E_{x~P}[log D(x)_c] + E_{x~Q}[log(1 − D(x)_c)])
    − (E_{x~P}[log D(x, s)] + E_{x~Q}[log(1 − D(x, s))])
L_G = −E_{x~Q}[log D(x)_r] − E_{x~Q}[log D(x)_c] − E_{x~Q}[log D(x, s)] + KL(q(z_c | c, z) ‖ N(0, I)) + KL(q(z_s) ‖ N(0, I))
where P is the real data distribution, Q is the generated data distribution, D(x)_r denotes the probability that the generated image x is real, D(x)_c the probability that the generated image belongs to the correct class label, and D(x, s) the probability of a match between the generated image and the descriptive text.
Two KL divergence terms are added to the loss function of the generator as regularization losses constraining the two latent variables z_c and z_s. During training, the discriminator D(x) is first optimized by minimizing the loss L_D with the generator fixed; then, with the discriminator fixed, the generator G is optimized according to the loss L_G. The two steps are trained alternately by mini-batch stochastic gradient descent.
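The three binary cross-entropy terms that make up L_D can be sketched as follows (a NumPy sketch under the assumption that the discriminator heads already output probabilities; the function name and dictionary keys 'r', 'c', 's' are illustrative):

```python
import numpy as np

def discriminator_loss(real, fake):
    """Sum of the three terms of L_D: realism ('r'), class match ('c'),
    and text match ('s'). `real` and `fake` map each key to a batch of
    probabilities for real and generated images respectively."""
    loss = 0.0
    for k in ('r', 'c', 's'):
        loss -= np.mean(np.log(real[k])) + np.mean(np.log(1.0 - fake[k]))
    return loss
```

A perfect discriminator (real scores at 1, generated scores at 0) drives all three terms to zero.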
Step 8, the text image features and the class-information image features generated in the first stage are each upsampled to 128 × 128 dimensions. The adversarial training process of step 7 is repeated at this higher dimension to generate higher-resolution images.
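The doubling of spatial resolution between stages can be imitated with nearest-neighbour repetition (a NumPy stand-in for the learned upsampling layers; shapes are illustrative):

```python
import numpy as np

def upsample_2x(feat):
    """Nearest-neighbour 2x spatial upsampling of a (C, H, W) feature map."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

stage1 = np.zeros((3, 64, 64), dtype=np.float32)   # first-stage resolution
stage2 = upsample_2x(stage1)                       # second-stage resolution
```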
When training the network, normalization techniques such as Batch Normalization and Spectral Normalization can be added to the generator and the discriminator to stabilize training and further improve generation quality.
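Spectral Normalization divides a weight matrix by an estimate of its largest singular value; a minimal power-iteration sketch of the technique (practical implementations keep a running estimate of the singular vectors across training steps rather than iterating to convergence each time):

```python
import numpy as np

def spectral_normalize(W, n_iter=50):
    """Divide W by its largest singular value, estimated by power iteration."""
    u = np.ones(W.shape[0])
    v = None
    for _ in range(n_iter):
        v = W.T @ u
        v = v / np.linalg.norm(v)
        u = W @ v
        u = u / np.linalg.norm(u)
    sigma = u @ W @ v          # approximate largest singular value
    return W / sigma
```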
In summary, compared with previous methods, the disclosed method for generating images from text with introduced class information adds modules for encoding the class information and for fusing the class information with the text information. The method introduces the class label of the text, constrains the class of the generated image in the discriminator, and improves the correlation between the generated image and the text by introducing the class information.
In the experiments, a StackGAN-based model is used as the baseline; the dimensions of the latent variable and the noise variable are both set to 128, and in each iteration of adversarial training the discriminator is trained once and the generator is trained once. The network is trained with the Adam solver, where β_1 = 0.5 and β_2 = 0.999, and the learning rate is α = 0.0002.
The IS improves from 3.35 ± 0.02 to 3.74 ± 0.03 on the CUB dataset and from 7.34 ± 0.17 to 7.46 ± 0.30 on the COCO dataset; both the image generation quality and the clarity of the entities in the generated images surpass the baseline model.
The above examples are only preferred embodiments of the present invention, but the embodiments of the invention are not limited by them. Any modification, alteration, combination or simplification made without departing from the spirit of the invention is an equivalent substitution and is intended to fall within the scope of the invention.

Claims (13)

1. A method and a device for generating an image by a text with introduced class information are characterized in that: the method comprises the following steps:
step 1, encoding a natural language text for describing an image to obtain text semantic embedded representation;
step 2, encoding the class label of the text to obtain class information semantic embedded representation;
step 3, mixing the text semantic embedded representation obtained in the step 1 with random noise, reading them with a recurrent neural network, and outputting the latent code of the text;
step 4, mixing the semantic embedded representation of the class information obtained in the step 2 with noise, and obtaining object hidden codes of the class information through variational inference;
step 5, respectively decoding the text hidden codes and the class information hidden codes obtained in the step 3 and the step 4 to obtain image characteristics of a text level and image characteristics of a class level;
step 6, decoding the image characteristics of the fusion text level and the image characteristics of the class level obtained in the step 5 to generate an image;
step 7, performing adversarial training with the generated image obtained in the step 6 and the corresponding real image;
step 8, respectively up-sampling the image characteristics obtained in the step 5 to obtain image characteristics with different dimensions, and repeating the steps 6-7 to gradually generate images with higher resolution;
and 9, inputting the text and the class labels thereof in the testing stage, repeating the steps 1-6, and generating a high-resolution image in the image generator through multiple stages.
2. The method for generating an image from text with introduced class information as claimed in claim 1, wherein: in step 1, the method for encoding the natural language text describing the image is: segment the natural language text to obtain a word sequence p = (w_1, w_2, …, w_d) of length d, where each word w_i (i = 1…d) is represented by a pre-trained word vector, and encode the text using the obtained word vectors.
3. The method for generating an image from text with introduced class information as claimed in claim 1, wherein: in step 2, if each text-image data item belongs to only one class, the class information is encoded in a one-hot manner; if it belongs to multiple classes, the class information is encoded in a multi-hot manner.
4. The method for generating an image from text with introduced class information as claimed in claim 1, wherein: in the step 3, the recurrent neural network adopts a bidirectional long short-term memory network.
5. The method for generating an image from text with introduced class information as claimed in claim 1, wherein: in the step 3, the text semantic embedded representation and the noise are mixed by direct concatenation; the noise is Gaussian noise z ~ N(0, I), and the result of mixing the text semantic embedding s with z is z_s = (s, z).
6. The method for generating an image from text with introduced class information as claimed in claim 1, wherein: the mixing of the class information semantic embedding and noise in the step 4 is done by variational inference: a variational encoder infers the latent attribute distribution q(z_c | c, z) of the class information given the noise z and the class information c, and the class information semantic embedded representation z_c is sampled from this distribution.
7. The method for generating an image from text with introduced class information as claimed in claim 1, wherein: in the step 5, the text latent code and the class information latent code are decoded by an upsampling operation to obtain the image features.
8. The method for generating an image from text with introduced class information as claimed in claim 1, wherein in the step 6, the image features h_c generated from the text and the image features h_r generated from the class information are fused by point-wise multiplication, and the fused image features can be expressed as h_1 = h_c ⊙ h_r; a convolutional neural network then decodes the fused image features to generate an image.
9. The method for generating an image from text with introduced class information as claimed in claim 1, wherein the adversarial training method in step 7 is: latent image representations of the generated image and of the real image are obtained through a convolutional neural network; the corresponding text and class information are input at the same time, and scores are output for the realism of the image, the match between image and text, and the match between image and class information.
10. The method for generating an image from text with introduced class information as claimed in claim 1, wherein the step 8 adopts a staged image generation method to generate higher-resolution pictures step by step, for example two-stage image generation: the first stage generates a lower-resolution picture from the fused image features; the second stage upsamples the text image features h_s and the class-information image features h_c obtained in the first stage to higher-dimensional image and text features, and then generates a higher-resolution picture.
11. The method and device of claim 10, wherein the model is further stacked into a two-stage image generation network, in which the resolution of the first-stage generated image is 64 × 64 and that of the second stage is 128 × 128.
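The resolution schedule in claims 10 and 11 doubles the spatial size between stages. The sketch below shows the upsampling step for a (C, H, W) feature map; nearest-neighbour repetition is an assumption here, since the patent only states that upsampling is used.

```python
import numpy as np

def upsample2x(feat):
    # Nearest-neighbour 2x upsampling of a (C, H, W) feature map.
    return feat.repeat(2, axis=1).repeat(2, axis=2)

# First-stage features at 64x64, second stage works at 128x128,
# matching the resolutions named in claim 11.
stage1_feat = np.zeros((3, 64, 64))
stage2_feat = upsample2x(stage1_feat)
```

In a learned model the upsampling would typically be followed by convolutions that refine the coarse first-stage content into higher-resolution detail.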
12. The method and device for generating an image from text with introduced class information as claimed in claim 1, wherein the input of the test stage in step 9 is a text and its class label, and a high-resolution image is generated in stages by the generator model obtained in the training stage.
13. A method and device for generating an image from text with introduced class information, characterized in that the device comprises:
a text encoder, configured to encode the text describing an image to obtain a text semantic embedding representation;
a class information encoder, configured to encode the class information of the text describing the image to obtain a class information semantic embedding representation;
a generator, comprising a recurrent neural network transcoder, a variational inference transcoder, an image feature fusion module and an image decoder, wherein the recurrent neural network transcoder reads the text semantic embedding and the transcoder's hidden state from the previous step and outputs the corresponding text image features; the variational inference transcoder reads the semantic embedding of the input class information and outputs the corresponding class information image features; the image feature fusion module fuses the text image features and the class information image features generated by the recurrent neural network transcoder and the variational inference transcoder; and the image decoder decodes the input fused image features to generate an image; and
a discriminator, comprising an image semantic discriminator, a text semantic discriminator and a class information discriminator, wherein the image semantic discriminator judges the correlation between the generated image and the real image; the text semantic discriminator judges the correlation between the generated image and the corresponding text; and the class information discriminator judges the correlation between the generated image and the class information.
CN202110071013.8A 2021-01-19 2021-01-19 Method and device for generating image by introducing text of class information Pending CN112765317A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110071013.8A CN112765317A (en) 2021-01-19 2021-01-19 Method and device for generating image by introducing text of class information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110071013.8A CN112765317A (en) 2021-01-19 2021-01-19 Method and device for generating image by introducing text of class information

Publications (1)

Publication Number Publication Date
CN112765317A true CN112765317A (en) 2021-05-07

Family

ID=75703285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110071013.8A Pending CN112765317A (en) 2021-01-19 2021-01-19 Method and device for generating image by introducing text of class information

Country Status (1)

Country Link
CN (1) CN112765317A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543159A (en) * 2018-11-12 2019-03-29 南京德磐信息科技有限公司 A kind of text generation image method and device
US20190380657A1 (en) * 2015-10-23 2019-12-19 Siemens Medical Solutions Usa, Inc. Generating natural language representations of mental content from functional brain images
CN111968193A (en) * 2020-07-28 2020-11-20 西安工程大学 Text image generation method based on StackGAN network


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254694A (en) * 2021-05-21 2021-08-13 中国科学技术大学 Text-to-image method and device
CN113254694B (en) * 2021-05-21 2022-07-15 中国科学技术大学 Text-to-image method and device
WO2023030348A1 (en) * 2021-08-31 2023-03-09 北京字跳网络技术有限公司 Image generation method and apparatus, and device and storage medium

Similar Documents

Publication Publication Date Title
CN109543159B (en) Text image generation method and device
CN110795556B (en) Abstract generation method based on fine-grained plug-in decoding
JP5128629B2 (en) Part-of-speech tagging system, part-of-speech tagging model training apparatus and method
CN111897908A (en) Event extraction method and system fusing dependency information and pre-training language model
CN112765316A (en) Method and device for generating image by introducing text of capsule network
CN111444367B (en) Image title generation method based on global and local attention mechanism
CN111078866B (en) Chinese text abstract generation method based on sequence-to-sequence model
CN113283244B (en) Pre-training model-based bidding data named entity identification method
CN110032638B (en) Encoder-decoder-based generative abstract extraction method
CN112765317A (en) Method and device for generating image by introducing text of class information
CN110390049B (en) Automatic answer generation method for software development questions
CN113140020B (en) Method for generating image based on text of countermeasure network generated by accompanying supervision
CN111402365B (en) Method for generating picture from characters based on bidirectional architecture confrontation generation network
CN113961736A (en) Method and device for generating image by text, computer equipment and storage medium
CN107463928A (en) Word sequence error correction algorithm, system and its equipment based on OCR and two-way LSTM
CN114529903A (en) Text refinement network
CN113140023A (en) Text-to-image generation method and system based on space attention
US20220300708A1 (en) Method and device for presenting prompt information and storage medium
Bie et al. RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model
CN110750669B (en) Method and system for generating image captions
CN117034951A (en) Digital person with specific language style based on large language model
CN117093864A (en) Text generation model training method and device
CN116521857A (en) Method and device for abstracting multi-text answer abstract of question driven abstraction based on graphic enhancement
CN115496134A (en) Traffic scene video description generation method and device based on multi-modal feature fusion
CN115331073A (en) Image self-supervision learning method based on TransUnnet architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination