CN111260740A - Text-to-image generation method based on a generative adversarial network - Google Patents


Info

Publication number
CN111260740A
CN111260740A
Authority
CN
China
Prior art keywords
image
matrix
word
generation
feature matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010046540.9A
Other languages
Chinese (zh)
Other versions
CN111260740B (en)
Inventor
田安捷
陆璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010046540.9A priority Critical patent/CN111260740B/en
Publication of CN111260740A publication Critical patent/CN111260740A/en
Application granted granted Critical
Publication of CN111260740B publication Critical patent/CN111260740B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a text-to-image generation method based on a generative adversarial network (GAN), which comprises the following steps: 1) inputting a text description into the network and generating a word feature matrix and a sentence feature vector from the text description; 2) applying conditioning augmentation and a noise vector to the sentence feature vector to obtain an image feature matrix; 3) calculating a word context matrix of the image features; 4) performing computation in the generative adversarial network using the image feature matrix and the word context matrix, gradually generating images of increasing resolution in three stages; 5) obtaining a local image feature matrix from the generated image; 6) evaluating the similarity between the generated image and the text description, and using it to optimize the next round of image generation. The image generation method of the invention ensures that the content of the generated image is semantically consistent with the text description, gives the generated image finer image details, effectively improves the resolution of the generated images, and increases their diversity.

Description

Text-to-image generation method based on a generative adversarial network
Technical Field
The invention relates to the field of image generation, and in particular to a text-to-image generation method based on a generative adversarial network (GAN).
Background
Generating high-resolution, realistic images from textual descriptions is a very meaningful line of research. In industry, it not only supports deeper visual understanding for related research in computer vision, but also has wide practical application. In academia, it has become one of the most popular research directions in computer vision in recent years, with significant results. Recurrent neural networks (RNNs) and generative adversarial networks (GANs) are often combined to generate realistic images from natural language descriptions. These methods can already produce satisfactory results in certain domains, such as generating fine-grained images of flowers or birds.
The original GAN model contains a generator and a discriminator. Through optimization, the generator learns to produce samples that follow the real data distribution, thereby deceiving the discriminator. The trained discriminator, in turn, learns to separate samples drawn from the real data distribution from the fake samples produced by the generator. The generator and the discriminator each approach their optimum in this mutual game, so the generated results become better and better.
While impressive results have been achieved, many challenges remain in training conditional generative adversarial networks. Most models tend to learn only one mode of the data distribution, which leads to mode collapse: the generator produces the same image every time. Although that image may be sharp, it never changes. Another major challenge is that the training process is unstable and the losses obtained during training do not converge. In addition, most existing image generation methods focus on a global sentence vector, ignoring useful fine-grained image features and word-level text information. Furthermore, when evaluating the generated image, they do not account for the fact that each sub-region of the image contributes differently to the overall image. Such methods on the one hand hinder the generation of high-quality images, and on the other hand reduce the diversity of the generated images. The problem becomes more severe as the scenes and objects to be generated grow more complex.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a text-to-image generation method based on a generative adversarial network, which ensures that the content of the generated image is semantically consistent with the text description, gives the generated image finer image details, effectively improves the resolution of the generated image, and increases image diversity.
The purpose of the invention is realized by the following technical scheme:
a text-to-image generation method based on generation of a countermeasure network, comprising the steps of:
1) inputting a text description into a network, and generating a word feature matrix and a sentence feature vector according to the text description;
2) adding conditions and noise vectors to the sentence characteristic vectors to obtain an image characteristic matrix;
3) calculating a word context matrix of the image features;
4) calculating in the generation of the countermeasure network by utilizing the image characteristic matrix and the word context matrix, and gradually generating images with higher and higher resolutions in three stages;
5) acquiring a local image feature matrix according to the generated image;
6) and evaluating the similarity between the generated image and the text description, and optimizing the next image generation.
In step 1), the text description describes the attributes of one or more objects; the attributes comprise type, size, number, shape and position. A bidirectional long short-term memory (LSTM) network produces two hidden states for each word in the text description, and these are concatenated to represent the word's semantics. The final hidden states of the two directions are concatenated to obtain a global sentence vector, and the remaining hidden states are concatenated to obtain the word feature matrix.
Step 2) is specifically as follows:
2.1) applying conditioning augmentation to the sentence feature vector to augment the training data and avoid overfitting;
2.2) concatenating a noise vector sampled from a standard normal distribution to obtain the image feature matrix.
In step 3), the word context matrix of the image features is calculated from the image feature matrix obtained in step 2) and the word feature matrix obtained in step 1); each column of the word context matrix represents a word context vector associated with one sub-region of the image.
The word context matrix of the image features is calculated from the image feature matrix obtained in step 2) and the word feature matrix obtained in step 1), specifically:
first, the word features are mapped into the common semantic space of the image features by adding a new perceptron layer;
then the weight of the i-th word for the j-th sub-region of the image is calculated: it is obtained by normalizing the product of the j-th image feature vector (i.e. a column vector of the image feature matrix) and the i-th word feature vector (i.e. a column vector of the word feature matrix);
finally, the weighted sum of the word features, using the weights of the corresponding image sub-region, is computed to obtain the word context vector of that sub-region; each sub-region of the image corresponds to one word context vector.
Step 4) is specifically as follows:
4.1) inputting the image feature matrix into the first-stage generative adversarial network to obtain a once-optimized image feature matrix, and applying a 3×3 convolution to it to output an image at 64×64 resolution;
4.2) inputting the once-optimized image feature matrix and the word context matrix into the second-stage generative adversarial network to obtain a twice-optimized image feature matrix, and applying a 3×3 convolution to it to output an image at 128×128 resolution;
4.3) applying an attention mechanism to the image feature matrix, strengthening key sub-regions of the image and weakening unimportant regions, and updating the word context matrix using step 3);
4.4) inputting the twice-optimized image feature matrix and the updated word context matrix into the third-stage generative adversarial network to obtain the final image feature matrix, and applying a 3×3 convolution to it to output an image at 256×256 resolution.
In step 5), the local image feature matrix is obtained from the generated image by an image encoder; the image encoder is the Inception-v3 model pre-trained on the ImageNet dataset, which is essentially a convolutional neural network.
In step 6), the specific process of evaluating the similarity between the generated image and the text description is as follows:
6.1) applying an attention mechanism to the local image feature matrix, strengthening key sub-regions of the image and weakening unimportant regions;
6.2) calculating the cosine similarity between the optimized local image feature matrix and the word feature matrix, and using it to evaluate the similarity between the text description and the generated image, which in turn helps optimize the generator in the generative adversarial network.
Compared with the prior art, the invention has the following advantages and beneficial effects:
The invention adopts an attention mechanism, whose central idea is to distinguish the information carried by different parts and assign different degrees of attention to them, so that the information that needs to be focused on receives due weight. On this basis, the proposed text-to-image generation method based on a generative adversarial network pays more attention to the key regions of the generated image, producing images with increasingly rich details across multiple stages.
In conventional text-to-image generation, when training a conditional generative adversarial network, most existing methods focus on a global sentence vector and ignore useful fine-grained image features and word-level text information. Likewise, when evaluating the quality of a generated image, they neglect that each sub-region of the image contributes differently to the whole. These methods may cause less important sub-regions (e.g. the background of the image) to receive excessive attention, while fine-grained details that need continual optimization are ignored. In contrast, the present invention provides a generative adversarial network with an added image attention mechanism, which generates higher-resolution, more detailed images by focusing on optimizing the important, content-rich sub-regions of the image during generation.
Drawings
Fig. 1 is an architecture diagram of the text-to-image generation method based on a generative adversarial network according to the present invention.
Fig. 2 is a flow chart of the text-to-image generation method based on a generative adversarial network according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
As shown in Figs. 1 and 2, a text-to-image generation method based on a generative adversarial network includes the following steps:
1) A meaningful text description is input into the network; the text description may describe representative attributes of one or more entity objects, such as type, size, number, color, shape and position. Using a bidirectional long short-term memory network (bi-directional LSTM), the two hidden states corresponding to each word in the text description are concatenated to represent the word's semantics. The global sentence vector is obtained by concatenating the final hidden states, and the word feature matrix is obtained by concatenating the remaining hidden states.
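The hidden-state bookkeeping of step 1) can be sketched as follows. This is a minimal numpy illustration, not the patented implementation: the hidden states are random stand-ins, and placing the backward direction's final state at word index 0 follows the usual bi-LSTM convention and is an assumption here.

```python
import numpy as np

T, H = 5, 4                      # T words, hidden size H per direction
rng = np.random.default_rng(0)
h_fwd = rng.normal(size=(T, H))  # forward hidden states, one per word
h_bwd = rng.normal(size=(T, H))  # backward hidden states, one per word

# Each word's semantics: its two hidden states concatenated -> one
# column of the word feature matrix, shape (2H, T)
word_features = np.concatenate([h_fwd, h_bwd], axis=1).T

# Global sentence vector: the final hidden state of each direction,
# concatenated (the backward pass ends at the first word)
sentence_vector = np.concatenate([h_fwd[-1], h_bwd[0]])  # shape (2H,)
```

In a trained system the hidden states would of course come from an actual LSTM over word embeddings; only the concatenation scheme is shown here.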
2) Obtain the image feature matrix. The specific process is as follows:
2.1) apply conditioning augmentation to the obtained sentence feature vector to augment the training data and avoid overfitting;
2.2) concatenate the augmented condition with a noise vector sampled from a standard normal distribution to obtain the image feature matrix.
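Conditioning augmentation is commonly realized by resampling the condition from a Gaussian whose mean and variance are predicted from the sentence vector (the reparameterization trick). The sketch below is a hedged numpy illustration: `W_mu` and `W_logvar` stand in for learned linear layers, and all dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
sentence_vector = rng.normal(size=(8,))

# Learned linear layers (random stand-ins here) predict a mean and a
# log-variance from the sentence vector
W_mu = rng.normal(size=(4, 8)) * 0.1
W_logvar = rng.normal(size=(4, 8)) * 0.1
mu = W_mu @ sentence_vector
logvar = W_logvar @ sentence_vector

# Reparameterization: resample the condition, which smooths the
# conditioning manifold and augments the training data
eps = rng.standard_normal(4)
c_hat = mu + np.exp(0.5 * logvar) * eps

# Concatenate with noise z ~ N(0, I) to form the generator input
z = rng.standard_normal(10)
generator_input = np.concatenate([c_hat, z])   # shape (14,)
```

The resampling step is what distinguishes conditioning augmentation from simply feeding the raw sentence vector: nearby sentence embeddings map to overlapping condition distributions, so the generator sees more varied conditions per caption.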
3) Calculate the word context matrix of the image features from the image feature matrix obtained in step 2) and the word feature matrix obtained in step 1); each column of this matrix represents a word context vector associated with one sub-region of the image.
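The word-context computation can be sketched in numpy as follows. The perceptron layer `U`, the softmax normalization, and all dimensions are illustrative assumptions; a trained model would learn `U` jointly with the rest of the network.

```python
import numpy as np

rng = np.random.default_rng(2)
Dw, D, T, N = 8, 4, 5, 6     # word dim, image dim, T words, N sub-regions
e = rng.normal(size=(Dw, T)) # word feature matrix (one column per word)
h = rng.normal(size=(D, N))  # image feature matrix (one column per sub-region)

# Perceptron layer mapping word features into the image feature space
U = rng.normal(size=(D, Dw)) * 0.1
e_prime = U @ e              # (D, T)

# Weight of word i for sub-region j: normalized product of the j-th
# image feature column and the i-th (mapped) word feature column
s = h.T @ e_prime                                        # (N, T) raw scores
beta = np.exp(s) / np.exp(s).sum(axis=1, keepdims=True)  # normalize over words

# Word context vector of sub-region j: weighted sum of the mapped word
# features; the columns together form the word context matrix
context = e_prime @ beta.T   # (D, N)
```

Each column of `context` summarizes, for one image sub-region, the words most relevant to it, which is what the later generation stages consume alongside the image features.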
4) Compute and optimize the image feature matrix through the three stages of generative adversarial networks to generate images. The specific operation of each stage is as follows:
4.1) input the image feature matrix into the first-stage generative adversarial network to obtain a once-optimized image feature matrix, and apply a 3×3 convolution to it to output an image at 64×64 resolution;
4.2) input the once-optimized image feature matrix and the word context matrix into the second-stage generative adversarial network to obtain a twice-optimized image feature matrix, and apply a 3×3 convolution to it to output an image at 128×128 resolution;
4.3) apply an attention mechanism to the image feature matrix, strengthening key sub-regions of the image and weakening unimportant regions, and update the word context matrix using step 3);
4.4) input the twice-optimized image feature matrix and the updated word context matrix into the third-stage generative adversarial network to obtain the final image feature matrix, and apply a 3×3 convolution to it to output an image at 256×256 resolution.
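The coarse-to-fine progression of steps 4.1)–4.4) can be sketched as below. This is a shape-level toy, not the trained model: `refine` stands in for a full GAN stage (here simply 2x nearest-neighbour upsampling), the 3×3 convolution weights are random, and only the 64 → 128 → 256 resolution schedule mirrors the method.

```python
import numpy as np

def refine(feat):
    # stand-in for one GAN stage: 2x nearest-neighbour upsampling
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def to_rgb(feat, kernel):
    # 3x3 convolution (stride 1, zero padding) from C channels to RGB
    C, H, W = feat.shape
    pad = np.pad(feat, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((3, H, W))
    for i in range(H):
        for j in range(W):
            out[:, i, j] = np.tensordot(kernel, pad[:, i:i + 3, j:j + 3], 3)
    return out

rng = np.random.default_rng(3)
feat = rng.normal(size=(8, 32, 32))           # initial image feature matrix
kernel = rng.normal(size=(3, 8, 3, 3)) * 0.1  # 3x3 conv weights (random)
images = []
for stage in range(3):                        # three stages: 64, 128, 256
    feat = refine(feat)
    images.append(to_rgb(feat, kernel))
```

In the actual method each stage also consumes the word context matrix and is trained against its own discriminator; the loop above only demonstrates how a single feature map threads through three stages with a 3×3 projection to an image at each resolution.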
5) Map the generated high-resolution image to a local image feature matrix, using the Inception-v3 model pre-trained on the ImageNet dataset as the image encoder. The image encoder is essentially a convolutional neural network.
6) Evaluate the similarity between the generated image and the text description. The specific process is as follows:
6.1) apply an attention mechanism to the local image feature matrix, strengthening key sub-regions of the image and weakening unimportant regions;
6.2) calculate the cosine similarity between the optimized local image feature matrix and the word feature matrix, and use it to evaluate the similarity between the text description and the generated image, which in turn helps optimize the generator in the generative adversarial network.
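Steps 6.1) and 6.2) can be sketched as follows: attention over the sub-regions builds a region-context vector per word, and cosine similarity scores each word against its context. The softmax attention and the log-sum-exp aggregation below are assumptions in the style of attention-driven image-text matching losses; the patent itself only specifies cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(4)
D, T, N = 4, 5, 6
v = rng.normal(size=(D, N))  # local image features (one column per sub-region)
e = rng.normal(size=(D, T))  # word features, assumed already in the same space

# Attention of each word over the sub-regions: key regions get more weight
s = e.T @ v                                               # (T, N) raw scores
alpha = np.exp(s) / np.exp(s).sum(axis=1, keepdims=True)  # normalize over regions
c = v @ alpha.T                                           # (D, T) region contexts

# Cosine similarity between each word and its attended image context
cos = (e * c).sum(axis=0) / (
    np.linalg.norm(e, axis=0) * np.linalg.norm(c, axis=0))

# Aggregate into one image-text matching score used to guide the generator
score = np.log(np.exp(5 * cos).sum()) / 5
```

During training, a score of this kind would be turned into a loss (e.g. over matching and mismatching image-caption pairs) that backpropagates into the generator.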
In summary, the invention provides a new method for text-to-image generation: images are generated by a generative adversarial network with an added attention mechanism, which ensures that the content of the generated image is semantically consistent with the text description, gives the generated image finer image details, effectively improves the resolution of the generated images, and increases their diversity.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to them; any change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention should be construed as an equivalent and is intended to be included within the scope of the present invention.

Claims (8)

1. A text-to-image generation method based on a generative adversarial network, characterized by comprising the following steps:
1) inputting a text description into the network and generating a word feature matrix and a sentence feature vector from the text description;
2) applying conditioning augmentation and a noise vector to the sentence feature vector to obtain an image feature matrix;
3) calculating a word context matrix of the image features;
4) performing computation in the generative adversarial network using the image feature matrix and the word context matrix, gradually generating images of increasing resolution in three stages;
5) obtaining a local image feature matrix from the generated image;
6) evaluating the similarity between the generated image and the text description, and using it to optimize the next round of image generation.
2. The text-to-image generation method based on a generative adversarial network according to claim 1, characterized in that in step 1), the text description describes the attributes of one or more objects, the attributes comprising type, size, number, shape and position; the two hidden states produced for each word in the text description by a bidirectional long short-term memory network are concatenated to represent the word's semantics; the final hidden states of the two directions are concatenated to obtain a global sentence vector, and the remaining hidden states are concatenated to obtain the word feature matrix.
3. The text-to-image generation method based on a generative adversarial network according to claim 1, characterized in that step 2) is specifically:
2.1) applying conditioning augmentation to the sentence feature vector to augment the training data and avoid overfitting;
2.2) concatenating a noise vector sampled from a standard normal distribution to obtain the image feature matrix.
4. The text-to-image generation method based on a generative adversarial network according to claim 1, characterized in that in step 3), the word context matrix of the image features is calculated from the image feature matrix obtained in step 2) and the word feature matrix obtained in step 1), and each column of the word context matrix represents a word context vector associated with one sub-region of the image.
5. The text-to-image generation method based on a generative adversarial network according to claim 4, characterized in that the word context matrix of the image features is calculated from the image feature matrix obtained in step 2) and the word feature matrix obtained in step 1), specifically:
first, the word features are mapped into the common semantic space of the image features by adding a new perceptron layer;
then the weight of the i-th word for the j-th sub-region of the image is calculated, obtained by normalizing the product of the j-th image feature vector and the i-th word feature vector;
finally, the weighted sum of the word features, using the weights of the corresponding image sub-region, is computed to obtain the word context vector of that sub-region; each sub-region of the image corresponds to one word context vector.
6. The text-to-image generation method based on a generative adversarial network according to claim 1, characterized in that step 4) is specifically:
4.1) inputting the image feature matrix into the first-stage generative adversarial network to obtain a once-optimized image feature matrix, and applying a 3×3 convolution to it to output an image at 64×64 resolution;
4.2) inputting the once-optimized image feature matrix and the word context matrix into the second-stage generative adversarial network to obtain a twice-optimized image feature matrix, and applying a 3×3 convolution to it to output an image at 128×128 resolution;
4.3) applying an attention mechanism to the image feature matrix, strengthening key sub-regions of the image and weakening unimportant regions, and updating the word context matrix using step 3);
4.4) inputting the twice-optimized image feature matrix and the updated word context matrix into the third-stage generative adversarial network to obtain the final image feature matrix, and applying a 3×3 convolution to it to output an image at 256×256 resolution.
7. The text-to-image generation method based on a generative adversarial network according to claim 1, characterized in that in step 5), the local image feature matrix is obtained from the generated image by an image encoder; the image encoder is the Inception-v3 model pre-trained on the ImageNet dataset, which is essentially a convolutional neural network.
8. The text-to-image generation method based on a generative adversarial network according to claim 1, characterized in that in step 6), the specific process of evaluating the similarity between the generated image and the text description is:
6.1) applying an attention mechanism to the local image feature matrix, strengthening key sub-regions of the image and weakening unimportant regions;
6.2) calculating the cosine similarity between the optimized local image feature matrix and the word feature matrix, and using it to evaluate the similarity between the text description and the generated image, which in turn helps optimize the generator in the generative adversarial network.
CN202010046540.9A 2020-01-16 2020-01-16 Text-to-image generation method based on generation countermeasure network Active CN111260740B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010046540.9A CN111260740B (en) 2020-01-16 2020-01-16 Text-to-image generation method based on generation countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010046540.9A CN111260740B (en) 2020-01-16 2020-01-16 Text-to-image generation method based on generation countermeasure network

Publications (2)

Publication Number Publication Date
CN111260740A true CN111260740A (en) 2020-06-09
CN111260740B CN111260740B (en) 2023-05-23

Family

ID=70950653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010046540.9A Active CN111260740B (en) 2020-01-16 2020-01-16 Text-to-image generation method based on generation countermeasure network

Country Status (1)

Country Link
CN (1) CN111260740B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111918071A (en) * 2020-06-29 2020-11-10 北京大学 Data compression method, device, equipment and storage medium
CN112348911A (en) * 2020-10-28 2021-02-09 山东师范大学 Semantic constraint-based method and system for generating fine-grained image by stacking texts
CN113191375A (en) * 2021-06-09 2021-07-30 北京理工大学 Text-to-multi-object image generation method based on joint embedding
CN113343705A (en) * 2021-04-26 2021-09-03 山东师范大学 Text semantic based detail preservation image generation method and system
CN113361250A (en) * 2021-05-12 2021-09-07 山东师范大学 Bidirectional text image generation method and system based on semantic consistency
CN113361251A (en) * 2021-05-13 2021-09-07 山东师范大学 Text image generation method and system based on multi-stage generation countermeasure network
CN113537416A (en) * 2021-09-17 2021-10-22 深圳市安软科技股份有限公司 Method and related equipment for converting text into image based on generative confrontation network
CN113674374A (en) * 2021-07-20 2021-11-19 广东技术师范大学 Chinese text image generation method and device based on generation type countermeasure network
CN113793404A (en) * 2021-08-19 2021-12-14 西南科技大学 Artificially controllable image synthesis method based on text and outline
CN113837229A (en) * 2021-08-30 2021-12-24 厦门大学 Knowledge-driven text-to-image generation method
WO2022007685A1 (en) * 2020-07-06 2022-01-13 Ping An Technology (Shenzhen) Co., Ltd. Method and device for text-based image generation
CN114078172A (en) * 2020-08-19 2022-02-22 四川大学 Text image generation method for progressively generating confrontation network based on resolution
CN114332288A (en) * 2022-03-15 2022-04-12 武汉大学 Method for generating text generation image of confrontation network based on phrase driving and network
CN115797495A (en) * 2023-02-07 2023-03-14 武汉理工大学 Method for generating image by text sensed by sentence-character semantic space fusion
CN116710910A (en) * 2020-12-29 2023-09-05 迪真诺有限公司 Design generating method based on condition generated by learning and device thereof
CN117095083A (en) * 2023-10-17 2023-11-21 华南理工大学 Text-image generation method, system, device and storage medium
CN117152370A (en) * 2023-10-30 2023-12-01 碳丝路文化传播(成都)有限公司 AIGC-based 3D terrain model generation method, system, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3040165A1 (en) * 2016-11-18 2018-05-24 Salesforce.Com, Inc. Spatial attention model for image captioning
CN110135441A (en) * 2019-05-17 2019-08-16 北京邮电大学 A kind of text of image describes method and device
CN110609891A (en) * 2019-09-18 2019-12-24 合肥工业大学 Visual dialog generation method based on context awareness graph neural network



Also Published As

Publication number Publication date
CN111260740B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
CN111260740A (en) Text-to-image generation method based on generation countermeasure network
CN108875807B (en) Image description method based on multiple attention and multiple scales
CN109344288B (en) Video description combining method based on multi-modal feature combining multi-layer attention mechanism
CN110263912B (en) Image question-answering method based on multi-target association depth reasoning
US11966839B2 (en) Auto-regressive neural network systems with a soft attention mechanism using support data patches
CN110163299B (en) Visual question-answering method based on bottom-up attention mechanism and memory network
CN109635883A (en) The Chinese word library generation method of the structural information guidance of network is stacked based on depth
CN109712108B (en) Visual positioning method for generating network based on diversity discrimination candidate frame
CN114119975A (en) Language-guided cross-modal instance segmentation method
CN111598183A (en) Multi-feature fusion image description method
CN115797495B (en) Method for generating image by sentence-character semantic space fusion perceived text
CN113140023B (en) Text-to-image generation method and system based on spatial attention
CN115222998B (en) Image classification method
CN116363261A (en) Training method of image editing model, image editing method and device
Agrawal et al. Image Caption Generator Using Attention Mechanism
CN112989843B (en) Intention recognition method, device, computing equipment and storage medium
CN117033609A (en) Text visual question-answering method, device, computer equipment and storage medium
CN116704506A (en) Cross-environment-attention-based image segmentation method
CN115171052B (en) Crowded crowd attitude estimation method based on high-resolution context network
Luhman et al. High fidelity image synthesis with deep vaes in latent space
WO2023154192A1 (en) Video synthesis via multimodal conditioning
CN116434058A (en) Image description generation method and system based on visual text alignment
Zhang et al. CT-GAN: A conditional Generative Adversarial Network of transformer architecture for text-to-image
CN115512191A (en) Question and answer combined image natural language description method
Kasi et al. A Deep Learning Based Cross Model Text to Image Generation using DC-GAN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant