CN111260740A - Text-to-image generation method based on generation countermeasure network - Google Patents
- Publication number
- CN111260740A (application number CN202010046540.9A)
- Authority
- CN
- China
- Prior art keywords
- image
- matrix
- word
- generation
- feature matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a text-to-image generation method based on a generative adversarial network, which comprises the following steps: 1) inputting a text description into the network, and generating a word feature matrix and a sentence feature vector from the text description; 2) applying conditioning augmentation and a noise vector to the sentence feature vector to obtain an image feature matrix; 3) calculating a word context matrix for the image features; 4) propagating the image feature matrix and the word context matrix through the generative adversarial network, progressively generating images of increasing resolution in three stages; 5) extracting a local image feature matrix from the generated image; 6) evaluating the similarity between the generated image and the text description to guide the next round of image generation. The image generation method of the invention not only ensures that the content of the generated image is semantically consistent with the text description, but also yields better-optimized image details, effectively improves the resolution of the generated images, and increases their diversity.
Description
Technical Field
The invention relates to the field of image generation, and in particular to a text-to-image generation method based on a generative adversarial network.
Background
Generating high-resolution, realistic images from textual descriptions is a meaningful line of research. In industry, it not only supports deeper visual understanding for related research in computer vision but also has wide practical application. In academia, it has become one of the most popular research directions in computer vision in recent years, with significant results. Recurrent neural networks (RNNs) and generative adversarial networks (GANs) are often combined to generate realistic images from natural-language descriptions. These methods already produce satisfactory results in certain domains, such as generating detailed images of flowers or birds.
The original GAN model contains a generator and a discriminator. Through optimization, the generator learns to produce samples that match the real data distribution, thereby deceiving the discriminator. The trained discriminator, in turn, learns to separate samples of the real data distribution from the fake samples produced by the generator. The generator and the discriminator approach their optima through this mutual game, so the generated results become better and better.
While impressive results have been achieved, many challenges remain in training conditional generative adversarial networks. Most models tend to learn only one mode of the data distribution, a failure known as mode collapse, in which the generator produces the same image every time: the image may be sharp, but it never varies. Another major challenge is that the training process is unstable and the losses obtained during training do not converge. In addition, most existing image generation methods focus on the global sentence vector, ignoring useful fine-grained image features and word-level text information. Furthermore, when evaluating the generated image, they ignore the fact that each sub-region of the image contributes differently to the whole. Such methods on the one hand hinder the generation of high-quality images and on the other hand reduce the diversity of the generated images. The problem becomes more severe as the scenes and objects to be generated grow more complex.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a text-to-image generation method based on a generative adversarial network that keeps the content of the generated image semantically consistent with the text description, yields better-optimized image details, effectively improves the resolution of the generated image, and increases image diversity.
The purpose of the invention is realized by the following technical scheme:
A text-to-image generation method based on a generative adversarial network, comprising the following steps:
1) inputting a text description into the network, and generating a word feature matrix and a sentence feature vector from the text description;
2) applying conditioning augmentation and a noise vector to the sentence feature vector to obtain an image feature matrix;
3) calculating a word context matrix for the image features;
4) propagating the image feature matrix and the word context matrix through the generative adversarial network, progressively generating images of increasing resolution in three stages;
5) extracting a local image feature matrix from the generated image;
6) evaluating the similarity between the generated image and the text description to guide the next round of image generation.
In step 1), the text description describes the attributes of one or more objects; the two hidden states corresponding to each word in the text description are concatenated through a bidirectional long short-term memory (LSTM) network to represent the word's semantics. The attributes comprise type, size, number, shape and position. The last hidden states of the two directions are concatenated to obtain the global sentence vector, and the remaining hidden states are concatenated to obtain the word feature matrix.
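The bookkeeping of step 1) can be illustrated with a minimal NumPy sketch. The per-word hidden states below are random stand-ins for what a trained bidirectional LSTM would actually produce, and all dimensions are illustrative assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

T, H = 8, 128          # T words, hidden size H per direction (illustrative)

# Hypothetical per-word hidden states of a bidirectional LSTM
# (random stand-ins; a real encoder would compute these from the text).
h_fwd = rng.standard_normal((T, H))   # forward pass, one state per word
h_bwd = rng.standard_normal((T, H))   # backward pass, one state per word

# Word feature matrix: concatenate the two directions for every word,
# giving one column per word of dimension 2H.
e = np.concatenate([h_fwd, h_bwd], axis=1).T      # shape (2H, T)

# Global sentence vector: concatenate the final hidden state of each
# direction (the backward pass ends at the first word).
s = np.concatenate([h_fwd[-1], h_bwd[0]])          # shape (2H,)

print(e.shape, s.shape)
```

Each column of `e` is the semantic representation of one word, which is the form consumed by the attention computation of step 3).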
Step 2) is specifically as follows:
2.1) applying conditioning augmentation to the sentence feature vector to augment the training data and avoid overfitting;
2.2) concatenating a noise vector sampled from the standard normal distribution to obtain the image feature matrix.
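As a rough illustration of step 2), the sketch below applies a reparameterized conditioning step to a sentence vector and concatenates standard-normal noise. The linear maps `W_mu`/`W_lv` and all dimensions are hypothetical stand-ins for learned layers, not the patent's actual parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

D_s, D_c, D_z = 256, 128, 100   # sentence, condition, noise dims (illustrative)

s = rng.standard_normal(D_s)    # sentence feature vector (random stand-in)

# Conditioning augmentation: hypothetical linear maps predict a mean and
# log-variance, and the condition is sampled via the reparameterization
# trick. Sampling (rather than using s directly) smooths the conditioning
# manifold, which augments the training data and discourages overfitting.
W_mu = rng.standard_normal((D_c, D_s)) * 0.01
W_lv = rng.standard_normal((D_c, D_s)) * 0.01
mu, logvar = W_mu @ s, W_lv @ s
c = mu + np.exp(0.5 * logvar) * rng.standard_normal(D_c)

# Concatenate noise z ~ N(0, I); the result seeds the image feature matrix.
z = rng.standard_normal(D_z)
cz = np.concatenate([c, z])
print(cz.shape)
```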
In step 3), the word context matrix of the image features is calculated from the image feature matrix obtained in step 2) and the word feature matrix obtained in step 1); each column of the word context matrix represents a word context vector associated with one sub-region of the image.
The word context matrix of the image features is calculated from the image feature matrix obtained in step 2) and the word feature matrix obtained in step 1), specifically:
first, the word features are converted into the common semantic space of the image features by adding a new perceptron layer;
then the weight of the jth sub-region of the image with respect to the ith word is calculated by normalizing the product of the jth image feature vector (i.e., the jth column of the image feature matrix) and the ith word feature vector (i.e., the ith column of the word feature matrix);
finally, the weighted sum of the word features over all words yields the word context vector of the image sub-region; each column of the word context matrix corresponds to the word context vector of one sub-region of the image.
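The weight-and-sum computation above can be sketched as follows. The perceptron layer `U`, the feature matrices, and all dimensions are random illustrative stand-ins, and the normalization is implemented here as a numerically stable softmax over words:

```python
import numpy as np

rng = np.random.default_rng(0)

D_w, D, T, N = 256, 128, 8, 64   # word dim, common-space dim, words, sub-regions

e = rng.standard_normal((D_w, T))    # word feature matrix (one column per word)
h = rng.standard_normal((D, N))      # image feature matrix (one column per sub-region)

# Map word features into the common semantic space of the image features
# with a hypothetical perceptron layer U.
U = rng.standard_normal((D, D_w)) * 0.05
e_p = U @ e                          # (D, T)

# Raw scores: dot product of the j-th sub-region with the i-th word.
scores = h.T @ e_p                   # (N, T)

# Normalize over words (stable softmax) to get per-sub-region weights.
scores -= scores.max(axis=1, keepdims=True)
w = np.exp(scores)
w /= w.sum(axis=1, keepdims=True)    # rows sum to 1

# Word context vector of each sub-region: weighted sum of word features.
ctx = e_p @ w.T                      # (D, N), one context vector per sub-region

print(ctx.shape)
```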
Step 4) is specifically as follows:
4.1) inputting the image feature matrix into the first-stage generative adversarial network to obtain a once-optimized image feature matrix, which is passed through a 3×3 convolution to output an image of 64×64 resolution;
4.2) inputting the once-optimized image feature matrix and the word context matrix into the second-stage generative adversarial network to obtain a twice-optimized image feature matrix, which is passed through a 3×3 convolution to output an image of 128×128 resolution;
4.3) applying an attention mechanism to the image feature matrix to strengthen key sub-regions of the image and weaken unimportant regions, and updating the word context matrix using step 3);
4.4) inputting the twice-optimized image feature matrix and the updated word context matrix into the third-stage generative adversarial network to obtain the final image feature matrix, which is passed through a 3×3 convolution to output an image of 256×256 resolution.
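The three-stage resolution schedule can be mimicked shape-wise with a toy sketch. The real stages refine the features with the word context matrix and use learned 3×3 convolutions; here the upsampling and feature-to-RGB steps are placeholder operations that only demonstrate the 64→128→256 progression:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour upsampling, standing in for a generator stage."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def to_rgb(x):
    """Stand-in for the 3x3 convolution that renders features to an image."""
    return x[:3]                    # keep 3 channels; illustrative only

# Stand-in for the optimized image feature map at the first stage.
feat = np.zeros((32, 64, 64))       # (channels, H, W), illustrative sizes

# Each subsequent stage doubles the spatial resolution.
stages = [feat]
for _ in range(2):
    stages.append(upsample2x(stages[-1]))

images = [to_rgb(s) for s in stages]
print([im.shape for im in images])
```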
In step 5), the local image feature matrix is extracted from the generated image by an image encoder; the image encoder uses the Inception-v3 model pre-trained on the ImageNet dataset, which is essentially a convolutional neural network.
In step 6), the specific process of evaluating the similarity between the generated image and the text description is as follows:
6.1) applying an attention mechanism to the local image feature matrix to strengthen key sub-regions of the image and weaken unimportant regions;
6.2) calculating the cosine similarity between the optimized local image feature matrix and the word feature matrix, thereby evaluating the similarity between the text description and the generated image to help optimize the generator of the generative adversarial network.
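A simplified version of the cosine-similarity evaluation in step 6.2) might look like the following. The feature matrices are random stand-ins, and the max-then-mean aggregation is one plausible way to pool per-word scores into a single image-text score; it is not necessarily the patent's exact matching loss:

```python
import numpy as np

rng = np.random.default_rng(0)

D, T, N = 128, 8, 64    # common dim, words, image sub-regions (illustrative)

e = rng.standard_normal((D, T))    # word feature matrix (stand-in)
v = rng.standard_normal((D, N))    # local image feature matrix (stand-in)

# Normalize columns so dot products become cosine similarities.
e_n = e / np.linalg.norm(e, axis=0, keepdims=True)
v_n = v / np.linalg.norm(v, axis=0, keepdims=True)

# Pairwise cosine similarity between every word and every sub-region.
sim = e_n.T @ v_n                  # (T, N), entries in [-1, 1]

# Aggregate: match each word to its best sub-region, then average.
score = sim.max(axis=1).mean()
print(sim.shape, float(score))
```

A higher score indicates that the generated image's local features align better with the words of the description, which is the signal used to guide the generator.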
Compared with the prior art, the invention has the following advantages and beneficial effects:
the invention adopts an attention mechanism, and the central idea is to distinguish information of a plurality of parts and add attention of different degrees to different parts so as to attach importance to the information which needs to be focused. Based on the method, the text-to-image generation method based on the generation countermeasure network is provided to pay more attention to the key area of the generated image, so that the image with richer and richer details is generated through multiple stages.
In conventional text-to-image generation, when training a conditional generative adversarial network, most existing methods focus on the global sentence vector and ignore useful fine-grained image features and word-level text information. Likewise, when evaluating the quality of the generated image, they neglect that each sub-region of the image contributes differently to the whole. Such methods may cause less important sub-regions (e.g., the background of the image) to receive excessive attention, while fine-grained details that need continual optimization are ignored. In contrast, the present invention provides a generative adversarial network with an added image attention mechanism, which generates higher-resolution and more detailed images by focusing on optimizing the important sub-regions, i.e., paying more attention to the important and content-rich sub-regions of the image during generation.
Drawings
Fig. 1 is an architecture diagram of the text-to-image generation method based on a generative adversarial network according to the present invention.
Fig. 2 is a flow chart of the text-to-image generation method based on a generative adversarial network according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
As shown in figs. 1 and 2, a text-to-image generation method based on a generative adversarial network includes the following steps:
1) A meaningful text description is input into the network; the text description may describe attributes of one or more entity objects, such as type, size, number, color, shape and position. The two hidden states corresponding to each word in the text description are concatenated to represent the word's semantics using a bidirectional long short-term memory network (bidirectional LSTM). The global sentence vector is obtained by concatenating the last hidden states of the two directions, and the word feature matrix is obtained by concatenating the remaining hidden states.
2) The image feature matrix is obtained as follows:
2.1) conditioning augmentation is applied to the obtained sentence feature vector to augment the training data and avoid overfitting;
2.2) the augmented condition is concatenated with a noise vector sampled from the standard normal distribution to obtain the image feature matrix.
3) The word context matrix of the image features is calculated from the image feature matrix obtained in step 2) and the word feature matrix obtained in step 1); each column of the matrix represents a word context vector associated with one sub-region of the image.
4) The image feature matrix is computed and optimized by the three-stage generative adversarial network to generate the image. The specific operation of each stage is as follows:
4.1) inputting the image feature matrix into the first-stage generative adversarial network to obtain a once-optimized image feature matrix, which is passed through a 3×3 convolution to output an image of 64×64 resolution;
4.2) inputting the once-optimized image feature matrix and the word context matrix into the second-stage generative adversarial network to obtain a twice-optimized image feature matrix, which is passed through a 3×3 convolution to output an image of 128×128 resolution;
4.3) applying an attention mechanism to the image feature matrix to strengthen key sub-regions of the image and weaken unimportant regions, and updating the word context matrix using step 3);
4.4) inputting the twice-optimized image feature matrix and the updated word context matrix into the third-stage generative adversarial network to obtain the final image feature matrix, which is passed through a 3×3 convolution to output an image of 256×256 resolution.
5) The generated high-resolution image is mapped to a local image feature matrix using the Inception-v3 model pre-trained on the ImageNet dataset as the image encoder. The image encoder is essentially a convolutional neural network.
6) The similarity between the generated image and the text description is evaluated as follows:
6.1) an attention mechanism is applied to the local image feature matrix to strengthen key sub-regions of the image and weaken unimportant regions;
6.2) the cosine similarity between the optimized local image feature matrix and the word feature matrix is calculated, evaluating the similarity between the text description and the generated image to help optimize the generator of the generative adversarial network.
In summary, the invention provides a new method for text-to-image generation: images are generated by a generative adversarial network with an added attention mechanism, which ensures that the content of the generated image is semantically consistent with the text description, yields better-optimized image details, effectively improves the resolution of the generated images, and increases their diversity.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited thereto; any changes, modifications, substitutions, combinations, and simplifications that do not depart from the spirit and principle of the present invention shall be regarded as equivalents and are included in the scope of the present invention.
Claims (8)
1. A text-to-image generation method based on a generative adversarial network, characterized by comprising the following steps:
1) inputting a text description into the network, and generating a word feature matrix and a sentence feature vector from the text description;
2) applying conditioning augmentation and a noise vector to the sentence feature vector to obtain an image feature matrix;
3) calculating a word context matrix for the image features;
4) propagating the image feature matrix and the word context matrix through the generative adversarial network, progressively generating images of increasing resolution in three stages;
5) extracting a local image feature matrix from the generated image;
6) evaluating the similarity between the generated image and the text description to guide the next round of image generation.
2. The text-to-image generation method based on a generative adversarial network according to claim 1, characterized in that in step 1), the text description describes attributes of one or more objects; the two hidden states corresponding to each word in the text description are concatenated through a bidirectional long short-term memory network to represent the word's semantics; the attributes comprise type, size, number, shape and position; the last hidden states of the two directions are concatenated to obtain the global sentence vector, and the remaining hidden states are concatenated to obtain the word feature matrix.
3. The text-to-image generation method based on a generative adversarial network according to claim 1, characterized in that step 2) is specifically:
2.1) applying conditioning augmentation to the sentence feature vector to augment the training data and avoid overfitting;
2.2) concatenating a noise vector sampled from the standard normal distribution to obtain the image feature matrix.
4. The text-to-image generation method based on a generative adversarial network according to claim 1, characterized in that in step 3), the word context matrix of the image features is calculated from the image feature matrix obtained in step 2) and the word feature matrix obtained in step 1), and each column of the word context matrix represents a word context vector associated with one sub-region of the image.
5. The text-to-image generation method based on a generative adversarial network according to claim 4, characterized in that the word context matrix of the image features is calculated from the image feature matrix obtained in step 2) and the word feature matrix obtained in step 1), specifically:
first, the word features are converted into the common semantic space of the image features by adding a new perceptron layer;
then the weight of the jth sub-region of the image with respect to the ith word is calculated by normalizing the product of the jth image feature vector and the ith word feature vector;
finally, the weighted sum of the word features over all words yields the word context vector of the image sub-region; each column of the word context matrix corresponds to the word context vector of one sub-region of the image.
6. The text-to-image generation method based on a generative adversarial network according to claim 1, characterized in that step 4) is specifically:
4.1) inputting the image feature matrix into the first-stage generative adversarial network to obtain a once-optimized image feature matrix, which is passed through a 3×3 convolution to output an image of 64×64 resolution;
4.2) inputting the once-optimized image feature matrix and the word context matrix into the second-stage generative adversarial network to obtain a twice-optimized image feature matrix, which is passed through a 3×3 convolution to output an image of 128×128 resolution;
4.3) applying an attention mechanism to the image feature matrix to strengthen key sub-regions of the image and weaken unimportant regions, and updating the word context matrix using step 3);
4.4) inputting the twice-optimized image feature matrix and the updated word context matrix into the third-stage generative adversarial network to obtain the final image feature matrix, which is passed through a 3×3 convolution to output an image of 256×256 resolution.
7. The text-to-image generation method based on a generative adversarial network according to claim 1, characterized in that in step 5), the local image feature matrix is extracted from the generated image by an image encoder; the image encoder uses the Inception-v3 model pre-trained on the ImageNet dataset, which is essentially a convolutional neural network.
8. The text-to-image generation method based on a generative adversarial network according to claim 1, characterized in that in step 6), the similarity between the generated image and the text description is evaluated as follows:
6.1) applying an attention mechanism to the local image feature matrix to strengthen key sub-regions of the image and weaken unimportant regions;
6.2) calculating the cosine similarity between the optimized local image feature matrix and the word feature matrix, thereby evaluating the similarity between the text description and the generated image to help optimize the generator of the generative adversarial network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010046540.9A CN111260740B (en) | 2020-01-16 | 2020-01-16 | Text-to-image generation method based on generation countermeasure network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111260740A true CN111260740A (en) | 2020-06-09 |
CN111260740B CN111260740B (en) | 2023-05-23 |
Family
ID=70950653
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010046540.9A Active CN111260740B (en) | 2020-01-16 | 2020-01-16 | Text-to-image generation method based on generation countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111260740B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA3040165A1 (en) * | 2016-11-18 | 2018-05-24 | Salesforce.Com, Inc. | Spatial attention model for image captioning |
CN110135441A (en) * | 2019-05-17 | 2019-08-16 | 北京邮电大学 | A kind of text of image describes method and device |
CN110609891A (en) * | 2019-09-18 | 2019-12-24 | 合肥工业大学 | Visual dialog generation method based on context awareness graph neural network |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111918071A (en) * | 2020-06-29 | 2020-11-10 | 北京大学 | Data compression method, device, equipment and storage medium |
WO2022007685A1 (en) * | 2020-07-06 | 2022-01-13 | Ping An Technology (Shenzhen) Co., Ltd. | Method and device for text-based image generation |
CN114078172B (en) * | 2020-08-19 | 2023-04-07 | 四川大学 | Text image generation method for progressively generating confrontation network based on resolution |
CN114078172A (en) * | 2020-08-19 | 2022-02-22 | 四川大学 | Text image generation method for progressively generating confrontation network based on resolution |
CN112348911B (en) * | 2020-10-28 | 2023-04-18 | 山东师范大学 | Semantic constraint-based method and system for generating fine-grained image by stacking texts |
CN112348911A (en) * | 2020-10-28 | 2021-02-09 | 山东师范大学 | Semantic constraint-based method and system for generating fine-grained image by stacking texts |
CN116710910A (en) * | 2020-12-29 | 2023-09-05 | 迪真诺有限公司 | Design generating method based on condition generated by learning and device thereof |
CN113343705A (en) * | 2021-04-26 | 2021-09-03 | 山东师范大学 | Text semantic based detail preservation image generation method and system |
CN113343705B (en) * | 2021-04-26 | 2022-07-05 | 山东师范大学 | Text semantic based detail preservation image generation method and system |
CN113361250A (en) * | 2021-05-12 | 2021-09-07 | 山东师范大学 | Bidirectional text image generation method and system based on semantic consistency |
CN113361251A (en) * | 2021-05-13 | 2021-09-07 | 山东师范大学 | Text image generation method and system based on multi-stage generation countermeasure network |
CN113191375A (en) * | 2021-06-09 | 2021-07-30 | 北京理工大学 | Text-to-multi-object image generation method based on joint embedding |
CN113191375B (en) * | 2021-06-09 | 2023-05-09 | 北京理工大学 | Text-to-multi-object image generation method based on joint embedding |
CN113674374A (en) * | 2021-07-20 | 2021-11-19 | 广东技术师范大学 | Chinese text image generation method and device based on generation type countermeasure network |
CN113674374B (en) * | 2021-07-20 | 2022-07-01 | 广东技术师范大学 | Chinese text image generation method and device based on generation type countermeasure network |
CN113793404A (en) * | 2021-08-19 | 2021-12-14 | 西南科技大学 | Artificially controllable image synthesis method based on text and outline |
CN113837229B (en) * | 2021-08-30 | 2024-03-15 | 厦门大学 | Knowledge-driven text-to-image generation method |
CN113837229A (en) * | 2021-08-30 | 2021-12-24 | 厦门大学 | Knowledge-driven text-to-image generation method |
CN113537416A (en) * | 2021-09-17 | 2021-10-22 | 深圳市安软科技股份有限公司 | Method and related equipment for converting text into image based on generative confrontation network |
CN114332288B (en) * | 2022-03-15 | 2022-06-14 | 武汉大学 | Method for generating text generation image of confrontation network based on phrase drive and network |
CN114332288A (en) * | 2022-03-15 | 2022-04-12 | 武汉大学 | Method for generating text generation image of confrontation network based on phrase driving and network |
CN115797495A (en) * | 2023-02-07 | 2023-03-14 | 武汉理工大学 | Method for generating image by text sensed by sentence-character semantic space fusion |
CN117095083A (en) * | 2023-10-17 | 2023-11-21 | 华南理工大学 | Text-image generation method, system, device and storage medium |
CN117095083B (en) * | 2023-10-17 | 2024-03-15 | 华南理工大学 | Text-image generation method, system, device and storage medium |
CN117152370A (en) * | 2023-10-30 | 2023-12-01 | 碳丝路文化传播(成都)有限公司 | AIGC-based 3D terrain model generation method, system, equipment and storage medium |
CN117152370B (en) * | 2023-10-30 | 2024-02-02 | 碳丝路文化传播(成都)有限公司 | AIGC-based 3D terrain model generation method, system, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111260740B (en) | 2023-05-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111260740A (en) | Text-to-image generation method based on generation countermeasure network | |
CN108875807B (en) | Image description method based on multiple attention and multiple scales | |
CN109344288B (en) | Video description combining method based on multi-modal feature combining multi-layer attention mechanism | |
CN110263912B (en) | Image question-answering method based on multi-target association depth reasoning | |
US11966839B2 (en) | Auto-regressive neural network systems with a soft attention mechanism using support data patches | |
CN110163299B (en) | Visual question-answering method based on bottom-up attention mechanism and memory network | |
CN109635883A (en) | The Chinese word library generation method of the structural information guidance of network is stacked based on depth | |
CN109712108B (en) | Visual positioning method for generating network based on diversity discrimination candidate frame | |
CN114119975A (en) | Language-guided cross-modal instance segmentation method | |
CN111598183A (en) | Multi-feature fusion image description method | |
CN115797495B (en) | Method for generating image by sentence-character semantic space fusion perceived text | |
CN113140023B (en) | Text-to-image generation method and system based on spatial attention | |
CN115222998B (en) | Image classification method | |
CN116363261A (en) | Training method of image editing model, image editing method and device | |
Agrawal et al. | Image Caption Generator Using Attention Mechanism | |
CN112989843B (en) | Intention recognition method, device, computing equipment and storage medium | |
CN117033609A (en) | Text visual question-answering method, device, computer equipment and storage medium | |
CN116704506A (en) | Cross-environment-attention-based image segmentation method | |
CN115171052B (en) | Crowded crowd attitude estimation method based on high-resolution context network | |
Luhman et al. | High fidelity image synthesis with deep vaes in latent space | |
WO2023154192A1 (en) | Video synthesis via multimodal conditioning | |
CN116434058A (en) | Image description generation method and system based on visual text alignment | |
Zhang et al. | CT-GAN: A conditional Generative Adversarial Network of transformer architecture for text-to-image | |
CN115512191A (en) | Question and answer combined image natural language description method | |
Kasi et al. | A Deep Learning Based Cross Model Text to Image Generation using DC-GAN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |