CN113052784A - Image generation method based on multiple auxiliary information - Google Patents
Image generation method based on multiple auxiliary information
- Publication number
- CN113052784A CN113052784A CN202110301738.1A CN202110301738A CN113052784A CN 113052784 A CN113052784 A CN 113052784A CN 202110301738 A CN202110301738 A CN 202110301738A CN 113052784 A CN113052784 A CN 113052784A
- Authority
- CN
- China
- Prior art keywords
- image
- information
- text
- stage
- scene graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 18
- 230000004927 fusion Effects 0.000 claims abstract description 18
- 239000013598 vector Substances 0.000 claims description 22
- 239000011159 matrix material Substances 0.000 claims description 12
- 238000005070 sampling Methods 0.000 claims description 11
- 238000012549 training Methods 0.000 claims description 11
- 230000006870 function Effects 0.000 claims description 4
- 238000002372 labelling Methods 0.000 claims description 4
- 238000000605 extraction Methods 0.000 claims description 3
- 238000011160 research Methods 0.000 description 5
- 230000008569 process Effects 0.000 description 4
- 238000013461 design Methods 0.000 description 3
- 238000004519 manufacturing process Methods 0.000 description 3
- 238000013473 artificial intelligence Methods 0.000 description 2
- 238000005034 decoration Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 239000003086 colorant Substances 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000003750 conditioning effect Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000008520 organization Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000001737 promoting effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Abstract
The invention belongs to the field of image generation within computer vision tasks and provides an image generation method based on multiple kinds of auxiliary information. The invention is the first to use multiple kinds of auxiliary information to guide a model through the image generation task. The task is completed in two stages. In the first stage, the model input is the fused feature of scene graph information and text information, with the scene graph information as the primary signal and the text information as the auxiliary signal, and a rough image is generated using a GAN network model as the prototype. In the second stage, the model input is the text information together with the output of the first stage; the aim is to enrich the image details with the text information and generate a high-quality image. The invention trains and evaluates on a real data set and compares against current mainstream image generation models to measure the performance improvement.
Description
Technical Field
The invention belongs to the field of image generation within computer vision tasks and relates to a method for guiding image generation with the participation of multiple kinds of auxiliary information.
Background
Scenes such as the following are ubiquitous in daily production and life: a poster designer cannot fully understand a customer's description, so the two communicate ineffectively for a long time and efficiency is low; witnesses at a crime scene can describe the appearance of a suspect, and the public security organ needs to produce a likeness of the suspect from that description in order to solve the case; when a house is being renovated, if a rendering of the result could be produced quickly from the owner's description, the owner's satisfaction with the renovation plan would be greatly improved. People have long pursued the richness of combining pictures and text when aesthetics are needed: an image delivers direct visual impact and can convey meanings that words cannot describe, while text, through ornate diction, can express at the semantic level a beauty that the senses alone cannot capture. Only when pictures and text appear together can a scene be presented comprehensively from different angles. In real-life settings, however, text and voice data are easy to obtain, while image data is to a certain extent difficult to obtain. Against the background of artificial intelligence continuously producing new results, how to use emerging technology to render the picture that a text describes is therefore an important research direction for promoting production and improving quality of life. In recent years, machine learning and deep learning have developed continuously and achieved many results in practical applications, and with progress across many fields, the exploration and application of multi-modal learning has gradually become a hot topic in artificial intelligence.
In current academic research, the most widely studied problem is the interaction between images and text, for example taking a passage of text as input and outputting the image that corresponds to it. Generating images from text is a common application in multi-modal learning tasks; this research can bring a great driving force to the field of data intelligence, and its deployment can bring great convenience to production and life.
At present, mainstream image generation methods use only a single kind of information in the training process of the model. For example, the sg2im model uses scene graph information as the model input to guide image generation; mainstream models such as StackGAN and AttnGAN use text descriptions to guide the model to generate images that meet the requirements. sg2im models each object in a text and the relations between them through a scene graph; on the basis of the scene graph it obtains a bounding box and a mask for each object in the semantics, yielding a scene layout related to the text semantics, and the scene layout is then fed into a subsequent GAN network to generate a picture. StackGAN generates the image step by step with two GANs. Because simply adding up-sampling to the network cannot improve the quality of the generated pictures, a two-stage GAN network was proposed: the first stage generates a low-resolution (64×64) image and mainly focuses on basic information such as the background, colors, and contours of the image; the second stage takes the output of the first stage as input and uses the text embedding again, recovering the detail information lost in the first stage and generating a finer 256×256 picture. A CA (Conditioning Augmentation) module is also added to inject useful random noise into the text features, so the generated images have more variability. AttnGAN adds an attention mechanism: it not only extracts a sentence-level feature of the text as a global constraint but also feeds word-level embeddings into the network as a local constraint, and the generator and discriminator are optimized precisely with respect to the word embeddings each time, so the generated image can highlight the details in the text.
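The CA (Conditioning Augmentation) module mentioned above amounts to a reparameterized Gaussian sampling step. The following is a minimal NumPy sketch of that idea, not the StackGAN implementation itself; the weight shapes, dimensions, and function names are illustrative assumptions:

```python
import numpy as np

def conditioning_augmentation(text_emb, W, b, rng):
    """Map a text embedding to (mu, log_sigma) with one fully connected
    layer, then sample c = mu + sigma * eps (reparameterization trick)."""
    h = W @ text_emb + b                 # 2*d outputs: first d = mu, last d = log sigma
    d = h.shape[0] // 2
    mu, log_sigma = h[:d], h[d:]
    eps = rng.standard_normal(d)         # fresh noise each call -> variability
    return mu + np.exp(log_sigma) * eps

rng = np.random.default_rng(0)
emb_dim, cond_dim = 8, 4                 # illustrative sizes
W = rng.standard_normal((2 * cond_dim, emb_dim)) * 0.1
b = np.zeros(2 * cond_dim)
c = conditioning_augmentation(rng.standard_normal(emb_dim), W, b, rng)
print(c.shape)  # (4,)
```

Because the noise is resampled on every call, the same text embedding yields a different conditioning variable each time, which is the source of the extra variability described above.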
Disclosure of Invention
The method provided by the invention performs image generation based on multiple kinds of auxiliary information: by extracting and fusing the features of the various kinds of information and making full use of all the auxiliary information, the generated image restores the described scene as faithfully as possible. The method takes scene graph and text description information as examples to introduce the research content.
The research goal of the task has two important aspects:
(1) Feature extraction and fusion: the input data of the task are a scene graph and a text description; the scene graph provides the positional relation of each object in the image, and the text description provides the implementation details of each object. Efficient feature extraction and fusion of the input data are required to generate a high-quality image. The aim is to realize a high-quality feature fusion algorithm that retains as much of the original information of the two kinds of data as possible.
(2) Use of the fused features: the obtained fused features largely retain the original information of the data; the features are applied to layout generation, then mask generation, and finally image generation. What is studied here is where and how the fused features should be applied, i.e., in which stage and in what way adding the feature makes it most useful, so that a satisfactory image is finally generated.
The technical scheme of the invention is as follows:
An image generation method based on multiple auxiliary information comprises the following steps:
step S1: performing representation learning on the scene graph and text information using currently mainstream methods;
step S2: the first stage of image generation: a GAN network model is established and trained with the obtained scene graph and text information as model input. The first stage focuses on the scene graph information, and a feature fusion algorithm module is designed so that the scene graph information and the text information can be fully utilized to assist the training process of the model. The first stage generates a rough image that meets the requirements;
step S3: the second stage of image generation: the features are processed before being input to the second-stage generation model. The input of the second stage is the output image of the first stage together with the text information, and this stage focuses on making full use of the text information;
The process of performing representation learning on the scene graph and text information using currently mainstream methods in step S1 is as follows:
step S11: the scene graph information is embedded with a GCN network; each scene graph is trained on, and finally a vector representation of each object is obtained;
step S12: the text information is encoded with a CNN-RNN text encoder; the text descriptions of each image are input to the model to obtain an embedding vector for each text description;
the specific steps of establishing the first-stage generation model in the step S2 are as follows:
step S21: performing feature fusion on the obtained scene graph information and text information, and guiding the image generation in the first stage by taking the scene graph information as a main part and the text information as an auxiliary part;
step S22: establishing an image generation model taking a GAN network as a prototype, wherein the image generation model comprises a generator and a discriminator, fusion characteristics are taken as model input, and output is a rough image with lower quality;
the specific steps of establishing the second-stage generation model in the step S3 are as follows:
step S31: processing the text information to enable the second stage to fully use the text information and capture more information;
step S32: and constructing an image generation model comprising a generator and a discriminator, taking the processed text information and the generated image in the first stage as input, and outputting a high-quality image.
The invention has the following beneficial effects: (1) traditional image generation algorithms use only a single kind of information for model training, whereas the invention guides image generation with multiple kinds of information; (2) the first stage of the generation task mainly uses the scene graph information with the text information as assistance, so the positional relations of the objects in the image are captured, while the second stage uses the text information to further refine the details of the objects and improve the quality of the image.
Drawings
FIG. 1 is a block diagram of the overall module design of the present invention.
FIG. 2 is a design diagram of a multi-information fusion module according to the present invention.
FIG. 3 is a design diagram of the text-information-guided two-stage generation model of the present invention.
Detailed Description
The following further describes a specific embodiment of the present invention with reference to the drawings and technical solutions.
An image generation method based on multiple auxiliary information comprises the following steps:
step S1: taking the COCO data set, a scene graph is extracted for each image from the image's annotation information to obtain a training set of scene graph data; the text information corresponding to each image is likewise extracted from the annotation information to obtain a corresponding training set of text information;
step S11: the objects and relations in the scene graph are first initialized and embedded to obtain an initial object matrix and an initial relation matrix; these are then input to a GCN network to obtain updated object and relation matrices, realizing the embedding of the scene graph information and yielding the scene graph vector matrix; the GCN network is formed by stacking five convolution blocks, each consisting of a fully connected layer, a ReLU layer, a fully connected layer, and a ReLU layer;
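The block structure described in step S11 (fully connected → ReLU → fully connected → ReLU, stacked five times) can be sketched as follows. The weights are random and the feature width is illustrative; this is only the per-node transform, not the full patented GCN, which also propagates relation features along graph edges:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv_block(x, W1, b1, W2, b2):
    # One block: fully connected -> ReLU -> fully connected -> ReLU
    return relu(relu(x @ W1 + b1) @ W2 + b2)

rng = np.random.default_rng(1)
num_objects, dim = 6, 16
h = rng.standard_normal((num_objects, dim))   # initial object matrix
for _ in range(5):                            # five stacked blocks
    W1, W2 = rng.standard_normal((2, dim, dim)) * 0.1
    b1, b2 = np.zeros(dim), np.zeros(dim)
    h = conv_block(h, W1, b1, W2, b2)
print(h.shape)  # (6, 16) -- updated object matrix
```

Each row of the final matrix is the vector representation of one object, which is what step S11 feeds into the later fusion step.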
step S12: for the obtained text information, character-level embedding is performed with a char-CNN-RNN text encoder model, which consists of two parts: a convolutional autoencoder (ConvAutoencoder) for image feature extraction and a character embedding module (CharEmbedding) for obtaining the text embedding; the final output is a text embedding vector containing image information;
step S2: the model of the first stage: the main structure is a generative adversarial network (GAN) comprising a generator and a discriminator; feature fusion is performed on the obtained scene graph vector matrix and text embedding vector to obtain the fused feature; in the generator, the fused feature is passed through a fully connected layer to produce a Gaussian distribution from which a condition variable is obtained; the condition variable is then concatenated with random noise as the generator input, and an image is finally generated through a group of up-sampling layers; in the discriminator, the text embedding vector is compressed and spatially replicated to obtain a feature tensor, the image generated by the generator is passed through down-sampling layers to obtain an image tensor, and the feature tensor and image tensor are then passed through a convolutional layer and a single-node fully connected layer to obtain a confidence score;
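The "spatial replication" used by the discriminator simply tiles the compressed text feature across the spatial grid so it can be concatenated channel-wise with the image tensor. A small NumPy sketch with illustrative dimensions (the 128/512-channel sizes are assumptions, not values from the invention):

```python
import numpy as np

def spatially_replicate(text_feat, h, w):
    """Tile a (d,) text feature into a (d, h, w) tensor."""
    return np.tile(text_feat[:, None, None], (1, h, w))

text_feat = np.arange(128, dtype=np.float64)   # compressed text embedding
image_feat = np.zeros((512, 4, 4))             # output of the down-sampling layers
fused = np.concatenate([image_feat, spatially_replicate(text_feat, 4, 4)], axis=0)
print(fused.shape)  # (640, 4, 4) -- fed to the convolutional layer
```

Every spatial position of the fused tensor now carries the same text feature, so the following convolution can compare image content against the text condition locally.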
step S21: the fusion of the scene graph information and text information is realized with the scene graph information as the primary signal and the text information as the auxiliary signal; the text information is passed through a fully connected layer with a reduced number of nodes so that partial text information is retained, and it is then concatenated with the scene graph information;
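The fusion in step S21 — a narrower fully connected layer over the text followed by concatenation with the scene graph feature — can be sketched as follows. The layer widths and random weights are illustrative assumptions, not the values used by the invention:

```python
import numpy as np

def fuse(scene_vec, text_vec, W_reduce, b_reduce):
    """Keep partial text information via a reduced fully connected layer,
    then concatenate with the scene-graph feature (scene graph primary,
    text auxiliary)."""
    reduced = np.maximum(W_reduce @ text_vec + b_reduce, 0.0)  # FC + ReLU
    return np.concatenate([scene_vec, reduced])

rng = np.random.default_rng(2)
scene_vec = rng.standard_normal(128)              # scene-graph embedding
text_vec = rng.standard_normal(256)               # text embedding
W_reduce = rng.standard_normal((32, 256)) * 0.1   # 256 -> 32: fewer nodes
fused = fuse(scene_vec, text_vec, W_reduce, np.zeros(32))
print(fused.shape)  # (160,)
```

Because the text passes through the bottleneck while the scene graph feature is carried through untouched, the scene graph dominates the fused representation, matching the primary/auxiliary roles described above.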
step S22: a condition variable $\hat{c}_0$ is sampled from the Gaussian distribution $\mathcal{N}(\mu_0(\varphi_t), \Sigma_0(\varphi_t))$ and concatenated with randomly sampled noise $z$ as input to train the generator $G_0$ and the discriminator $D_0$; the objective functions are as follows:

$$\mathcal{L}_{D_0} = \mathbb{E}_{(I_0, t) \sim p_{data}}\big[\log D_0(I_0, \varphi_t)\big] + \mathbb{E}_{z \sim p_z,\, t \sim p_{data}}\big[\log\big(1 - D_0(G_0(z, \hat{c}_0), \varphi_t)\big)\big]$$

$$\mathcal{L}_{G_0} = \mathbb{E}_{z \sim p_z,\, t \sim p_{data}}\big[\log\big(1 - D_0(G_0(z, \hat{c}_0), \varphi_t)\big)\big] + \lambda\, D_{KL}\big(\mathcal{N}(\mu_0(\varphi_t), \Sigma_0(\varphi_t))\,\big\|\,\mathcal{N}(0, I)\big)$$

where the real image $I_0$ and the text input $t$ are drawn from the true data distribution $p_{data}$; $p_z$ is the standard normal prior distribution; $\varphi_t$ is the text embedding vector obtained by the pre-trained encoder; $z$ denotes noise randomly sampled from $p_z$; $\mu_0(\varphi_t)$ and $\Sigma_0(\varphi_t)$ are the mean and covariance of the Gaussian distribution produced from $\varphi_t$ by the fully connected layer; and $\lambda$ is the regularization coefficient;
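The KL regularization term weighted by lambda has a closed form when the Gaussian is diagonal and compared against the standard normal. A small NumPy sketch, under the assumption that the covariance is parameterized through log sigma:

```python
import numpy as np

def kl_to_standard_normal(mu, log_sigma):
    """D_KL( N(mu, diag(sigma^2)) || N(0, I) )
       = 0.5 * sum( mu^2 + sigma^2 - 1 - log sigma^2 )."""
    return 0.5 * np.sum(mu**2 + np.exp(2.0 * log_sigma) - 1.0 - 2.0 * log_sigma)

# The term vanishes exactly when the conditional matches the prior:
print(kl_to_standard_normal(np.zeros(4), np.zeros(4)))  # 0.0
```

Adding this term to the generator loss keeps the conditioning distribution close to the prior, which smooths the conditioning manifold and prevents the text features from collapsing to point estimates.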
step S3: the network model of the second stage also takes a GAN as its main body and consists of a generator and a discriminator; the model input is a text embedding vector and the image generated in the first stage, and the second stage emphasizes the use of the text information to generate a high-resolution image; the structure of the discriminator is roughly identical to that of the first-stage discriminator, except that to handle the input size the stride of the convolutional layers is doubled, so the number of down-sampling layers changes from 3 to 4; in the generator, the text embedding vector is passed through a fully connected layer to produce a Gaussian distribution from which a condition variable is obtained and then spatially replicated into a feature tensor; at the same time, the output of the first stage is down-sampled to obtain a 1 × 1 feature tensor; the two feature tensors are concatenated, passed through a series of residual blocks, and up-sampled to obtain the image;
step S31: each image has multiple text descriptions, so multiple text embedding vectors are obtained; at each training step, one text embedding vector is selected together with the image generated in the first stage as the input of the second-stage generator; the discriminator retains the image with the highest confidence score as the final image;
step S32: with the second-stage Gaussian latent conditioning variable $\hat{c}$ and the first-stage generator output $s_0 = G_0(z, \hat{c}_0)$ as input, the generator $G_1$ and the discriminator $D_1$ are trained; the objective functions are respectively:

$$\mathcal{L}_{D_1} = \mathbb{E}_{(I, t) \sim p_{data}}\big[\log D_1(I, \varphi_t)\big] + \mathbb{E}_{s_0 \sim p_{G_0},\, t \sim p_{data}}\big[\log\big(1 - D_1(G_1(s_0, \hat{c}), \varphi_t)\big)\big]$$

$$\mathcal{L}_{G_1} = \mathbb{E}_{s_0 \sim p_{G_0},\, t \sim p_{data}}\big[\log\big(1 - D_1(G_1(s_0, \hat{c}), \varphi_t)\big)\big] + \lambda\, D_{KL}\big(\mathcal{N}(\mu(\varphi_t), \Sigma(\varphi_t))\,\big\|\,\mathcal{N}(0, I)\big)$$
the above-mentioned expanding model using stackGAN as the baseline in step S2 and step S3 is only a preferred embodiment of the present invention, and all equivalent changes and modifications made according to the claimed scope of the present invention should be covered by the present invention.
Claims (1)
1. An image generation method based on multiple auxiliary information, characterized by comprising the following steps:
step S1: taking the COCO data set, a scene graph is extracted for each image from the image's annotation information to obtain a training set of scene graph data; the text information corresponding to each image is likewise extracted from the annotation information to obtain a corresponding training set of text information;
step S11: the objects and relations in the scene graph are first initialized and embedded to obtain an initial object matrix and an initial relation matrix; these are then input to a GCN network to obtain updated object and relation matrices, realizing the embedding of the scene graph information and yielding the scene graph vector matrix; the GCN network is formed by stacking five convolution blocks, each consisting of a fully connected layer, a ReLU layer, a fully connected layer, and a ReLU layer;
step S12: for the obtained text information, character-level embedding is performed with a char-CNN-RNN text encoder model, which consists of two parts: a convolutional autoencoder (ConvAutoencoder) for image feature extraction and a character embedding module (CharEmbedding) for obtaining the text embedding; the final output is a text embedding vector containing image information;
step S2: the model of the first stage: the main structure is a generative adversarial network (GAN) comprising a generator and a discriminator; feature fusion is performed on the obtained scene graph vector matrix and text embedding vector to obtain the fused feature; in the generator, the fused feature is passed through a fully connected layer to produce a Gaussian distribution from which a condition variable is obtained; the condition variable is then concatenated with random noise as the generator input, and an image is finally generated through a group of up-sampling layers; in the discriminator, the text embedding vector is compressed and spatially replicated to obtain a feature tensor, the image generated by the generator is passed through down-sampling layers to obtain an image tensor, and the feature tensor and image tensor are then passed through a convolutional layer and a single-node fully connected layer to obtain a confidence score;
step S21: the fusion of the scene graph information and text information is realized with the scene graph information as the primary signal and the text information as the auxiliary signal; the text information is passed through a fully connected layer with a reduced number of nodes so that partial text information is retained, and it is then concatenated with the scene graph information;
step S22: a condition variable $\hat{c}_0$ is sampled from the Gaussian distribution $\mathcal{N}(\mu_0(\varphi_t), \Sigma_0(\varphi_t))$ and concatenated with randomly sampled noise $z$ as input to train the generator $G_0$ and the discriminator $D_0$; the objective functions are as follows:

$$\mathcal{L}_{D_0} = \mathbb{E}_{(I_0, t) \sim p_{data}}\big[\log D_0(I_0, \varphi_t)\big] + \mathbb{E}_{z \sim p_z,\, t \sim p_{data}}\big[\log\big(1 - D_0(G_0(z, \hat{c}_0), \varphi_t)\big)\big]$$

$$\mathcal{L}_{G_0} = \mathbb{E}_{z \sim p_z,\, t \sim p_{data}}\big[\log\big(1 - D_0(G_0(z, \hat{c}_0), \varphi_t)\big)\big] + \lambda\, D_{KL}\big(\mathcal{N}(\mu_0(\varphi_t), \Sigma_0(\varphi_t))\,\big\|\,\mathcal{N}(0, I)\big)$$

where the real image $I_0$ and the text input $t$ are drawn from the true data distribution $p_{data}$; $p_z$ is the standard normal prior distribution; $\varphi_t$ is the text embedding vector obtained by the pre-trained encoder; $z$ denotes noise randomly sampled from $p_z$; $\mu_0(\varphi_t)$ and $\Sigma_0(\varphi_t)$ are the mean and covariance of the Gaussian distribution produced from $\varphi_t$ by the fully connected layer; and $\lambda$ is the regularization coefficient;
step S3: the network model of the second stage also takes a GAN as its main body and consists of a generator and a discriminator; the model input is a text embedding vector and the image generated in the first stage, and the second stage emphasizes the use of the text information to generate a high-resolution image; the structure of the discriminator is roughly identical to that of the first-stage discriminator, except that to handle the input size the stride of the convolutional layers is doubled, so the number of down-sampling layers changes from 3 to 4; in the generator, the text embedding vector is passed through a fully connected layer to produce a Gaussian distribution from which a condition variable is obtained and then spatially replicated into a feature tensor; at the same time, the output of the first stage is down-sampled to obtain a 1 × 1 feature tensor; the two feature tensors are concatenated, passed through a series of residual blocks, and up-sampled to obtain the image;
step S31: each image has multiple text descriptions, so multiple text embedding vectors are obtained; at each training step, one text embedding vector is selected together with the image generated in the first stage as the input of the second-stage generator; the discriminator retains the image with the highest confidence score as the final image;
step S32: with the second-stage Gaussian latent conditioning variable $\hat{c}$ and the first-stage generator output $s_0 = G_0(z, \hat{c}_0)$ as input, the generator $G_1$ and the discriminator $D_1$ are trained; the objective functions are respectively:

$$\mathcal{L}_{D_1} = \mathbb{E}_{(I, t) \sim p_{data}}\big[\log D_1(I, \varphi_t)\big] + \mathbb{E}_{s_0 \sim p_{G_0},\, t \sim p_{data}}\big[\log\big(1 - D_1(G_1(s_0, \hat{c}), \varphi_t)\big)\big]$$

$$\mathcal{L}_{G_1} = \mathbb{E}_{s_0 \sim p_{G_0},\, t \sim p_{data}}\big[\log\big(1 - D_1(G_1(s_0, \hat{c}), \varphi_t)\big)\big] + \lambda\, D_{KL}\big(\mathcal{N}(\mu(\varphi_t), \Sigma(\varphi_t))\,\big\|\,\mathcal{N}(0, I)\big)$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110301738.1A CN113052784B (en) | 2021-03-22 | 2021-03-22 | Image generation method based on multiple auxiliary information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110301738.1A CN113052784B (en) | 2021-03-22 | 2021-03-22 | Image generation method based on multiple auxiliary information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113052784A true CN113052784A (en) | 2021-06-29 |
CN113052784B CN113052784B (en) | 2024-03-08 |
Family
ID=76514125
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110301738.1A Active CN113052784B (en) | 2021-03-22 | 2021-03-22 | Image generation method based on multiple auxiliary information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113052784B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113918754A (en) * | 2021-11-01 | 2022-01-11 | 中国石油大学(华东) | Image subtitle generating method based on scene graph updating and feature splicing |
CN116958766A (en) * | 2023-07-04 | 2023-10-27 | 阿里巴巴(中国)有限公司 | Image processing method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110751698A (en) * | 2019-09-27 | 2020-02-04 | 太原理工大学 | Text-to-image generation method based on hybrid network model |
CN111340122A (en) * | 2020-02-29 | 2020-06-26 | 复旦大学 | Multi-modal feature fusion text-guided image restoration method |
CN111968193A (en) * | 2020-07-28 | 2020-11-20 | 西安工程大学 | Text image generation method based on StackGAN network |
- 2021-03-22: CN application CN202110301738.1A granted as patent CN113052784B (status: active)
Also Published As
Publication number | Publication date |
---|---|
CN113052784B (en) | 2024-03-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||