CN115775284A - Network architecture method for staged multi-path text-to-image generation - Google Patents

Network architecture method for staged multi-path text-to-image generation

Info

Publication number
CN115775284A
CN115775284A (application CN202211505806.7A)
Authority
CN
China
Prior art keywords
image
stage
generation
feature
path
Prior art date
Legal status: Pending
Application number
CN202211505806.7A
Other languages
Chinese (zh)
Inventor
俞俊
沈铭
丁佳骏
刘贝利
范梦婷
杨苏杭
赵天宁
陈盛款
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202211505806.7A
Publication of CN115775284A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a novel staged multi-path network architecture for text-to-image generation based on residual learning and multi-scale learning, which improves the extraction of image features at different scales, generates images with fine-grained detail, and improves the quality of this cross-modal generation task. The invention provides a new and improved generative adversarial network architecture to increase the fidelity of generated images. A staged residual connection passes the feature map formed from the adjacent previous stage's information and the text information directly to the end of the current stage, where it participates in the current stage's image generation; this avoids the need for long-term storage and improves the generation quality of the current stage. Multi-scale learning extracts features from the input image along several parallel paths with different convolution-kernel sizes and appropriately integrates the feature maps from the different spaces to obtain higher-quality features and fine-grained textual details.

Description

Network architecture method for staged multi-path text-to-image generation
Technical Field
The invention relates to the fields of residual learning and multi-scale learning, and in particular to a staged multi-path network architecture method for text-to-image synthesis (T2I).
Background
Text-to-image synthesis (T2I) refers to generating an image that correctly reflects the semantics of a text description, a challenging task that links vision and language. Owing to its great potential in application fields such as interactive artistic creation and computer-aided painting, it has become one of the most popular topics in multi-modal learning, drawing considerable attention, and it has important applications in areas such as image editing, computer-aided design, and video games.
Most existing methods solve this problem with generative adversarial networks (GANs), which have strong image-generation capability. Because the images generated in the T2I task must also match the text description, these methods use a conditional generative adversarial network (cGAN), which is conditioned on both the natural-language description and the noise rather than starting from noise alone.
The original T2I model, GAN-INT-CLS, could only generate images at 64 × 64 resolution. To generate higher-resolution images, StackGAN was proposed and successfully produced 256 × 256 images; its successor StackGAN-v2, improved on the basis of StackGAN, showed more stable training behavior and further raised the quality of the generated images. Model architectures and performance have improved over the last few years, and the quality and resolution of generated images have risen markedly. For high-resolution generation, AttnGAN and DM-GAN build attention-based correlations between sub-regions of the generated low-resolution image and words in the text description. RiFe-GAN and RiFeGAN2 select and refine compatible candidate texts from prior knowledge using an attention-based text-matching model. The OP-GAN architecture focuses on individual objects while generating a background that fits the overall image description. Bridge-GAN creates a transitional space with interpretable representations as a bridge connecting text and images.
Since He et al. proposed ResNet, the residual connection has been a basic structure in deep networks. Residual learning converts the rain-streak-layer modeling task in image deraining into a residual-estimation task, easing optimization and reconstructing better background images. DnCNN successfully recovers the latent clean image implicitly in its hidden layers through residual learning. With a deep residual learning (DRL) network, the nonlinear mapping from an input blurred image to the output deblurred image can be estimated directly.
Multi-scale learning has shown its effectiveness in computer vision. The basic idea of the strategy is to fuse features at different resolutions and enhance the representation capability of the neural network. The multi-scale residual network (MSRN) builds multi-scale residual blocks from convolution kernels of different sizes, strengthening its reconstruction capability. The multi-scale dense cross-connected network (MDCN) further exploits the features of earlier layers. The multi-path residual network (AMPRN) aggregates features from different paths, enhancing information flow and gradients through a large number of skip connections.
In the T2I task of high-resolution image generation, multi-stage frameworks are widely adopted. The StackGAN model first defined a two-stage model of two cascaded GANs (as shown in fig. 1) and successfully generated realistic high-resolution (256 × 256) images. Its successor StackGAN-v2 further improved the architecture with a tree-like multi-stage structure. On this multi-stage basis, the symmetric distillation network (SDN) passes hierarchical knowledge unimpeded, and DA-GAN translates each word into a sub-region of the image. AttnGAN introduced a word-level visual attention mechanism that captures fine-grained image-text correlations in the later stages and refines the image to high resolution, so that each word in the input sentence contributes its own level of information about the image content. DM-GAN later introduced a dynamic memory component, so that a high-quality image can be generated even when the initial image is poor, successfully producing more vivid images.
However, existing models still have limitations and drawbacks. They typically first generate a low-resolution image with the rough shape and color, and then generate a realistic high-resolution image in a later stage. Because of similar loss constraints and the inheritance of feature information, the low-resolution image generated in the first stage and the high-resolution images generated in the subsequent stages always share homogeneous features in this coarse-to-fine scheme. The subsequent stages must re-construct the entire image, including the rough shape and color of the object already produced by the preceding adjacent stage. In this situation, the later stages of the model must retain most of the feature-information details of their input.
Disclosure of Invention
The invention provides a staged multi-path network architecture for text-to-image synthesis (T2I) based on residual learning and multi-scale learning, which improves the extraction of image features at different scales, generates images with finer details, and improves the quality of this cross-modal generation task. The invention provides a new and improved generative adversarial network architecture to increase the fidelity of generated images. A staged residual connection passes the feature map formed from the adjacent stage's information and the text information directly to the end of the current stage, so that the feature-information details of the image generated in the previous stage are preserved and participate in the current stage's image generation; this avoids the need for long-term storage, concentrates the current layers on modifying and supplementing the details of the generated image, and improves the generation efficiency of the current stage. Multi-scale learning extracts features from the input using several parallel paths with different convolution-kernel sizes and appropriately integrates the feature maps from different spaces to obtain higher-quality features with fine-grained details. Experiments on several models and data sets show that the multi-path architecture composed of staged residual connections and multi-scale modules effectively improves the performance of text-to-image generation and the quality of the generated images.
A network architecture method for staged multi-path text-to-image generation, comprising:
adding a multi-stage residual learning mechanism to a generative adversarial network with a multi-stage framework;
the multi-stage residual learning mechanism is expressed as:

h_i = Upsample_i( F_i(H_{i-1}) + H_{i-1} ),   H_{i-1} = F_op( h_{i-1}, f_wo(h_{i-1}, w) )

the image h_{i-1} generated in the previous stage i-1, together with its refined word feature f_wo(h_{i-1}, w), moves directly to the end of stage i and is fused with the features f_i = F_i(H_{i-1}) learned by the feature-extraction module, participating in the image generation of stage i;
replacing the convolution filter in each stage with a multi-scale module;
the mathematical expression of the multiscale module is as follows:
Figure BDA0003968991620000032
wherein
Figure BDA0003968991620000033
Is the p-th path module of the extracted feature,
Figure BDA0003968991620000034
respectively, are the outputs of the corresponding paths,
Figure BDA0003968991620000035
representing the feature map generated for each path, stitched, and then passed through a feature fusion block
Figure BDA0003968991620000036
Selecting proper characteristics in each path and carrying out self-adaptive fusion; use of
Figure BDA0003968991620000037
And fusing the feature maps.
Preferably, the multi-scale module comprises: a multi-scale path and feature fusion block FFB;
the multi-scale path includes: three parallel paths;
the feature fusion block uses a 3 × 3 convolution layer, and the mathematical expression is as follows:
Figure BDA0003968991620000041
wherein
Figure BDA0003968991620000042
Finger ECA model, fuse i One filter number is C i Convolution layer with convolution kernel size of 3.
After the features of the method of the invention are added, the overall flow of image generation is as follows:
inputting a Text information of natural language for describing image, and obtaining a sentence feature vector (sense feature) f through a pre-trained Text Encoder (Text Encoder) s And word feature matrices (word features).
The sentence feature f_s is concatenated with a noise vector z drawn from the N(0, 1) normal distribution, passed through a fully connected layer (FC with reshape) and then through the module F_0, consisting of four upsampling layers, to obtain h_0 = F_0(z, f_s(s)). The first generator G_0 yields an initial image G_0(h_0) at 64 × 64 × 3 resolution. The following modules use the proposed multi-path structure within the multi-stage framework to complete the detail refinement and resolution enhancement of the image.
Before h_i (i = 0, 1) is passed to the next stage's module, the word feature matrix is first processed by F_wo (word operation); the resulting f_wo is concatenated with h_i (i = 0, 1) into a vector H_i (i = 0, 1), which is divided among the residual blocks of three scale paths (1 × 1, 3 × 3, 5 × 5), M^p (p = 1, 2, 3), synthesizing hidden-layer feature maps of different receptive fields. The feature maps of different sizes are merged by the fusion block (FFB), residually connected with H_i (i = 0, 1), and passed through a 3 × 3 convolution and upsampling layer to yield h_i (i = 1, 2); the generators G_i (i = 1, 2) then produce higher-resolution images of 128 × 128 × 3 and 256 × 256 × 3.
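The staged flow above can be illustrated with a condensed PyTorch sketch. The module internals below (a mean-pooled word operation, plain convolutional stage bodies, the ToyStagedT2I name) are simplified stand-ins and assumptions, not the patent's reference implementation of the multi-path structure of fig. 3:

```python
import torch
import torch.nn as nn

def up_block(ch):
    # nearest-neighbour upsampling followed by a 3x3 convolution
    return nn.Sequential(nn.Upsample(scale_factor=2),
                         nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())

class ToyStagedT2I(nn.Module):
    def __init__(self, z_dim=100, s_dim=256, w_dim=256, ch=32):
        super().__init__()
        self.ch = ch
        self.fc = nn.Linear(z_dim + s_dim, ch * 4 * 4)              # FC with reshape
        self.f0 = nn.Sequential(*[up_block(ch) for _ in range(4)])  # F_0: 4x4 -> 64x64
        self.word_proj = nn.Linear(w_dim, ch)                       # crude stand-in for F_wo
        self.stages = nn.ModuleList([                               # stand-ins for the
            nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1),      # multi-path stages
                          nn.ReLU(), up_block(ch))
            for _ in range(2)])
        self.to_img = nn.ModuleList([nn.Conv2d(ch, 3, 3, padding=1) # G_0, G_1, G_2
                                     for _ in range(3)])

    def forward(self, z, f_s, words):                 # words: (B, T, w_dim)
        h = self.fc(torch.cat([z, f_s], dim=1)).view(-1, self.ch, 4, 4)
        h = self.f0(h)                                # h_0: 64x64 feature map
        images = [torch.tanh(self.to_img[0](h))]      # 64x64x3 initial image
        for stage, g in zip(self.stages, self.to_img[1:]):
            f_wo = self.word_proj(words.mean(dim=1))  # word operation (simplified)
            f_wo = f_wo[:, :, None, None].expand(-1, -1, h.size(2), h.size(3))
            H = torch.cat([h, f_wo], dim=1)           # H_i: image + text features
            h = stage(H)                              # refine and upsample
            images.append(torch.tanh(g(h)))           # 128x128x3, then 256x256x3
        return images

imgs = ToyStagedT2I()(torch.randn(2, 100), torch.randn(2, 256),
                      torch.randn(2, 25, 256))
print([tuple(im.shape) for im in imgs])  # 64, 128, 256 resolutions
```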
The invention has the following beneficial effects. To let the subsequent stages concentrate on rich fine-grained details and improve the quality of the final image, the invention provides a novel staged multi-path adversarial generation network architecture from text to image. The framework uses staged residual connections to preserve the feature-information details of the image generated in the previous stage and pass them into the current stage's image-generation process, avoiding the need for long-term storage. The other part of the structure is the multi-scale module, which uses three parallel paths with different convolution kernels to extract input features and generate images with finer details. Experimental results show that the structure lets the network focus on refining the details of the generated image, so that higher-quality cues are obtained for fine-grained detail reasoning. Compared with the baseline models, the staged multi-path text-to-image generation architecture achieves higher performance.
Drawings
FIG. 1 is a multi-stage network architecture that is widely used in the task of text generation images.
Fig. 2 is an improved structure of T2I residual learning and multi-scale module proposed by the present invention.
Fig. 3 is an overall framework of the staged multipath structure T2I model proposed by the present invention.
Detailed Description
The technical solution of the present invention is further specifically described below by way of specific examples in conjunction with the accompanying drawings.
The invention aims to concentrate the subsequent generation stages on enriching fine-grained details, improve the quality of the final image, and provide a new multi-path structure for multi-stage frameworks. Fig. 3 shows the framework of a T2I model using the proposed multi-path structure, which consists mainly of two parts. The staged residual connection preserves the feature-information details of the image generated in the preceding adjacent stage and passes them forward, avoiding the need for long-term memory and improving the generation efficiency of the current stage. The multi-scale module extracts input features along three parallel paths with different convolution-kernel sizes, obtaining higher-quality features and generating better fine-grained details.
Residual learning and multi-scale mechanisms are common strategies in low-level image-processing tasks such as image super-resolution, deraining, and dehazing, but such effective mechanisms are rarely employed in GAN models. Experimental results show that, by adopting residual learning and multi-scale learning, the text-to-image task obtains a considerable performance improvement. Images generated by the multi-path structural model are more vivid than those of the corresponding baseline, both because rich and diverse feature maps are extracted through the different paths and because the proposed staged residual connection reduces the generation burden of the later stages.
A network architecture for staged multi-path text generation of images, comprising two improved structures (shown in fig. 2):
structure one, multi-stage residual join
For a generative adversarial network with a multi-stage framework, assume it has m stages:

{ (F_i, Upsample_i, G_i) | i = 0, 1, ..., m-1 }

which take the hidden states (h_0, h_1, ..., h_{m-1}) as input and generate images of increasing size (x̂_0, x̂_1, ..., x̂_{m-1}).
Mathematically, its forward propagation can be expressed as:

h_0 = F_0( z, f_s(s) )
h_i = Upsample_i( F_i( F_op( h_{i-1}, f_wo(h_{i-1}, w) ) ) ),   i = 1, ..., m-1
x̂_i = G_i( h_i ),   i = 0, 1, ..., m-1

Here z is a noise vector, usually sampled from a Gaussian distribution, s is the global sentence vector, and w is the word-vector matrix from a text encoder such as an LSTM or DAMSM. The function f_s(·) denotes a vector operation on the sentence, e.g. conditioning augmentation; f_wo(·) denotes operations on the words, such as the attention model of AttnGAN or the dynamic memory operation in DM-GAN. F_op(·) denotes the operation on the input feature map and text features of the previous adjacent stage, such as the joint convolution in StackGAN-v2 or the concatenation in AttnGAN. F_i(·), Upsample_i(·), and G_i(·) are modeled as neural networks.
The first stage has already outlined the general shape and color of the object; the goal of the subsequent stages is to progressively correct and enrich the fine-grained features and then generate a high-resolution realistic image. In the coarse-to-fine generation architecture, the low-resolution image generated in the first stage and the high-resolution images generated in the subsequent stages share similar information to a great extent. To improve the generation efficiency of the subsequent stages, this scheme introduces a multi-stage residual learning mechanism, expressed mathematically as:

h_i = Upsample_i( F_i(H_{i-1}) + H_{i-1} ),   H_{i-1} = F_op( h_{i-1}, f_wo(h_{i-1}, w) ),   i ≥ 1

The image h_{i-1} generated in the previous stage, together with its refined word feature f_wo(h_{i-1}, w), moves directly to the end of stage i and is fused with the features f_i = F_i(H_{i-1}) learned by the feature-extraction module, participating in the image generation of stage i. This multi-stage residual learning connection avoids the need for long-term memory, so the layers of stage i can focus on modifying and supplementing the details of the generated image. The input to the first stage is highly abstract text semantics whose semantic feature mapping is largely inconsistent with the image modality; the multi-stage residual learning mechanism is therefore not applied to the first stage's image generation.
Structure two: multi-scale module
In typical T2I synthesis there is only a single path, always using convolution filters with a fixed 3 × 3 kernel to extract features, so the network cannot fully exploit information from different aspects. To further exploit the current stage's input, which includes the previously generated image and the text features, we add a multi-scale module.
Its mathematical expression can be written as:

f_i = FFB_i( [ M_i^1(H_{i-1}), M_i^2(H_{i-1}), M_i^3(H_{i-1}) ] )

where M_i^p is the p-th feature-extraction path module, M_i^p(H_{i-1}) are the outputs of the corresponding paths, and [·] denotes that the feature maps generated by each path are concatenated before being passed through the feature fusion block FFB_i, which selects suitable features from each path and fuses them adaptively.

The multi-scale module consists mainly of the multi-scale paths and a feature fusion block (FFB), and is explored to exploit information from different aspects. Specifically, this part consists of three parallel paths, each with two ResBlocks. Compared with the original module, this method uses larger (5 × 5) and smaller (1 × 1) filters to extract features from different spatial perspectives: the path with the larger initial convolution kernel extracts global structural features, while the path with the smaller kernel captures local detail information. The feature fusion block first integrates all features from the multi-scale paths in a concatenated manner. Then Efficient Channel Attention (ECA), a very lightweight attention module, computes weights that redistribute the importance of each channel's feature map, selecting the appropriate information. Finally, a 3 × 3 convolution layer adaptively fuses the feature maps of each path.
The mathematical representation is as follows:

FFB_i(X) = Fuse_i( ECA(X) )

where ECA(·) denotes the ECA model and Fuse_i is a convolution layer with C_i filters and kernel size 3.
By applying these mechanisms, the model will benefit from rich features of different scales, and can obtain high-quality image features of finer granularity.
Example 1
The experimental method of the present invention, together with its parameters and implementation details, is further described below.
(1) Data sets
The data sets CUB-200 and Oxford-102 are used. CUB-200 contains 11,788 images of 200 bird categories; 150 categories (8,855 images) are used for training and the remaining 50 (2,933 images) for testing. Oxford-102 contains 8,189 flower images from 102 categories, of which 7,034 are used for training and 1,155 for testing. Each image in both the CUB-200 and Oxford-102 data sets has 10 text descriptions.
(2) Text sentence feature and word feature extraction
Features are extracted from the natural-language text descriptions in the data set using a pre-trained bidirectional long short-term memory network (BiLSTM). In a BiLSTM, each word corresponds to two hidden states, one per direction, which are concatenated as the word's semantic information. This yields a word feature matrix e ∈ R^{D×T}, where the i-th column vector e_i represents the feature of the i-th word, D = 256 is the word-feature dimension, and T = 25 is the number of words. The hidden states of the last layer of the BiLSTM are concatenated as the global sentence feature f_s ∈ R^D.
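A minimal sketch of such a BiLSTM encoder follows; the vocabulary size and embedding dimension are illustrative assumptions (only D = 256 and T = 25 come from the description above):

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    def __init__(self, vocab=5000, emb=300, D=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        # hidden size D/2 per direction so concatenated features have dim D
        self.lstm = nn.LSTM(emb, D // 2, batch_first=True, bidirectional=True)

    def forward(self, tokens):                # tokens: (B, T) word indices
        out, (h_n, _) = self.lstm(self.embed(tokens))
        e = out.transpose(1, 2)               # word features e in R^{D x T}
        f_s = torch.cat([h_n[0], h_n[1]], 1)  # global sentence feature, dim D
        return e, f_s

e, f_s = TextEncoder()(torch.randint(0, 5000, (2, 25)))
print(e.shape, f_s.shape)                     # (2, 256, 25) (2, 256)
```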
(3) Building improved networks for staged multi-path text-generated images
AttnGAN is adopted as the baseline model; its multi-stage stacked network raises the image resolution by stacking generators and discriminators and produces images with richer detail. For the generator of the model, given random noise z ~ N(0, 1) and a condition variable of dimensions 100 and 256 respectively, the improved text-to-image neural network architecture is built as in fig. 3, generating images at 64 × 64, 128 × 128, and 256 × 256 resolution across the stages.
(4) Establishment of loss function
The adversarial network is trained jointly with conditional and unconditional generation, so the model's objective function comprises two terms: an unconditional loss and a conditional loss. The loss of the i-th stage discriminator D_i is defined as:

L_{D_i} = -1/2 E_{x_i ~ p_data}[ log D_i(x_i) ] - 1/2 E_{s_i ~ p_{G_i}}[ log(1 - D_i(s_i)) ]
          - 1/2 E_{x_i ~ p_data}[ log D_i(x_i, s) ] - 1/2 E_{s_i ~ p_{G_i}}[ log(1 - D_i(s_i, s)) ]

The loss of the corresponding i-th stage generator G_i likewise consists of two parts:

L_{G_i} = -1/2 E_{s_i ~ p_{G_i}}[ log D_i(s_i) ] - 1/2 E_{s_i ~ p_{G_i}}[ log D_i(s_i, s) ]

where x_i is a real image from the data set matching the text description, s_i is the fake image generated by G_i, and s is the sentence condition.
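These two-part losses can be sketched as follows; the discriminator is assumed to expose an unconditional `logits(img)` head and a conditional `logits(img, sent)` head, which are illustrative names rather than the patent's API:

```python
import torch
import torch.nn.functional as F

def d_loss(D, x_real, x_fake, sent):
    ones = torch.ones(x_real.size(0), 1)
    zeros = torch.zeros_like(ones)
    # unconditional: real vs. fake images only
    uncond = (F.binary_cross_entropy_with_logits(D.logits(x_real), ones) +
              F.binary_cross_entropy_with_logits(D.logits(x_fake), zeros))
    # conditional: real/fake judged jointly with the sentence condition
    cond = (F.binary_cross_entropy_with_logits(D.logits(x_real, sent), ones) +
            F.binary_cross_entropy_with_logits(D.logits(x_fake, sent), zeros))
    return 0.5 * (uncond + cond)   # the -1/2 factors of each expectation term

def g_loss(D, x_fake, sent):
    ones = torch.ones(x_fake.size(0), 1)
    return 0.5 * (F.binary_cross_entropy_with_logits(D.logits(x_fake), ones) +
                  F.binary_cross_entropy_with_logits(D.logits(x_fake, sent), ones))
```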
(5) Model training
The discriminators and generators are trained alternately according to the loss functions above. The relevant training parameters are set as follows: 800 training epochs, batch size 20, the Adam optimizer, and an initial learning rate of 2e-4 for both discriminator and generator.
While the discriminator is trained, the generator model is fixed and gradient information propagates only within the discriminator; while the generator is trained, gradient information flows from the discriminator to the generator, but the discriminator receives no gradient update and only the generator network's parameters are optimized. Finally, model parameters are updated by the back-propagation (BP) algorithm until the model converges.
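The alternating scheme can be sketched as follows, reusing the `G`, `D`, `d_loss`, and `g_loss` assumed in the earlier sketches (with `G` returning the list of staged images). The Adam betas are a common GAN choice and an assumption here, since the description fixes only the optimizer and the 2e-4 learning rate:

```python
import torch

def train(G, D, loader, d_loss, g_loss, epochs=800, z_dim=100, lr=2e-4):
    # betas=(0.5, 0.999) is a conventional GAN setting, assumed here
    opt_d = torch.optim.Adam(D.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_g = torch.optim.Adam(G.parameters(), lr=lr, betas=(0.5, 0.999))
    for _ in range(epochs):
        for x_real, sent, words in loader:
            z = torch.randn(x_real.size(0), z_dim)
            # discriminator step: .detach() freezes the generator, so
            # gradient information stays within the discriminator
            x_fake = G(z, sent, words)[-1].detach()
            opt_d.zero_grad()
            d_loss(D, x_real, x_fake, sent).backward()
            opt_d.step()
            # generator step: gradients flow back through D, but only
            # the generator's parameters are updated
            opt_g.zero_grad()
            g_loss(D, G(z, sent, words)[-1], sent).backward()
            opt_g.step()
```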
After training, the stored generator model can generate the corresponding high-resolution image from a given text description, and the evaluation metrics FID and IS can be computed (FID from the mean and covariance of features of the generated images) to quantify the performance of the model.
(6) Evaluation metrics of the model
The Inception Score (IS) and Fréchet Inception Distance (FID) are used to quantify the performance of the invention. Each model generates 30,000 images for the CUB-200 data set and 11,550 images for the Oxford-102 data set, conditioned on text descriptions from the unseen test set. The IS is defined by the KL divergence between the conditional class distribution and the marginal class distribution, computed with a pre-trained Inception v3 network. A large IS means that the generative model outputs highly diverse images across all classes and that each image clearly belongs to a specific class; the higher the IS, the better the quality of the generated images. The FID computes the Fréchet distance between the synthetic and real images from features extracted by a pre-trained Inception v3 network. A lower FID means that the distribution of generated images is closer to the real image distribution; the lower the FID, the better the generated image quality.
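FID itself reduces to the Fréchet distance between two Gaussians fitted to Inception v3 features, FID = ||μ_r − μ_g||² + Tr(C_r + C_g − 2(C_r C_g)^{1/2}). A sketch, assuming the feature extraction is done elsewhere:

```python
import numpy as np
from scipy import linalg

def fid(feats_real, feats_gen):
    """feats_*: (N, 2048) Inception-v3 pooled features."""
    mu_r, mu_g = feats_real.mean(0), feats_gen.mean(0)
    c_r = np.cov(feats_real, rowvar=False)
    c_g = np.cov(feats_gen, rowvar=False)
    covmean = linalg.sqrtm(c_r @ c_g)     # matrix square root of C_r C_g
    if np.iscomplexobj(covmean):          # numerical noise can introduce
        covmean = covmean.real            # tiny imaginary parts; drop them
    diff = mu_r - mu_g
    return diff @ diff + np.trace(c_r + c_g - 2.0 * covmean)
```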

Claims (2)

1. A network architecture method for staged multi-path text-to-image generation, characterized by comprising:
adding a multi-stage residual learning mechanism to a generative adversarial network with a multi-stage framework;
the multi-stage residual learning mechanism being expressed as:

h_i = Upsample_i( F_i(H_{i-1}) + H_{i-1} ),   H_{i-1} = F_op( h_{i-1}, f_wo(h_{i-1}, w) )

wherein the image h_{i-1} generated in the previous stage i-1, together with its refined word feature f_wo(h_{i-1}, w), moves directly to the end of stage i and is fused with the features f_i = F_i(H_{i-1}) learned by the feature-extraction module, participating in the image generation of stage i;
replacing the convolution filter in each stage with a multi-scale module;
the mathematical expression of the multi-scale module being as follows:

f_i = FFB_i( [ M_i^1(H_{i-1}), M_i^2(H_{i-1}), M_i^3(H_{i-1}) ] )

wherein M_i^p is the p-th feature-extraction path module, M_i^p(H_{i-1}) are the outputs of the corresponding paths, and [·] denotes that the feature maps generated by each path are concatenated and then passed through a feature fusion block FFB_i, which selects suitable features from each path and performs adaptive fusion; FFB_i is used to fuse the feature maps.
2. The method of claim 1, wherein the multi-scale module comprises: a multi-scale path and feature fusion block FFB;
the multi-scale path includes: three parallel paths;
the feature fusion block uses a 3 × 3 convolution layer, with the mathematical expression:

FFB_i(X) = Fuse_i( ECA(X) )

wherein ECA(·) denotes the ECA model and Fuse_i is a convolution layer with C_i filters and kernel size 3.
CN202211505806.7A 2022-11-29 2022-11-29 Network architecture method for staged multi-path text-to-image generation Pending CN115775284A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211505806.7A CN115775284A (en) Network architecture method for staged multi-path text-to-image generation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211505806.7A CN115775284A (en) Network architecture method for staged multi-path text-to-image generation

Publications (1)

Publication Number Publication Date
CN115775284A 2023-03-10

Family

ID=85390723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211505806.7A Pending CN115775284A (en) Network architecture method for staged multi-path text-to-image generation

Country Status (1)

Country Link
CN (1) CN115775284A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116863032A (en) * 2023-06-27 2023-10-10 河海大学 Flood disaster scene generation method based on generation countermeasure network
CN116863032B (en) * 2023-06-27 2024-04-09 河海大学 Flood disaster scene generation method based on generation countermeasure network

Similar Documents

Publication Publication Date Title
Han et al. A survey on visual transformer
Robert et al. Hybridnet: Classification and reconstruction cooperation for semi-supervised learning
Zhang et al. Accurate and fast image denoising via attention guided scaling
CN108615036A (en) A kind of natural scene text recognition method based on convolution attention network
CN113361250A (en) Bidirectional text image generation method and system based on semantic consistency
Ghorbani et al. Probabilistic character motion synthesis using a hierarchical deep latent variable model
CN117521672A (en) Method for generating continuous pictures by long text based on diffusion model
Gendy et al. Lightweight image super-resolution based on deep learning: State-of-the-art and future directions
CN113486890A (en) Text detection method based on attention feature fusion and cavity residual error feature enhancement
CN116721334B (en) Training method, device, equipment and storage medium of image generation model
CN113140020A (en) Method for generating image based on text of countermeasure network generated by accompanying supervision
Cui et al. Representation and correlation enhanced encoder-decoder framework for scene text recognition
CN115775284A (en) Network architecture method for generating image by multi-path text in stages
Sun et al. Second-order encoding networks for semantic segmentation
CN115330620A (en) Image defogging method based on cyclic generation countermeasure network
CN117788629B (en) Image generation method, device and storage medium with style personalization
Qiao et al. Tell me where i am: Object-level scene context prediction
Liu et al. Survey on gan‐based face hallucination with its model development
CN117058673A (en) Text generation image model training method and system and text generation image method and system
CN114332565A (en) Method for generating image by generating confrontation network text based on distribution estimation condition
Selva Castelló A comprehensive survey on deep future frame video prediction
Chen et al. Y-Net: Dual-branch joint network for semantic segmentation
Chen et al. SCPA‐Net: Self‐calibrated pyramid aggregation for image dehazing
Luhman et al. High fidelity image synthesis with deep vaes in latent space
CN112465929A (en) Image generation method based on improved graph convolution network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination