CN109671125B - Highly-integrated GAN network device and method for realizing text image generation - Google Patents

Highly-integrated GAN network device and method for realizing text image generation

Info

Publication number
CN109671125B
Authority
CN
China
Prior art keywords
unit
convolution
network
generator
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811542578.4A
Other languages
Chinese (zh)
Other versions
CN109671125A (en)
Inventor
宋井宽 (Song Jingkuan)
陈岱渊 (Chen Daiyuan)
高联丽 (Gao Lianli)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201811542578.4A
Publication of CN109671125A
Application granted
Publication of CN109671125B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/001 Model-based coding, e.g. wire frame
    • G06T9/002 Image coding using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the field of deep learning and discloses a highly integrated GAN network device and method for text image generation. It addresses the problems of prior-art approaches, namely small generated-image size, low image quality, and an unstable network training process, and effectively generates clear, high-quality semantic images from input text. The highly integrated GAN network device of the invention comprises: a text encoder, a condition augmentation module, a generator, and three independent discriminators. Based on this device, high-quality RGB images matching the text semantic information can be generated with only one generator and three independent discriminators. To further optimize the generator's network structure, the feature maps of different sizes produced by the intermediate network layers are fully exploited: in addition to the residual generation blocks of a residual network, the generator adopts a pyramid network structure to build semantically rich high-resolution 256×256 features from low-resolution 64×64 features.

Description

Highly-integrated GAN network device and method for realizing text image generation
Technical Field
The invention relates to the field of deep learning, in particular to a highly integrated GAN network device and method for text image generation.
Background
Although generating pictures from text has many real-life applications, such as image editing and cross-modal data generation, only a few studies have addressed this task so far. Early text-to-image methods used a single GAN as the basic network structure, so the generated images were small and of low quality; for example, GAN-INT-CLS [1] can only generate 64×64 images. To increase the image size, later methods train multiple GAN networks in stages, but these networks usually have complex structures and high computing-hardware requirements, making the training process complicated and time-consuming. For example, StackGAN [2], StackGAN++ [3], and AttnGAN [4] train two deep networks separately in two stages; they are not end-to-end networks, which increases complexity and makes the whole training process very unstable.
Reference:
[1] Reed, Scott, et al. 2016. Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396.
[2] Zhang, H.; Xu, T.; Li, H.; Zhang, S.; Huang, X.; Wang, X.; and Metaxas, D. 2017a. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. arXiv preprint arXiv:1612.03242.
[3] Zhang, H.; Xu, T.; Li, H.; Zhang, S.; Wang, X.; Huang, X.; and Metaxas, D. 2017b. StackGAN++: Realistic image synthesis with stacked generative adversarial networks. arXiv preprint arXiv:1710.10916.
[4] Xu, T.; Zhang, P.; Huang, Q.; Zhang, H.; Gan, Z.; Huang, X.; and He, X. 2017. AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks. arXiv preprint arXiv:1711.10485.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a highly integrated GAN network device and a method for text image generation that overcome the prior art's problems of small generated-image size, low image quality, and an unstable network training process, and that effectively generate clear, high-quality semantic images from input text.
The technical scheme adopted by the invention to solve this problem is as follows:
a highly converged GAN network device, comprising: the system comprises a text compiler, a condition adding module, a generator and three independent discriminators;
the text compiler is used for outputting compiled feature expressions to the texts input into the text compiler;
the condition increasing module is used for sampling a condition characteristic expression with a certain dimensionality from the compiled characteristic expression output by the text compiler, splicing the condition characteristic expression with noise in a channel dimensionality and inputting the spliced condition characteristic expression into a generator network;
the generator comprises a full-connection layer, seven sequentially-connected residual generation blocks connected with the full-connection layer, and three accumulated generation blocks which are sequentially connected and are correspondingly connected with the last three residual generation blocks one by one;
the full connection layer is used for performing feature dimension increasing on the features output by the condition increasing module and converting the shapes of the features into 4-dimensional features;
the residual generation block is used for generating features of different sizes;
the accumulation generating block is used for fusing the features with different sizes by adopting a pyramid network structure so as to generate RGB images with different sizes;
the three independent discriminators are connected with the three accumulation generating blocks of the generator in a one-to-one correspondence manner and are used for discriminating the quality of the RGB images with different sizes output by the generator and transmitting discrimination results back to the generator; the way of returning the discrimination result to the generator is as follows: the generated RGB images with different sizes are respectively input into corresponding independent discriminators, the quality of the images is discriminated through a loss function limited in the independent discriminators, the gradient of the images is calculated and then is transmitted to the whole generator network through backward propagation, and parameters of the independent discriminators and the whole generator network are updated.
As a further optimization, a perceptual loss function is imposed on the generator to improve the semantic consistency and diversity of the generated images.
As a further optimization, each discriminator is provided with a matching-pair loss function, which judges whether the generated image semantically matches the input text, and a local image loss function, which judges whether the generated image is locally realistic; the last discriminator is additionally provided with a class information loss function for classifying the generated images.
As a further optimization, the residual generation block comprises an upsampling block, two 3×3 convolution units, and an accumulator; the input of the upsampling block is connected to the output of the preceding residual generation block; the output of the upsampling block is connected to one input of the accumulator directly, and to the other input of the accumulator after passing through the two 3×3 convolution units in sequence.
As a further optimization, the accumulation generation block comprises a 1×1 convolution unit, an upsampling block, two 3×3 convolution units, and an accumulator; the input of the 1×1 convolution unit is connected to the output of the corresponding residual generation block; the output of the 1×1 convolution unit is connected to the input of the upsampling block; the output of the upsampling block and the output of the preceding accumulation generation block are connected to the two inputs of the accumulator; the output of the accumulator passes through one 3×3 convolution unit, which outputs a higher-dimensional feature, and that feature passes through the other 3×3 convolution unit, which outputs an RGB image.
As a further optimization, the three independent discriminators comprise: a first discriminator, a second discriminator, and a third discriminator.
As a further optimization, the first discriminator and the second discriminator each comprise: a multilayer convolution network unit, a first 4×4 convolution unit, a second 4×4 convolution unit, a first fully connected layer, a spatial replication unit, a channel concatenation unit, and a 1×1 convolution unit; the input of the multilayer convolution network unit is connected to the RGB image output by the corresponding accumulation generation block; the output of the multilayer convolution network unit is connected to one input of the channel concatenation unit and, through the first 4×4 convolution unit, to the local image loss function; the input of the first fully connected layer is connected to the text feature representation output by the text encoder; the output of the first fully connected layer is spatially replicated by the spatial replication unit and then connected to the other input of the channel concatenation unit; the output of the channel concatenation unit passes through the 1×1 convolution unit and the second 4×4 convolution unit in sequence and is connected to the matching-pair loss function.
As a further optimization, the third discriminator comprises: a multilayer convolution network unit, a first 4×4 convolution unit, a second 4×4 convolution unit, a third 4×4 convolution unit, a first fully connected layer, a second fully connected layer, a spatial replication unit, a channel concatenation unit, and a 1×1 convolution unit; the input of the multilayer convolution network unit is connected to the RGB image output by the corresponding accumulation generation block; the output of the multilayer convolution network unit is connected to one input of the channel concatenation unit and, through the first 4×4 convolution unit, to the local image loss function; the input of the first fully connected layer is connected to the text feature representation output by the text encoder; the output of the first fully connected layer is spatially replicated by the spatial replication unit and then connected to the other input of the channel concatenation unit; the output of the channel concatenation unit passes through the 1×1 convolution unit and the second 4×4 convolution unit in sequence and is connected to the matching-pair loss function; and the output of the 1×1 convolution unit is connected to the class information loss function through the third 4×4 convolution unit and the second fully connected layer.
In addition, the invention provides a method for generating text images based on the highly integrated GAN network device, comprising the following steps:
inputting the text into the trained text encoder and outputting the encoded feature representation;
using the condition augmentation module to sample a condition feature representation of a given dimensionality, splicing it with noise along the channel dimension, and inputting the result into the generator network;
in the generator network, raising the feature dimension through the fully connected layer, reshaping the feature into a 4-dimensional tensor, and inputting it into the seven consecutive residual generation blocks; inputting the features of different sizes output by the last three residual generation blocks into the corresponding accumulation generation blocks, which output RGB images of different sizes through their convolution operations;
inputting the generated RGB images of different sizes into the corresponding independent discriminators, judging the image quality through the loss functions imposed on the independent discriminators, computing the gradients and back-propagating them through the whole generator network, and updating the parameters of the independent discriminators and the generator network.
As a further optimization, judging the image quality through the loss functions imposed on the independent discriminators specifically comprises: judging whether the generated image semantically matches the input text through the matching-pair loss function imposed on each of the three independent discriminators, and judging whether the generated image is locally realistic through the local image loss function; in addition, for the last of the three independent discriminators, classifying the generated images through the imposed class information loss function.
The invention has the beneficial effects that:
1) Borrowing the feature-fusing pyramid network structure, the intermediate features generated inside the deep network are effectively utilized to produce high-quality image features that better match the text semantics.
2) The perceptual loss function is effectively utilized to optimize the structure of the GAN generator and enrich the semantic information of the image features.
3) Multiple discriminator loss functions are effectively utilized, namely the matching-pair loss function, the local image loss function, and the classification loss function, which optimize the structure of the GAN discriminators, improve their discrimination ability, and further improve the quality of the generated images.
4) The GAN network device structure provided by the invention stabilizes the training process and reduces the training time.
Drawings
FIG. 1 is a schematic diagram of the highly integrated GAN network structure in an embodiment of the present invention;
FIG. 2 is a schematic diagram of the structure of a residual generation block;
FIG. 3 is a schematic diagram of the structure of an accumulation generation block;
FIG. 4 is a schematic structural diagram of the discriminators.
Detailed Description
The invention aims to provide a highly integrated GAN network device and a method for text image generation that solve the prior art's problems of small generated-image size, low image quality, and an unstable network training process, and that effectively generate clear, high-quality semantic images from input text.
The core idea of the invention is as follows: to reduce the training cost as much as possible, a highly unified and structured GAN network device is designed that can still generate high-quality RGB images matching the text semantic information with only one generator and three independent discriminators. To further optimize the generator's network structure, the feature maps of different sizes generated by the intermediate network layers are fully utilized: in addition to the residual generation blocks of a residual network, the generator adopts a pyramid network structure to progressively generate semantically rich high-resolution 256×256 features from low-resolution 64×64 features.
Embodiment:
As shown in fig. 1, the highly integrated GAN network device in this embodiment comprises: a text encoder, a condition augmentation module, a generator, and three independent discriminators;
the text encoder is used for encoding the input text and outputting the encoded feature representation;
the condition augmentation module is used for sampling a condition feature representation of a given dimensionality from the encoded feature representation output by the text encoder, splicing it with noise along the channel dimension, and inputting the result into the generator network;
the generator comprises a fully connected layer, seven sequentially connected residual generation blocks following the fully connected layer, and three sequentially connected accumulation generation blocks connected one-to-one with the last three residual generation blocks;
the fully connected layer is used for raising the feature dimension of the output of the condition augmentation module and reshaping it into a 4-dimensional feature tensor;
the residual generation blocks are used for generating features of different sizes;
the accumulation generation blocks are used for fusing the features of different sizes with a pyramid network structure to generate RGB images of different sizes;
the three independent discriminators are connected one-to-one with the three accumulation generation blocks of the generator and are used for judging the quality of the RGB images of different sizes output by the generator and returning the judgment results to the generator.
In a specific implementation, the features of each size are first generated by the residual generation blocks. As shown in fig. 2, a residual generation block comprises an upsampling block, two 3×3 convolution units, and an accumulator; the input of the upsampling block is connected to the output of the preceding residual generation block; the output of the upsampling block is connected to one input of the accumulator directly, and to the other input of the accumulator after passing through the two 3×3 convolution units in sequence.
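To make the wiring concrete, the following is a minimal PyTorch sketch of a residual generation block. It is an illustrative sketch, not the patented implementation: the constant channel width, nearest-neighbor upsampling, and the BatchNorm/ReLU layers between the two 3×3 convolutions are assumptions the patent does not specify.

```python
import torch
import torch.nn as nn

class ResidualGenerationBlock(nn.Module):
    """Upsampling block feeding an accumulator directly and via two 3x3 convs."""
    def __init__(self, channels):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")
        self.convs = nn.Sequential(                      # the two 3x3 convolution units
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),                    # assumed, not specified
            nn.ReLU(inplace=True),                       # assumed, not specified
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        up = self.upsample(x)        # input comes from the preceding residual block
        return up + self.convs(up)   # the accumulator sums the two paths
```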
To enrich the feature expression at each size, the invention provides the accumulation generation block, which fuses features of different sizes with a pyramid network structure. As shown in fig. 3, an accumulation generation block comprises one 1×1 convolution unit, one upsampling block, two 3×3 convolution units, and one accumulator; the input of the 1×1 convolution unit is connected to the output of the corresponding residual generation block; the output of the 1×1 convolution unit is connected to the input of the upsampling block; the output of the upsampling block and the output of the preceding accumulation generation block are connected to the two inputs of the accumulator; the output of the accumulator passes through one 3×3 convolution unit, which outputs a higher-dimensional feature, and that feature passes through the other 3×3 convolution unit, which outputs an RGB image.
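A corresponding PyTorch sketch of an accumulation generation block follows. One reading note: for the two accumulator inputs to have matching spatial sizes at every stage, this sketch upsamples the previous stage's accumulated feature (standard pyramid fusion) rather than the 1×1-projected residual feature; this placement of the upsampling block, the channel widths, and the tanh output activation are all assumptions.

```python
import torch
import torch.nn as nn

class AccumulationGenerationBlock(nn.Module):
    """Fuses a residual-block feature with the previous accumulated feature
    and emits both a fused feature and an RGB image at this scale."""
    def __init__(self, res_channels, acc_channels):
        super().__init__()
        self.lateral = nn.Conv2d(res_channels, acc_channels, kernel_size=1)  # 1x1 unit
        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")
        self.fuse = nn.Conv2d(acc_channels, acc_channels, kernel_size=3, padding=1)  # first 3x3 unit
        self.to_rgb = nn.Conv2d(acc_channels, 3, kernel_size=3, padding=1)           # second 3x3 unit

    def forward(self, res_feat, prev_feat=None):
        x = self.lateral(res_feat)             # project the residual-block feature
        if prev_feat is not None:              # the first stage has no predecessor
            x = x + self.upsample(prev_feat)   # accumulator: fuse with previous stage
        feat = self.fuse(x)                    # higher-dimensional fused feature
        rgb = torch.tanh(self.to_rgb(feat))    # RGB image in [-1, 1] (assumed range)
        return feat, rgb
```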
The structure of the discriminators is shown in fig. 4, where the dashed box marks the part unique to the third discriminator and the remaining parts are common to all three discriminators. Each discriminator comprises: a multilayer convolution network unit, a first 4×4 convolution unit, a second 4×4 convolution unit, a first fully connected layer, a spatial replication unit, a channel concatenation unit, and a 1×1 convolution unit; the input of the multilayer convolution network unit is connected to the RGB image output by the corresponding accumulation generation block; the output of the multilayer convolution network unit is connected to one input of the channel concatenation unit and, through the first 4×4 convolution unit, to the local image loss function; the input of the first fully connected layer is connected to the text feature representation output by the text encoder; the output of the first fully connected layer is spatially replicated by the spatial replication unit and then connected to the other input of the channel concatenation unit; the output of the channel concatenation unit passes through the 1×1 convolution unit and the second 4×4 convolution unit in sequence and is connected to the matching-pair loss function.
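The shared part of the three discriminators can be sketched in PyTorch as below. Only the wiring (backbone feature map to the local head; text feature through a fully connected layer, spatial replication, channel concatenation, then 1×1 and 4×4 convolutions to the matching head) follows the description; the strides, channel widths, LeakyReLU activations, and the 4×4 size of the final feature map are assumptions.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, img_size=64, text_dim=1024, cond_dim=128, base_ch=64):
        super().__init__()
        layers, ch, size = [], 3, img_size
        while size > 4:  # multilayer convolution unit: downsample to a 4x4 map
            layers += [nn.Conv2d(ch, base_ch, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch, base_ch, size = base_ch, base_ch * 2, size // 2
        self.backbone = nn.Sequential(*layers)
        self.local_head = nn.Conv2d(ch, 1, kernel_size=4)  # first 4x4 unit -> local image loss
        self.text_fc = nn.Linear(text_dim, cond_dim)       # first fully connected layer
        self.joint = nn.Sequential(
            nn.Conv2d(ch + cond_dim, ch, kernel_size=1),   # 1x1 unit after concatenation
            nn.LeakyReLU(0.2, inplace=True),
        )
        self.match_head = nn.Conv2d(ch, 1, kernel_size=4)  # second 4x4 unit -> matching-pair loss

    def forward(self, img, text_emb):
        feat = self.backbone(img)                          # B x C x 4 x 4 feature map
        local_logit = self.local_head(feat)                # local-realism score
        t = self.text_fc(text_emb)[:, :, None, None]
        t = t.expand(-1, -1, feat.size(2), feat.size(3))   # spatial replication unit
        joint = self.joint(torch.cat([feat, t], dim=1))    # channel concatenation unit
        match_logit = self.match_head(joint)               # text-image matching score
        return local_logit, match_logit
```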
During training we observed that the generated images varied little across different objects. To increase this variation, in this embodiment we impose the classification information loss function only on the discriminator that receives the large 256×256 images (the third discriminator). Accordingly, in addition to the structure above, the third discriminator has: a second fully connected layer and a third 4×4 convolution unit; the output of the 1×1 convolution unit is connected to the class information loss function through the third 4×4 convolution unit and the second fully connected layer.
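Extending the Discriminator sketch above, the third discriminator's extra classification branch might look as follows; num_classes is a hypothetical parameter (e.g., the number of object categories in the training set):

```python
import torch
import torch.nn as nn

class ClassAwareDiscriminator(Discriminator):
    """Adds the dashed-box branch of fig. 4: a third 4x4 conv and a second FC."""
    def __init__(self, num_classes, **kwargs):
        super().__init__(img_size=256, **kwargs)
        ch = self.match_head.in_channels
        self.class_conv = nn.Conv2d(ch, ch, kernel_size=4)  # third 4x4 unit
        self.class_fc = nn.Linear(ch, num_classes)          # second fully connected layer

    def forward(self, img, text_emb):
        feat = self.backbone(img)
        local_logit = self.local_head(feat)
        t = self.text_fc(text_emb)[:, :, None, None].expand(-1, -1, feat.size(2), feat.size(3))
        joint = self.joint(torch.cat([feat, t], dim=1))     # output of the 1x1 unit
        match_logit = self.match_head(joint)
        class_logit = self.class_fc(self.class_conv(joint).flatten(1))  # class-information head
        return local_logit, match_logit, class_logit
```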
In addition, we impose a perceptual loss function on the generator to improve the semantic consistency and diversity of the generated images.
Based on the highly integrated GAN network device of this embodiment, the invention also provides a method for generating images with the device, implemented in the following steps:
step 1: inputting the text into a trained text compiler and outputting the compiled feature expression. But the feature dimension is higher at this time, which is not beneficial to network learning to accurate mapping, a condition increasing module is adopted to sample out condition feature expression with proper dimension, and then the condition feature expression is spliced with noise in channel dimension and input into a generator network. In which the condition addition module is constructed based on the Variational Auto-Encoder (VAE) theory, in order to make the random distribution constructed by the condition variables close enough to the standard gaussian distribution, we limit the KL divergence loss function to the condition addition module.
Step 2: raise the feature dimension through the fully connected layer, reshape the feature into a 4-dimensional tensor, and input it into the 7 consecutive residual generation blocks. To fully strengthen the feature expression at each size, the 64×64, 128×128, and 256×256 features are input into the corresponding accumulation generation blocks, which output RGB images of the corresponding sizes through their convolution operations.
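Reusing the ResidualGenerationBlock and AccumulationGenerationBlock sketches above, the generator of this step can be wired as below. The 2×2 grid produced by the fully connected layer (so that seven ×2 upsamplings place the last three blocks at 64×64, 128×128, and 256×256) and the constant channel width are assumptions:

```python
import torch
import torch.nn as nn

class PyramidGenerator(nn.Module):
    def __init__(self, z_dim=228, ch=64):  # 228 = 128-d condition + 100-d noise (assumed)
        super().__init__()
        self.ch = ch
        self.fc = nn.Linear(z_dim, ch * 2 * 2)  # feature up-dimensioning
        self.res_blocks = nn.ModuleList([ResidualGenerationBlock(ch) for _ in range(7)])
        self.acc_blocks = nn.ModuleList([AccumulationGenerationBlock(ch, ch) for _ in range(3)])

    def forward(self, z):
        x = self.fc(z).view(z.size(0), self.ch, 2, 2)  # reshape to a 4-D feature tensor
        feats = []
        for block in self.res_blocks:                  # 4 -> 8 -> ... -> 256
            x = block(x)
            feats.append(x)
        images, prev = [], None
        for block, feat in zip(self.acc_blocks, feats[-3:]):  # last three scales
            prev, rgb = block(feat, prev)
            images.append(rgb)
        return images                                  # 64x64, 128x128, 256x256 RGB images
```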
A perceptual loss function is imposed on the 256×256 images; its gradient is computed and back-propagated to update the entire generator network.
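The patent does not name the backbone behind its perceptual loss; a common realization, assumed here, compares frozen VGG16 features of the generated and real 256×256 images:

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# frozen feature extractor up to relu3_3 (an assumed layer choice)
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(fake_256, real_256):
    """Feature-space distance; its gradient flows back into the generator only."""
    return F.mse_loss(vgg(fake_256), vgg(real_256))
```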
Step 3: to ensure image quality at every size, an independent discriminator follows each size. All sizes are constrained by the matching-pair loss function and the local image loss function; the 256×256 images are additionally constrained by the classification information loss function. In the forward pass, the invention generates RGB images of three different sizes at once and feeds them to the corresponding independent discriminators, which judge whether each generated image semantically matches the input text based on the matching-pair loss function, judge whether it is locally realistic based on the local image loss function, and classify the generated images based on the classification information loss function. In the backward pass, the three discriminators compute their gradients and propagate them back through the whole generator, updating the parameters of the three independent discriminators and the entire generator network.
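Putting the pieces together, one training iteration of this step might look like the following sketch, which reuses the modules and the perceptual_loss function sketched above. The binary cross-entropy form of the matching-pair and local losses, the equal loss weights, and the optimizer handling are assumptions:

```python
import torch
import torch.nn.functional as F

def train_step(generator, cond_aug, discriminators, opt_g, opt_ds,
               text_feat, real_imgs, labels, noise_dim=100):
    # forward pass: three RGB scales at once
    c, kl_loss = cond_aug(text_feat)
    z = torch.cat([c, torch.randn(c.size(0), noise_dim, device=c.device)], dim=1)
    fakes = generator(z)  # [64x64, 128x128, 256x256]; real_imgs matches this layout

    # update each independent discriminator on its own scale
    for fake, real, d, opt_d in zip(fakes, real_imgs, discriminators, opt_ds):
        outs_r = d(real, text_feat)           # (local, match[, class]) logits
        outs_f = d(fake.detach(), text_feat)
        d_loss = sum(F.binary_cross_entropy_with_logits(o, torch.ones_like(o))
                     for o in outs_r[:2])
        d_loss = d_loss + sum(F.binary_cross_entropy_with_logits(o, torch.zeros_like(o))
                              for o in outs_f[:2])
        if len(outs_r) == 3:                  # class information loss, 256x256 only
            d_loss = d_loss + F.cross_entropy(outs_r[2], labels)
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # update the generator through all three discriminators' feedback
    g_loss = kl_loss + perceptual_loss(fakes[-1], real_imgs[-1])  # 256x256 only
    for fake, d in zip(fakes, discriminators):
        outs_f = d(fake, text_feat)
        g_loss = g_loss + sum(F.binary_cross_entropy_with_logits(o, torch.ones_like(o))
                              for o in outs_f[:2])
        if len(outs_f) == 3:
            g_loss = g_loss + F.cross_entropy(outs_f[2], labels)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```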

Claims (7)

1. A highly integrated GAN network device, comprising: a text encoder, a condition augmentation module, a generator, and three independent discriminators;
the text encoder is used for encoding the input text and outputting the encoded feature representation;
the condition augmentation module is used for sampling a condition feature representation of a given dimensionality from the encoded feature representation output by the text encoder, splicing it with noise along the channel dimension, and inputting the result into the generator network;
the generator comprises a fully connected layer, seven sequentially connected residual generation blocks following the fully connected layer, and three sequentially connected accumulation generation blocks connected one-to-one with the last three residual generation blocks;
the fully connected layer is used for raising the feature dimension of the output of the condition augmentation module and reshaping it into a 4-dimensional feature tensor;
the residual generation blocks are used for generating features of different sizes; each residual generation block comprises an upsampling block, two 3×3 convolution units, and an accumulator; the input of the upsampling block is connected to the output of the preceding residual generation block; the output of the upsampling block is connected to one input of the accumulator directly, and to the other input of the accumulator after passing through the two 3×3 convolution units in sequence;
the accumulation generation blocks are used for fusing the features of different sizes with a pyramid network structure to generate RGB images of different sizes; each accumulation generation block comprises a 1×1 convolution unit, an upsampling block, two 3×3 convolution units, and an accumulator; the input of the 1×1 convolution unit is connected to the output of the corresponding residual generation block; the output of the 1×1 convolution unit is connected to the input of the upsampling block; the output of the upsampling block and the output of the preceding accumulation generation block are connected to the two inputs of the accumulator; the output of the accumulator passes through one 3×3 convolution unit, which outputs a higher-dimensional feature, and that feature passes through the other 3×3 convolution unit, which outputs an RGB image;
the three independent discriminators are connected one-to-one with the three accumulation generation blocks of the generator and are used for judging the quality of the RGB images of different sizes output by the generator and returning the judgment results to the generator; each discriminator is provided with a matching-pair loss function, which judges whether the generated image semantically matches the input text, and a local image loss function, which judges whether the generated image is locally realistic; the last discriminator is additionally provided with a class information loss function for classifying the generated images;
the judgment results are returned to the generator as follows: the generated RGB images of different sizes are input into the corresponding independent discriminators, the image quality is judged by the loss functions imposed on the independent discriminators, and the gradients are computed and back-propagated through the whole generator network, updating the parameters of the independent discriminators and the generator network.
2. The highly integrated GAN network device of claim 1, wherein the generator is provided with a perceptual loss function for improving the semantic consistency and diversity of the generated images.
3. The highly integrated GAN network device of claim 1, wherein the three independent discriminators comprise: a first discriminator, a second discriminator, and a third discriminator.
4. The highly integrated GAN network device of claim 3, wherein the first discriminator and the second discriminator each comprise: a multilayer convolution network unit, a first 4×4 convolution unit, a second 4×4 convolution unit, a first fully connected layer, a spatial replication unit, a channel concatenation unit, and a 1×1 convolution unit; the input of the multilayer convolution network unit is connected to the RGB image output by the corresponding accumulation generation block; the output of the multilayer convolution network unit is connected to one input of the channel concatenation unit and, through the first 4×4 convolution unit, to the local image loss function; the input of the first fully connected layer is connected to the text feature representation output by the text encoder; the output of the first fully connected layer is spatially replicated by the spatial replication unit and then connected to the other input of the channel concatenation unit; the output of the channel concatenation unit passes through the 1×1 convolution unit and the second 4×4 convolution unit in sequence and is connected to the matching-pair loss function.
5. The highly integrated GAN network device of claim 3, wherein the third discriminator comprises: a multilayer convolution network unit, a first 4×4 convolution unit, a second 4×4 convolution unit, a third 4×4 convolution unit, a first fully connected layer, a second fully connected layer, a spatial replication unit, a channel concatenation unit, and a 1×1 convolution unit; the input of the multilayer convolution network unit is connected to the RGB image output by the corresponding accumulation generation block; the output of the multilayer convolution network unit is connected to one input of the channel concatenation unit and, through the first 4×4 convolution unit, to the local image loss function; the input of the first fully connected layer is connected to the text feature representation output by the text encoder; the output of the first fully connected layer is spatially replicated by the spatial replication unit and then connected to the other input of the channel concatenation unit; the output of the channel concatenation unit passes through the 1×1 convolution unit and the second 4×4 convolution unit in sequence and is connected to the matching-pair loss function; and the output of the 1×1 convolution unit is connected to the class information loss function through the third 4×4 convolution unit and the second fully connected layer.
6. A method for generating an image from text, wherein the text is processed by the highly integrated GAN network device of any one of claims 1 to 5, comprising the following steps:
inputting the text into the trained text encoder and outputting the encoded feature representation;
using the condition augmentation module to sample a condition feature representation of a given dimensionality, splicing it with noise along the channel dimension, and inputting the result into the generator network;
in the generator network, raising the feature dimension through the fully connected layer, reshaping the feature into a 4-dimensional tensor, and inputting it into the seven consecutive residual generation blocks; inputting the features of different sizes output by the last three residual generation blocks into the corresponding accumulation generation blocks, which output RGB images of different sizes through their convolution operations;
inputting the generated RGB images of different sizes into the corresponding independent discriminators, judging the image quality through the loss functions imposed on the independent discriminators, computing the gradients and back-propagating them through the whole generator network, and updating the parameters of the independent discriminators and the generator network.
7. The method of claim 6, wherein judging the image quality through the loss functions imposed on the independent discriminators specifically comprises: judging whether the generated image semantically matches the input text through the matching-pair loss function imposed on each of the three independent discriminators, and judging whether the generated image is locally realistic through the local image loss function; in addition, for the last of the three independent discriminators, classifying the generated images through the imposed class information loss function.
CN201811542578.4A 2018-12-17 2018-12-17 Highly-integrated GAN network device and method for realizing text image generation Active CN109671125B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811542578.4A CN109671125B (en) 2018-12-17 2018-12-17 Highly-integrated GAN network device and method for realizing text image generation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811542578.4A CN109671125B (en) 2018-12-17 2018-12-17 Highly-integrated GAN network device and method for realizing text image generation

Publications (2)

Publication Number Publication Date
CN109671125A CN109671125A (en) 2019-04-23
CN109671125B true CN109671125B (en) 2023-04-07

Family

ID=66144473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811542578.4A Active CN109671125B (en) 2018-12-17 2018-12-17 Highly-integrated GAN network device and method for realizing text image generation

Country Status (1)

Country Link
CN (1) CN109671125B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163267A (en) * 2019-05-09 2019-08-23 厦门美图之家科技有限公司 A kind of method that image generates the training method of model and generates image
CN110335212B (en) * 2019-06-28 2021-01-15 西安理工大学 Defect ancient book Chinese character repairing method based on condition confrontation network
CN110572696B (en) * 2019-08-12 2021-04-20 浙江大学 Variational self-encoder and video generation method combining generation countermeasure network
CN110909181A (en) * 2019-09-30 2020-03-24 中国海洋大学 Cross-modal retrieval method and system for multi-type ocean data
CN110930469B (en) * 2019-10-25 2021-11-16 北京大学 Text image generation method and system based on transition space mapping
CN110717555B (en) * 2019-12-12 2020-08-25 江苏联著实业股份有限公司 Picture generation system and device based on natural language and generation countermeasure network
CN111858882B (en) * 2020-06-24 2022-08-09 贵州大学 Text visual question-answering system and method based on concept interaction and associated semantics
CN111898461B (en) * 2020-07-08 2022-08-30 贵州大学 Time sequence behavior segment generation method
CN113140019B (en) * 2021-05-13 2022-05-31 电子科技大学 Method for generating text-generated image of confrontation network based on fusion compensation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416752A (en) * 2018-03-12 2018-08-17 中山大学 A method of image is carried out based on production confrontation network and removes motion blur
CN108765319A (en) * 2018-05-09 2018-11-06 大连理工大学 A kind of image de-noising method based on generation confrontation network

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10353463B2 (en) * 2016-03-16 2019-07-16 RaayonNova LLC Smart contact lens with eye driven control system and method
US9971958B2 (en) * 2016-06-01 2018-05-15 Mitsubishi Electric Research Laboratories, Inc. Method and system for generating multimodal digital images
US10600185B2 (en) * 2017-03-08 2020-03-24 Siemens Healthcare Gmbh Automatic liver segmentation using adversarial image-to-image network
CN107392973B (en) * 2017-06-06 2020-01-10 中国科学院自动化研究所 Pixel-level handwritten Chinese character automatic generation method, storage device and processing device
AU2017101166A4 (en) * 2017-08-25 2017-11-02 Lai, Haodong MR A Method For Real-Time Image Style Transfer Based On Conditional Generative Adversarial Networks
CN107862377A (en) * 2017-11-14 2018-03-30 华南理工大学 A kind of packet convolution method that confrontation network model is generated based on text image
CN108537742B (en) * 2018-03-09 2021-07-09 天津大学 Remote sensing image panchromatic sharpening method based on generation countermeasure network
CN108460717A (en) * 2018-03-14 2018-08-28 儒安科技有限公司 A kind of image generating method of the generation confrontation network based on double arbiters
CN108510532B (en) * 2018-03-30 2022-07-15 西安电子科技大学 Optical and SAR image registration method based on deep convolution GAN
CN108460812B (en) * 2018-04-04 2022-04-29 北京红云智胜科技有限公司 System and method for generating emoticons based on deep learning
CN108596265B (en) * 2018-05-02 2022-04-08 中山大学 Video generation model based on text description information and generation countermeasure network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416752A (en) * 2018-03-12 2018-08-17 中山大学 A method of image is carried out based on production confrontation network and removes motion blur
CN108765319A (en) * 2018-05-09 2018-11-06 大连理工大学 A kind of image de-noising method based on generation confrontation network

Also Published As

Publication number Publication date
CN109671125A (en) 2019-04-23

Similar Documents

Publication Publication Date Title
CN109671125B (en) Highly-integrated GAN network device and method for realizing text image generation
CN107480206B (en) Multi-mode low-rank bilinear pooling-based image content question-answering method
WO2023280064A1 (en) Audiovisual secondary haptic signal reconstruction method based on cloud-edge collaboration
CN105678292A (en) Complex optical text sequence identification system based on convolution and recurrent neural network
CN109377532B (en) Image processing method and device based on neural network
CN112131347A (en) False news detection method based on multi-mode fusion
CN110852295B (en) Video behavior recognition method based on multitasking supervised learning
Yang et al. Open domain dialogue generation with latent images
CN114443899A (en) Video classification method, device, equipment and medium
US20230177384A1 (en) Attention Bottlenecks for Multimodal Fusion
CN109413068B (en) Wireless signal encryption method based on dual GAN
CN114663678A (en) ECO-GAN-based image enhancement system and method
CN116309913B (en) Method for generating image based on ASG-GAN text description of generation countermeasure network
CN115953582A (en) Image semantic segmentation method and system
CN115661904A (en) Data labeling and domain adaptation model training method, device, equipment and medium
CN115631504A (en) Emotion identification method based on bimodal graph network information bottleneck
CN115858728A (en) Multi-mode data based emotion analysis method
CN115546907A (en) In-vivo detection method and system for multi-scale feature aggregation
KR102217414B1 (en) 4D Movie Effect Generator
Majumder et al. Variational fusion for multimodal sentiment analysis
Bílková et al. Perceptual license plate super-resolution with CTC loss
KR20210035535A (en) Method of learning brain connectivity and system threrfor
CN116452906B (en) Railway wagon fault picture generation method based on text description
CN112598764B (en) Character image generation method for transferring scene style
KR20220109219A (en) Device and method for deciding whether to concatenate between character strings

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant