CN114529622A - Method and device for generating high-quality images with a generative adversarial network by introducing self-supervised composite-task training - Google Patents

Method and device for generating high-quality images with a generative adversarial network by introducing self-supervised composite-task training

Info

Publication number
CN114529622A
CN114529622A
Authority
CN
China
Prior art keywords
image
training
task
branch
network
Prior art date
Legal status
Pending
Application number
CN202210033454.3A
Other languages
Chinese (zh)
Inventor
魏莹
张见威
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN202210033454.3A
Publication of CN114529622A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06N 3/045: Combinations of networks
    • G06N 3/088: Non-supervised learning, e.g. competitive learning
    • G06T 3/4038: Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T 3/60: Rotation of whole images or parts thereof
    • G06T 2200/32: Indexing scheme for image data processing or generation, in general, involving image mosaicing
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]


Abstract

The invention discloses a method and a device for generating high-quality images with a generative adversarial network (GAN) by introducing self-supervised composite-task training, comprising the following steps: (1) prepare an original image data set and a stitched image data set; (2) design a composite task that implements self-supervised learning for the GAN; (3) build the model, constructing an adversarial training branch and a self-supervised composite-task branch; (4) train the model and save the network parameters; (5) generate images with the trained generator network. The method exploits both intra-image and inter-image information: it constructs a composite task comprising three subtasks that guides the network to learn more stable and more general features of the images, and it constructs a local discriminator to strengthen the network's ability to extract local image information, thereby markedly improving the training of the network and the quality of the finally generated images.

Description

Method and device for generating high-quality images with a generative adversarial network by introducing self-supervised composite-task training
Technical Field
The invention belongs to the technical field of computer image generation, and particularly relates to a method and a device for generating high-quality images with a generative adversarial network by introducing self-supervised composite-task training.
Background
Real image data sets are an indispensable resource for training networks in the field of computer vision: a large number of real images helps a network learn useful feature representations so that it performs well on downstream tasks. However, when building a real image data set, differences between image acquisition devices mean that the collected raw images must be aligned through operations such as resizing and unifying the resolution, which usually requires considerable labor and is an important reason why network training is expensive. A generated image is an image, produced by a well-trained generative model, that resembles the real images in the training set; the generative model can map random noise directly to images. Expanding an existing data set with a large number of such generated images is one way to address this problem: by learning from the real images in an existing data set, the generative model produces realistic and diverse images, greatly reducing the data-set construction cost incurred by manually collecting and processing data.
The generative adversarial network (GAN) is a generative model that has attracted intense interest in recent years, proposed by Goodfellow et al. in "Generative Adversarial Networks" (NeurIPS, 2014). In a GAN, the generator receives random noise and produces an image, while the discriminator receives real image samples and samples produced by the generator and judges whether each received sample is a real image. During training, the generator and the discriminator are continuously optimized in mutual confrontation. However, GANs suffer from "catastrophic forgetting" in the discriminator and from an unstable training process, which can even lead to mode collapse. One current solution is to perform self-supervised learning for the GAN by introducing additional auxiliary tasks, so that the discriminator learns more general and stable features and the training process becomes more stable. Existing auxiliary tasks, however, are usually single tasks, which easily gives the learned features a pronounced task bias. For example, the rotation task proposed by Gidaris et al. in "Unsupervised Representation Learning by Predicting Image Rotations" (ICLR, 2018) applies a random rotation to the input image and requires the network to determine the rotation angle of the received image. This effectively helps the network learn the structural features of an image, but color, texture, and other features are easily ignored by the network, because they contribute little to the task of judging the rotation angle.
Existing self-supervised auxiliary tasks proposed for GANs thus suffer from being single tasks that cover an incomplete set of features, and they impose a strong bias on the feature-learning process of the network. This hinders the network from learning general and stable features and degrades the quality of the subsequently generated images.
Disclosure of Invention
The main purpose of the invention is to overcome the defects of the prior art and provide a method and a device for generating high-quality images with a GAN by introducing self-supervised composite-task training. By designing and introducing a well-constructed composite task, the network is guided to learn more stable and more general features of the images, so that the training of the network improves and the quality of the finally generated images increases.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides a method for generating a confrontation network to generate a high-quality image by introducing self-supervision compound task training, which comprises the following steps:
preparing a training data set comprising original image data and stitched image data; the original image data are used in the training of the adversarial training branch, and the stitched image data are used in the training of the self-supervised composite-task branch;
designing three subtasks that form a composite task, which is used to construct the self-supervised composite-task branch and to provide supervision information for model training; the three subtasks are a rotation prediction task, a position prediction task, and a common feature extraction task, where the rotation prediction task requires correctly judging the rotation label of each image block contained in a stitched image, the position prediction task requires correctly judging the position label of each image block contained in a stitched image, and the common feature extraction task requires first correctly judging which original image each image block belongs to and then extracting the common features shared by homologous image blocks;
building the model by constructing an adversarial training branch and a self-supervised composite-task branch, where the adversarial training branch comprises a local discriminator and a generator, and the self-supervised composite-task branch comprises a classifier with three output heads;
training the built model to obtain a trained generator network; the training specifically comprises: using the original data set as input of the adversarial training branch and the stitched image data as input of the self-supervised composite-task branch, training the networks in the two branches, with the self-supervised composite-task branch providing supervision information for the local discriminator in the adversarial training branch during training;
and inputting random noise into the trained generator network to generate images.
As a preferred technical solution, the stitched image data are prepared as follows:
for the original images in a batch, cut 4 image blocks with a given overlap ratio ω from the upper-left, upper-right, lower-left, and lower-right regions of each image, where the overlap ratio ω equals the ratio of the side length of the overlapping part between adjacent image blocks to the side length of the original image;
randomly shuffle the obtained batch of image blocks and apply one rotation transformation to each, with the rotation angle selected at random from the set R = {0°, 90°, 180°, 270°};
resize the obtained batch of image blocks with bilinear interpolation so that the side length of each block equals half the side length of the original image;
and stitch the obtained image blocks in groups of 4 to obtain a batch of stitched images of the same size as the original images, completing the preparation of the stitched image data.
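The four preparation steps above can be sketched in NumPy. This is an illustrative sketch, not the patented implementation: the function name and batch layout are assumptions, and nearest-neighbor index selection stands in for the bilinear interpolation named in the text.

```python
import numpy as np

def make_stitched_batch(images, omega=0.25, rng=None):
    """Build one batch of stitched images from original images.

    images: (n, H, H, C) array; omega: overlap ratio between adjacent patches.
    Returns (n, H, H, C) stitched images plus rotation, position and
    source-image pseudo-labels for every patch.
    """
    rng = np.random.default_rng(rng)
    n, H, _, C = images.shape
    s = int(round(H * (1 + omega) / 2))      # patch side length for overlap ratio omega
    corners = [(0, 0), (0, H - s), (H - s, 0), (H - s, H - s)]  # UL, UR, LL, LR

    patches, rot, pos, src = [], [], [], []
    for i, img in enumerate(images):
        for p, (r0, c0) in enumerate(corners):
            patch = img[r0:r0 + s, c0:c0 + s]
            k = rng.integers(4)              # rotation drawn from {0°, 90°, 180°, 270°}
            patch = np.rot90(patch, k)
            # nearest-neighbor resize to H/2 (stand-in for bilinear interpolation)
            idx = np.arange(H // 2) * s // (H // 2)
            patches.append(patch[np.ix_(idx, idx)])
            rot.append(k); pos.append(p); src.append(i)

    order = rng.permutation(4 * n)           # randomly shuffle all patches
    patches = np.stack(patches)[order]
    rot, pos, src = (np.asarray(a)[order] for a in (rot, pos, src))

    # stitch groups of 4 patches back into H x H images
    h = H // 2
    stitched = np.zeros_like(images)
    for g in range(n):
        a, b, c, d = patches[4 * g: 4 * g + 4]
        stitched[g, :h, :h], stitched[g, :h, h:] = a, b
        stitched[g, h:, :h], stitched[g, h:, h:] = c, d
    return stitched, rot, pos, src
```

The three returned label arrays correspond to the pseudo-labels of the rotation, position, and common feature extraction subtasks defined next.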
As a preferred technical solution, the three subtasks are designed as follows:
in the rotation prediction task, for a batch of stitched images, each image contains 4 image blocks, each with its own rotation angle, and each image block is assigned a pseudo-label l_r according to that angle, l_r ∈ {0°, 90°, 180°, 270°};
in the position prediction task, for a batch of stitched images, the 4 image blocks contained in each image each correspond to a fixed region of the original image they belong to, and each image block is assigned a pseudo-label l_l according to this position information, l_l ∈ {upper-left, upper-right, lower-left, lower-right};
in the common feature extraction task, for a batch of stitched images, the 4 image blocks contained in each image each correspond to one original image; the 4 image blocks belonging to the same original image are defined as homologous image blocks, and features with high similarity across homologous image blocks are defined as common features.
As a preferred technical solution, the concrete steps of constructing the model are as follows:
in the adversarial training branch, a local discriminator local-D and a generator G are constructed; the network structure of the local discriminator is divided into two parts, with a feature blocking module added between them. The first part receives an original image as input and extracts image features; the feature blocking module processes the image features output by the first part into image-block features; the second part receives the image-block features as input and produces the final output of the local discriminator. The task of local-D is to judge correctly whether each block feature comes from a real image or a generated image. The loss function of this branch, denoted L_adv, is consistent with the adversarial loss proposed for the original GAN, and is expressed as:

min_G max_D L_adv(G, D) = E_{x~P_data(x)}[log D(x)] + E_{z~P_z(z)}[log(1 − D(G(z)))]

where x is a real image sampled from the original data set, P_data(x) is the real data distribution, z is random noise sampled from the prior distribution, P_z(z) is the prior distribution, D is the local discriminator, and G is the generator;
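To make the expression concrete, the following sketch evaluates L_adv from discriminator outputs. It is a hedged illustration: the function name `adversarial_loss` is assumed, and the toy logit values in the usage stand in for the outputs of local-D.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def adversarial_loss(d_real_logits, d_fake_logits):
    """Minimax GAN loss L_adv = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real_logits: discriminator logits on real samples x ~ P_data.
    d_fake_logits: discriminator logits on generated samples G(z), z ~ P_z.
    The discriminator ascends this value; the generator descends it.
    """
    eps = 1e-12  # numerical floor to keep the logarithms finite
    d_real = sigmoid(np.asarray(d_real_logits))
    d_fake = sigmoid(np.asarray(d_fake_logits))
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))
```

A confident, correct discriminator (D(x) → 1, D(G(z)) → 0) drives this value toward 0 from below; at D ≡ 0.5 it equals −2 log 2.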
in the self-supervised composite-task branch, a classifier C is built with a network architecture consistent with the local discriminator in the adversarial training branch. The classifier network is likewise divided into two parts: the first part has the same architecture as the first part of local-D and shares its network weights; the second part comprises the three output heads of the classifier, where the first two heads each consist of a fully connected layer and output the results of the rotation prediction task and the position prediction task respectively, and the third head consists of a multi-layer perceptron with one hidden layer and outputs the result of the common feature extraction task. The total loss function of this branch is denoted L_CT.
As a preferred technical solution, the self-supervised composite-task branch is jointly optimized with several loss functions, and its total loss L_CT is defined as:
L_CT = L_rot + L_loc + L_CFE
where L_rot, L_loc, and L_CFE denote the rotation prediction loss, the position prediction loss, and the common feature extraction loss, respectively;
denote the true rotation label of each image block in a stitched image by l_r_gt and its true position label by l_l_gt, the rotation label predicted by the classifier by l_r and the predicted position label by l_l, and the vectors output by the multi-layer perceptron in the common feature extraction task by v_1, v_2, …, v_{4n}, where n is the training batch size. The rotation prediction loss and the position prediction loss are computed with cross-entropy, and the similarity between different block features is computed with cosine similarity; the losses of the three tasks are:
L_rot = CrossEntropy(l_r, l_r_gt)
L_loc = CrossEntropy(l_l, l_l_gt)
sim(v_i, v_j) = (v_i · v_j) / (‖v_i‖ ‖v_j‖)
L_CFE = −(1/(4n)) Σ_{i=1}^{4n} (1/|C_i|) Σ_{j∈C_i} log( exp(sim(v_i, v_j)/τ) / Σ_{k=1}^{4n} I[k ≠ i] · exp(sim(v_i, v_k)/τ) )
where τ is a temperature coefficient, n is the training batch size, C_i denotes the set of indices of the image blocks homologous to the i-th image block, and I is an indicator function whose value is 1 when the condition holds and 0 otherwise.
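A NumPy sketch of the three losses above; the function names, the softmax form of the cross-entropy, and the toy shapes are assumptions for illustration.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch of patch logits (softmax form)."""
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def cfe_loss(v, src, tau=0.5):
    """Common-feature-extraction loss: pull homologous patch features together.

    v:   (4n, d) feature vectors from the MLP head.
    src: (4n,) index of the original image each patch came from; patches with
         equal src are homologous (the sets C_i).
    """
    v = v / np.linalg.norm(v, axis=1, keepdims=True)   # cosine similarity = dot product
    sim = v @ v.T / tau
    m = len(v)
    mask_self = np.eye(m, dtype=bool)
    exp_sim = np.exp(sim) * ~mask_self                 # exclude the k == i terms
    denom = exp_sim.sum(axis=1, keepdims=True)
    pos = (src[:, None] == src[None, :]) & ~mask_self  # pairs (i, j) with j in C_i
    log_prob = sim - np.log(denom)
    return -(log_prob * pos).sum() / pos.sum()         # average over positive pairs

def composite_task_loss(rot_logits, rot_gt, loc_logits, loc_gt, v, src, tau=0.5):
    """L_CT = L_rot + L_loc + L_CFE."""
    return (cross_entropy(rot_logits, rot_gt)
            + cross_entropy(loc_logits, loc_gt)
            + cfe_loss(v, src, tau))
```

Since every |C_i| = 3, averaging over positive pairs in `cfe_loss` matches the nested-sum normalization in the formula above.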
As a preferred technical solution, the concrete steps of model training are as follows:
in the adversarial training branch, the generator G and the local discriminator local-D are trained by alternating iterations; the input of the local discriminator is a batch of images sampled from the original image data set, and its training target is to judge correctly the authenticity of regions of the input images, while the input of the generator is random noise and its training target is to output generated images realistic enough to fool the local discriminator;
in the self-supervised composite-task branch, the input of the classifier is a batch of stitched images, the three output heads output the results of the three subtasks respectively, and the branch trains the network through the total loss function L_CT;
the adversarial training branch and the self-supervised composite-task branch are trained simultaneously and are connected during training by sharing the network weights of the first parts of local-D and the three-head classifier C; the total loss function of the model is defined as:

min_{G,C} max_D L_adv(G, D) + L_CT(C)

where L_adv(G, D) is the adversarial training loss and L_CT(C) is the self-supervised composite-task loss; during model training the local discriminator local-D and the generator G are updated alternately, and the three-head classifier C is updated at the same time.
As a preferred technical solution, generating images with the trained generator network specifically comprises:
inputting random noise into the trained generator network; a single forward pass then yields a high-quality generated image similar to the images of the training set.
Another aspect of the invention provides a system for generating high-quality images with a GAN by introducing self-supervised composite-task training, applying the method described above and comprising a data set module, a composite task module, a model building module, a model training module, and an image generation module;
the data set module is used for preparing a training data set comprising original image data and stitched image data; the original image data are used in the training of the adversarial training branch, and the stitched image data are used in the training of the self-supervised composite-task branch;
the composite task module is used for designing the three subtasks that form the composite task, which constructs the self-supervised composite-task branch and provides supervision information for model training; the three subtasks are a rotation prediction task, a position prediction task, and a common feature extraction task, where the rotation prediction task requires correctly judging the rotation label of each image block contained in a stitched image, the position prediction task requires correctly judging the position label of each image block contained in a stitched image, and the common feature extraction task requires first correctly judging which original image each image block belongs to and then extracting the common features shared by homologous image blocks;
the model building module is used for constructing the adversarial training branch and the self-supervised composite-task branch, where the adversarial training branch comprises a local discriminator and a generator, and the self-supervised composite-task branch comprises a classifier with three output heads;
the model training module is used for training the built model to obtain a trained generator network, specifically: the original data set is used as input of the adversarial training branch and the stitched image data as input of the self-supervised composite-task branch, the networks in the two branches are trained, and during training the self-supervised composite-task branch provides supervision information for the local discriminator in the adversarial training branch;
and the image generation module is used for inputting random noise into the trained generator network to generate images.
Yet another aspect of the invention provides an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores computer program instructions executable by the at least one processor to cause the at least one processor to perform the method for generating high-quality images with a GAN by introducing self-supervised composite-task training.
Yet another aspect of the invention provides a computer-readable storage medium storing a program which, when executed by a processor, implements the method for generating high-quality images with a GAN by introducing self-supervised composite-task training.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) For the self-supervised learning of GANs, the invention provides a composite auxiliary task based on multi-level information that uses both intra-image and inter-image information as supervision during network training. This stabilizes the training process, improves the generality of the features extracted by the network, and improves the quality of the generated images.
(2) The invention provides a local discriminator local-D, so that the network attends to more local feature information during feature extraction; combined with the self-supervised composite-task branch, this improves the self-supervised learning effect of the whole model.
(3) The common feature extraction task introduces the idea of contrastive learning into the self-supervised learning of GANs, using its strengths to improve the quality of feature extraction while keeping the training batch size small, thereby obtaining a large gain in network performance at a small training cost.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is an overall flowchart of the method for generating high-quality images with a GAN by introducing self-supervised composite-task training according to an embodiment of the present invention.
FIG. 2 is a flow chart of the image processing module according to an embodiment of the present invention, showing three states of the image data: state 1 is the initial state of a batch of image data, state 2 is the state after each image has been cut into 4 image blocks, and state 3 is the state after all image blocks have been randomly shuffled, randomly rotated, and stitched to obtain a batch of stitched image data.
Fig. 3 is an overall structure diagram of a network model according to an embodiment of the present invention.
Fig. 4 is a block diagram of a generator network according to an embodiment of the present invention.
Fig. 5 is a block diagram of a deconvolution module in a generator network according to an embodiment of the present invention.
Fig. 6 is a structural diagram of the local discriminator network according to an embodiment of the present invention.
Fig. 7 is a structural diagram of a convolution module in the local discriminator according to an embodiment of the present invention.
FIG. 8 is a block diagram of the feature blocking module according to an embodiment of the present invention.
FIG. 9 is a block diagram of the system for generating high-quality images with a GAN by introducing self-supervised composite-task training according to an embodiment of the present invention.
Fig. 10 is a block diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to fig. 1, this embodiment provides a method for generating high-quality images with a GAN by introducing self-supervised composite-task training, comprising the following steps:
s1, preparing a training data set comprising an original image data set and a spliced image data set;
furthermore, the original image data set is used as an input of a countertraining branch, and public data sets of Cifar-10, CelebA, ImageNet 32X 32, STL-10 and the like are obtained as the original image data set of the method through a network search engine.
Further, the stitched image data set is used as an input of an auto-supervised compound task branch, and is obtained by performing a certain processing on the original image data set, and the processing flow refers to fig. 2. Firstly, cutting 4 image blocks with a certain side length overlapping rate omega of 0.4 from the upper left area, the upper right area, the lower left area and the lower right area of each image for an original image in a batch, then randomly disordering the image blocks, carrying out one-time rotation transformation, randomly selecting a rotation angle from a set R (0 degrees, 90 degrees, 180 degrees and 270 degrees), adjusting the size of the image blocks by using a bilinear interpolation method to enable the side length of the image blocks to be equal to half of the side length of the original image, and finally splicing the image blocks by taking 4 image blocks as a group to obtain a batch of spliced images with the same size as the original image, namely obtaining a required spliced image data set.
S2, designing the composite task, which constructs a strong supervision signal for the training process of the network;
further, to make full use of the image information, this embodiment designs a composite task comprising three subtasks: a rotation prediction task, a position prediction task, and a common feature extraction task, where the first two are designed from intra-image information and the third from inter-image information.
Further, the rotation prediction task uses the rotation information of each image block to construct a pseudo-label l_r, l_r ∈ {0°, 90°, 180°, 270°}, and requires the network to judge correctly the rotation labels of the 4 image blocks contained in a stitched image. The network can succeed at this task only if it understands the structural information in the image blocks well.
Further, the position prediction task uses the position of each image block within its original image to construct a pseudo-label l_l, l_l ∈ {upper-left, upper-right, lower-left, lower-right}, and requires the network to judge correctly the original position labels of the 4 image blocks contained in a stitched image. Specifically, when the stitched image data set is prepared, the image blocks are cut from the four regions of the original image, namely upper-left, upper-right, lower-left, and lower-right, so each image block corresponds to a position label in the original image, and the network can predict these labels correctly only by understanding well the structural characteristics of the image blocks and of the original image.
Further, the common feature extraction task uses information across different images and requires the network to extract the common features of homologous image blocks. The 4 image blocks cut from the same original image are defined as homologous image blocks, and features with high similarity across them are the common features. This task builds on two observations: the 4 homologous image blocks jointly form an original image with complete semantics, and adjacent homologous blocks share a certain overlapping region, so common features exist among them. These common features help the network distinguish homologous from non-homologous image blocks correctly, so that while completing this task the network continuously strengthens its ability to extract common features and learns representative features of the images.
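The sets C_i used by this task follow directly from the source-image index of each shuffled patch. A small hedged helper (the name `homologous_sets` is illustrative, not from the patent):

```python
import numpy as np

def homologous_sets(src):
    """C_i: for each patch index i, the indices of its homologous patches
    (patches cut from the same original image), excluding i itself."""
    src = np.asarray(src)
    return [np.flatnonzero((src == src[i]) & (np.arange(len(src)) != i))
            for i in range(len(src))]
```

With 4 patches per original image, every C_i contains exactly 3 indices regardless of how the batch was shuffled.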
S3, building a network model;
referring to fig. 3, building the whole network model comprises four parts: implementing the generator with a ResNet-50 network, implementing the local discriminator with a ResNet-50 network, constructing the three-head classifier, and designing the loss function of each branch together with the total loss function of the network.
Further, the concrete steps of building the network model are as follows:
S31, constructing the generator network:
referring to fig. 4 for the structure of the constructed generator network: the input random noise first passes through a fully connected layer with 4096 output channels, and the resulting vector is reshaped to 4 × 4 × 256; it then passes through three consecutive deconvolution modules; after batch normalization, ReLU activation and a 3 × 3 convolution (3 output channels, stride 1), a Sigmoid activation produces the final output of size 32 × 32 × 3.
Referring to fig. 5, the deconvolution module performs the following operations: the input X undergoes batch normalization, ReLU activation, upsampling, a 3 × 3 convolution, ReLU activation and a 3 × 3 convolution to obtain X1; the input X also undergoes upsampling and a 1 × 1 convolution to obtain X2; the final output of the deconvolution module is X1 + X2. All convolutions in the deconvolution module have 256 output channels and stride 1.
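The generator of fig. 4 and the deconvolution module of fig. 5 can be sketched in PyTorch as follows. This is a hedged reconstruction of the embodiment's description: the noise dimension (128 here) and the padding choices are assumptions, since they are not stated explicitly.

```python
import torch
import torch.nn as nn

class DeconvBlock(nn.Module):
    # Residual upsampling block of fig. 5: all convolutions use 256
    # channels and stride 1; upsampling doubles the spatial size.
    def __init__(self, ch=256):
        super().__init__()
        self.main = nn.Sequential(
            nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(ch, ch, 3, 1, 1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, 1, 1))
        self.skip = nn.Sequential(
            nn.Upsample(scale_factor=2),
            nn.Conv2d(ch, ch, 1, 1, 0))

    def forward(self, x):
        return self.main(x) + self.skip(x)   # X1 + X2

class Generator(nn.Module):
    def __init__(self, z_dim=128):           # noise dimension assumed
        super().__init__()
        self.fc = nn.Linear(z_dim, 4096)     # 4096 = 4 x 4 x 256
        self.blocks = nn.Sequential(*[DeconvBlock() for _ in range(3)])  # 4 -> 8 -> 16 -> 32
        self.head = nn.Sequential(
            nn.BatchNorm2d(256), nn.ReLU(),
            nn.Conv2d(256, 3, 3, 1, 1), nn.Sigmoid())

    def forward(self, z):
        x = self.fc(z).view(-1, 256, 4, 4)
        return self.head(self.blocks(x))     # (N, 3, 32, 32)
```

The three upsampling blocks take the 4 × 4 feature map to 32 × 32, consistent with the 32 × 32 × 3 output stated above.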
S32, constructing a local discriminator network:
please refer to fig. 6 for the structure of the local discriminator network. The network is divided into two parts: the first part consists of several convolution modules and a ReLU activation and is responsible for outputting the intermediate-layer features; the second part consists of a feature blocking module and a fully connected layer and is responsible for processing the intermediate-layer features into the final output. The input image, of initial size 32 × 32 × 3, first passes through a first convolution module comprising the following steps: the input X undergoes a 3 × 3 convolution, ReLU activation, a 3 × 3 convolution and a 1 × 1 convolution (stride 2) to obtain X1; the input X also undergoes a 1 × 1 convolution (stride 2) to obtain X2; the output is X1 + X2. The result then passes through three consecutive convolution modules; after one ReLU activation of the last module's output, the resulting intermediate-layer features are fed to the feature blocking module, which outputs block features; finally, the block features are fed to a fully connected layer with 1 output channel to obtain the final output.
Further, referring to fig. 7, each of the three consecutive convolution modules performs the following steps: the input X undergoes (ReLU activation, 3 × 3 convolution) twice and one 1 × 1 convolution (stride 2) to obtain X1; the input X undergoes a 1 × 1 convolution (stride 1) followed by a 1 × 1 convolution (stride 2) to obtain X2; the final output of the module is X1 + X2. All convolutions in these modules have 128 output channels, and all 3 × 3 convolutions use stride 1. Note that in the last two convolution modules, the 1 × 1 convolutions with stride 2 are omitted, so the intermediate-layer features obtained after the final ReLU activation have size 8 × 8 × 128.
As shown in fig. 8, the input intermediate-layer features F are first average-pooled (kernel size 2 × 2, stride 2), and the result is then split into 2 × 2 blocks along the height and width dimensions, yielding block features Fp of size 2 × 2 × 128, whose number is four times that of the input intermediate-layer features F.
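A minimal NumPy sketch of the feature blocking module of fig. 8; the channel-last (N, H, W, C) layout and the function name are illustrative assumptions.

```python
import numpy as np

def feature_blocking(f):
    """f: (N, H, W, C) intermediate-layer features.  Average-pool with a
    2x2 kernel at stride 2, then split the result into 2x2 spatial blocks,
    so the output batch is four times the input batch: (N*4, H/4, W/4, C)."""
    n, h, w, c = f.shape
    # 2x2 average pooling, stride 2
    pooled = f.reshape(n, h // 2, 2, w // 2, 2, c).mean(axis=(2, 4))
    ph, pw = pooled.shape[1], pooled.shape[2]
    # split into a 2x2 grid of spatial blocks and fold them into the batch
    blocks = pooled.reshape(n, 2, ph // 2, 2, pw // 2, c)
    blocks = blocks.transpose(0, 1, 3, 2, 4, 5).reshape(n * 4, ph // 2, pw // 2, c)
    return blocks
```

For the 8 × 8 × 128 intermediate features above, pooling gives 4 × 4 × 128 and blocking gives four 2 × 2 × 128 block features per input, matching the stated sizes.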
S33, constructing a three-head classifier network:
the three-head classifier network is likewise divided into two parts. The first part has the same structure as the first part of the local discriminator, shares its weights and updates parameters synchronously; it is mainly responsible for extracting intermediate-layer features from the input stitched images. The second part contains the feature blocking module and three output heads. The feature blocking module (see fig. 8 for its structure) processes the intermediate-layer features into block features and feeds them to the three heads. Two of the heads each consist of a single fully connected layer with 4 output channels and are responsible, respectively, for predicting the rotation angle and the original position label of each image block; the third head is a multilayer perceptron with one hidden layer and is responsible for completing the common feature extraction task: its hidden layer has as many output channels as the block feature has input channels and uses a ReLU activation, and its output layer has 64 output channels. The method of applying an additional nonlinear transformation to features with a multilayer perceptron was first proposed by Chen et al. in "A Simple Framework for Contrastive Learning of Visual Representations" (ICML, 2020), and it helps the network learn higher-quality feature representations.
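The three output heads can be sketched as follows, assuming each block feature is flattened to 2 × 2 × 128 = 512 channels before the heads; the class and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class ThreeHeads(nn.Module):
    # Output heads of the three-head classifier.  `feat_dim` is the
    # flattened size of one block feature (2 x 2 x 128 = 512 here).
    def __init__(self, feat_dim=512, proj_dim=64):
        super().__init__()
        self.rot_head = nn.Linear(feat_dim, 4)    # rotation label: 4 classes
        self.loc_head = nn.Linear(feat_dim, 4)    # position label: 4 classes
        self.cfe_head = nn.Sequential(            # projection MLP, one hidden layer
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, proj_dim))        # 64-dim output vectors

    def forward(self, block_feat):
        f = block_feat.flatten(1)                 # (N*4, 512)
        return self.rot_head(f), self.loc_head(f), self.cfe_head(f)
```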
S4, designing a loss function:
total loss function L_final of the model proposed in this embodiment consists of the loss functions of its two branches, namely the loss function L_adv of the adversarial training branch and the loss function L_CT of the self-supervised composite task branch. The total loss function of the model is expressed as follows:

L_final = L_adv + L_CT
where L_adv is consistent with the loss function of the classical generative adversarial network, first proposed by Goodfellow et al. in the article "Generative Adversarial Networks". The specific expression of L_adv is as follows:
L_adv = min_G max_D { E_{x∼Pdata(x)}[log D(x)] + E_{z∼Pz(z)}[log(1 − D(G(z)))] }
where x is a real image sampled from the original data set, Pdata(x) is the real data distribution, z is random noise sampled from the prior distribution, Pz(z) is the prior distribution, D is the local discriminator, and G is the generator.
L_CT consists of the losses of the three subtasks, namely the rotation prediction task loss L_rot, the position prediction task loss L_loc, and the common feature extraction task loss L_CFE. It is defined as:
L_CT = L_rot + L_loc + L_CFE
L_rot = CrossEntropy(l_r, l_r_gt)
L_loc = CrossEntropy(l_l, l_l_gt)
sim(v_i, v_j) = (v_i · v_j) / (‖v_i‖ · ‖v_j‖)

L_CFE = −(1 / (n × 4)) Σ_{i=1}^{n×4} (1 / |C_i|) Σ_{j∈C_i} log [ exp(sim(v_i, v_j) / τ) / Σ_{k=1}^{n×4} I(k ≠ i) exp(sim(v_i, v_k) / τ) ]
where l_r_gt and l_l_gt respectively denote the true rotation label and true position label of each image block in the stitched image, l_r and l_l respectively denote the rotation label and position label predicted by the classifier for an image block, CrossEntropy(·) is the cross-entropy function, v_1, v_2, …, v_{n×4} denote the group of vectors output by the multilayer perceptron, sim(·) is the cosine similarity function, I is the indicator function whose value is 1 when the judgment condition is met and 0 otherwise, τ is the temperature coefficient (default value 0.3), n is the training batch size (default value 64), and C_i represents the set of indices of the homologous image blocks of the i-th image block.
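A NumPy sketch of the common feature extraction loss L_CFE under these definitions. The exact normalization of the patent's formula is not fully recoverable from the text, so this follows the standard NT-Xent-style form with homologous blocks as positives; the function and argument names are illustrative.

```python
import numpy as np

def cfe_loss(v, image_ids, tau=0.3):
    """v: (4n, d) vectors output by the projection head; image_ids: (4n,)
    index of the source image of each block.  Blocks sharing an image id
    are the homologous positives C_i; all other blocks act as negatives."""
    v = np.asarray(v, dtype=float)
    image_ids = np.asarray(image_ids)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)        # cosine similarity
    sim = v @ v.T / tau
    m = len(v)
    np.fill_diagonal(sim, -np.inf)                          # exclude k = i (I(k != i))
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (image_ids[:, None] == image_ids[None, :]) & ~np.eye(m, dtype=bool)
    # average the negative log-probability over all positive pairs
    return -np.where(pos, log_prob, 0.0).sum() / pos.sum()
```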
S4, training the constructed model;
the original image data set is input to the generator and the local discriminator for alternating training, and the stitched image data set is input to the three-head classifier for training; the losses are optimized with the Adam algorithm with an adaptive learning rate, the training batch size is 64, and training runs for 300,000 iterations in total.
Further, during training, for the adversarial training branch, random noise is input to the generator to obtain generated images. When a generated image is input to the discriminator, the adversarial training loss and back-propagated gradients are computed, and the generator adjusts its parameters so that it tends to produce images closer to real ones. When a batch of real images sampled from the original data set is input to the local discriminator, the adversarial training loss and back-propagated gradients are computed, and the discriminator adjusts its parameters so that its ability to distinguish generated images from real ones tends to improve. For the self-supervised composite task branch, when a batch of stitched images is input to the three-head classifier, the classifier performs the three subtasks of the composite task simultaneously, computes the total composite task loss and back-propagated gradients, and adjusts its parameters so that it tends to extract more comprehensive and more general image features. Because the classifier shares parameters with the local discriminator, the two are updated simultaneously during training, so the local discriminator also acquires the feature extraction ability brought by the composite task. Throughout training, the generator and the local discriminator form a mutual adversarial relationship: the generator aims to produce realistic images that can fool the discriminator, while the discriminator aims to keep improving its ability to distinguish real images from generated ones; when this adversarial game reaches equilibrium, both the generator's image generation ability and the discriminator's feature extraction ability reach a high level.
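One alternating training step of the adversarial branch can be sketched with tiny stand-in linear networks. The real model uses the generator, local discriminator and three-head classifier described above, and a full step would additionally back-propagate the composite task loss L_CT through the classifier, whose trunk shares weights with the discriminator; all names and hyperparameters here are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-ins: noise -> fake "image" vector, and a one-layer critic.
G = nn.Sequential(nn.Linear(16, 32), nn.Tanh())
D = nn.Linear(32, 1)
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(64, 32)                 # batch of "real" samples
z = torch.randn(64, 16)                    # batch of random noise

# Discriminator step: push real toward 1, generated (detached) toward 0.
d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(G(z).detach()), torch.zeros(64, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to make the discriminator output 1 on generated samples.
g_loss = bce(D(G(z)), torch.ones(64, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```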
When training is completed, the parameters of the entire model are saved.
S5, generating an image;
by removing the local discriminator of the adversarial training branch and the entire self-supervised composite task branch, image generation can be accomplished using only the generator network: random noise is input to the generator, and forward propagation yields a high-quality image similar to those in the original data set.
The method for generating high-quality images with a generative adversarial network by introducing self-supervised composite task training exploits both information within images and information between images: it constructs a composite task comprising three subtasks to guide the network to learn more stable and more general image features, and it constructs a local discriminator to improve the network's ability to extract local image information, thereby markedly improving the training effect of the network and the quality of the finally generated images.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present invention is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present invention.
Based on the same idea as the method of the above embodiment for generating high-quality images with a generative adversarial network by introducing self-supervised composite task training, the invention also provides a system for generating high-quality images with a generative adversarial network by introducing self-supervised composite task training, which can be used to execute the foregoing method. For illustrative purposes, the schematic structural diagram of this system shows only the portions relevant to the embodiments of the present invention; those skilled in the art will understand that the illustrated structure does not limit the apparatus, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
Referring to fig. 9, in another embodiment of the present application, there is provided a system 100 for generating a high-quality image of an antagonistic network by introducing an auto-supervised compound task training, the system including a data set module 101, a compound task module 102, a model building module 103, a model training module 104, and an image generating module 105;
the data set module 101 is configured to prepare a training data set, where the training data set includes original image data and stitched image data; the original image data is used for a training process of a confrontation training branch, and the spliced image data is used for a training process of an automatic supervision compound task branch;
the composite task module 102 is configured to design three subtasks that form a composite task, where the composite task is used to construct the self-supervised composite task branch and provide supervision information for model training; the three subtasks are a rotation prediction task, a position prediction task and a common feature extraction task, where the rotation prediction task is used to correctly determine the rotation labels corresponding to the image blocks contained in each stitched image, the position prediction task is used to correctly determine the original position labels corresponding to the image blocks contained in each stitched image, and the common feature extraction task is used to first correctly determine which original image each image block belongs to and then extract the common features between homologous image blocks;
the model building module 103 is used for respectively building an antagonistic training branch and an automatic supervision compound task branch, wherein the antagonistic training branch comprises a local discriminator and a generator, and the automatic supervision compound task branch comprises a classifier with three output heads;
the model training module 104 is configured to train the built model to obtain a trained generator network; the training specifically comprises the following steps: the method comprises the steps that an original data set is used as input of a confrontation training branch, spliced image data is used as input of an automatic supervision composite task branch, networks in the two branches are trained, and the automatic supervision composite task branch is responsible for providing supervision information for a local discriminator in the confrontation training branch in the training process;
the image generation module 105 is configured to input an image to be processed to a trained generator network for image generation.
It should be noted that the system for generating high-quality images with a generative adversarial network by introducing self-supervised composite task training corresponds one to one with the method of the same name; the technical features and advantages described in the above method embodiment apply equally to the system embodiment, and for their specific content reference may be made to the description of the method embodiment, which is not repeated here.
In addition, in the system implementation of the above embodiment, the logical division into program modules is only an example; in practical applications, the above functions may be allocated to different program modules as needed, for example to meet the configuration requirements of corresponding hardware or for the convenience of software implementation. That is, the internal structure of the system is divided into different program modules to perform all or part of the functions described above.
Referring to fig. 10, in an embodiment, an electronic device 200 for implementing a method for generating an anti-network-generated high-quality image by introducing an auto-supervised compound task training is provided, and the electronic device may include a first processor 201, a first memory 202 and a bus, and may further include a computer program stored in the first memory 202 and executable on the first processor 201, such as an auto-supervised compound task training anti-network-generated high-quality image program 203.
The first memory 202 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The first memory 202 may in some embodiments be an internal storage unit of the electronic device 200, such as a removable hard disk of the electronic device 200. The first memory 202 may also be an external storage device of the electronic device 200 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 200. Further, the first memory 202 may also include both an internal storage unit and an external storage device of the electronic device 200. The first memory 202 can be used for storing not only application software installed in the electronic device 200 and various types of data, such as codes for generating a network-opposing-network-generating high-quality image program 203 for the self-supervision multitasking training, but also temporarily storing data that has been output or will be output.
The first processor 201 may in some embodiments be composed of an integrated circuit, for example a single packaged integrated circuit, or of a plurality of integrated circuits packaged with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors and combinations of various control chips. The first processor 201 is the control unit of the electronic device: it connects the components of the whole electronic device through various interfaces and lines, and executes the functions and processes the data of the electronic device 200 by running or executing the programs or modules stored in the first memory 202 (e.g., the program 203 for generating high-quality images) and calling the data stored in the first memory 202.
Fig. 10 shows only an electronic device having certain components, and those skilled in the art will appreciate that the structure shown in fig. 10 does not constitute a limitation of the electronic device 200, which may include fewer or more components than shown, combine some components, or arrange the components differently.
The program 203 for generating high-quality images with a generative adversarial network trained with the self-supervised composite task, stored in the first memory 202 of the electronic device 200, is a combination of multiple instructions which, when executed by the first processor 201, can realize:
preparing a training data set, wherein the training data set comprises original image data and spliced image data; the original image data is used for a training process of a confrontation training branch, and the spliced image data is used for a training process of an automatic supervision compound task branch;
designing three subtasks to form a composite task, wherein the composite task is used to construct the self-supervised composite task branch and provide supervision information for model training; the three subtasks are respectively a rotation prediction task, a position prediction task and a common feature extraction task, wherein the rotation prediction task is used to correctly determine the rotation labels corresponding to the image blocks contained in each stitched image, the position prediction task is used to correctly determine the original position labels corresponding to the image blocks contained in each stitched image, and the common feature extraction task is used to first correctly determine which original image each image block belongs to and then extract the common features between homologous image blocks;
building a model, and respectively building an antagonistic training branch and an automatic supervision compound task branch, wherein the antagonistic training branch comprises a local discriminator and a generator, and the automatic supervision compound task branch comprises a classifier with three output heads;
training the built model to obtain a trained generator network; the training specifically comprises the following steps: the method comprises the steps that an original data set is used as input of a confrontation training branch, spliced image data is used as input of an automatic supervision composite task branch, networks in the two branches are trained, and the automatic supervision composite task branch is responsible for providing supervision information for a local discriminator in the confrontation training branch in the training process;
and inputting the image to be processed into a trained generator network for image generation.
Further, the integrated modules/units of the electronic device 200 may be stored in a non-volatile computer-readable storage medium if they are implemented in the form of software functional units and sold or used as independent products. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the program is executed. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. The method for generating the confrontation network to generate the high-quality image by introducing the self-supervision compound task training is characterized by comprising the following steps:
preparing a training data set, wherein the training data set comprises original image data and spliced image data; the original image data is used for a training process of a confrontation training branch, and the spliced image data is used for a training process of an automatic supervision compound task branch;
designing three subtasks to form a composite task, wherein the composite task is used to construct the self-supervised composite task branch and provide supervision information for model training; the three subtasks are respectively a rotation prediction task, a position prediction task and a common feature extraction task, wherein the rotation prediction task is used to correctly determine the rotation labels corresponding to the image blocks contained in each stitched image, the position prediction task is used to correctly determine the original position labels corresponding to the image blocks contained in each stitched image, and the common feature extraction task is used to first correctly determine which original image each image block belongs to and then extract the common features between homologous image blocks;
building a model, and respectively building an antagonistic training branch and an automatic supervision compound task branch, wherein the antagonistic training branch comprises a local discriminator and a generator, and the automatic supervision compound task branch comprises a classifier with three output heads;
training the built model to obtain a trained generator network; the training specifically comprises the following steps: the method comprises the steps that an original data set is used as input of a confrontation training branch, spliced image data is used as input of an automatic supervision composite task branch, networks in the two branches are trained, and the automatic supervision composite task branch is responsible for providing supervision information for a local discriminator in the confrontation training branch in the training process;
and inputting the image to be processed into a trained generator network for image generation.
2. The method for generating the confrontation network to generate the high-quality image by introducing the self-supervision compound task training as claimed in claim 1, wherein the preparation of the stitched image data is specifically as follows:
for original images in a batch, cutting out 4 image blocks with a certain overlap ratio omega from the upper left area, the upper right area, the lower left area and the lower right area of each image, wherein the overlap ratio omega is equal to the ratio of the side length of the overlapped part between the adjacent image blocks to the side length of the original image;
randomly shuffling the obtained batch of image blocks and applying one rotation transformation to each, the rotation angle being randomly selected from the set R = {0°, 90°, 180°, 270°};
adjusting the size of the obtained image blocks of one batch by using a bilinear interpolation method to enable the side length of the image blocks to be equal to half of the side length of the original image;
and splicing the obtained image blocks by taking 4 image blocks as a group to obtain a batch of spliced images with the same size as the original image, thereby finishing the production of spliced image data.
3. The method for generating the confrontation network generation high-quality image by introducing the self-supervision compound task training as claimed in claim 1, wherein the three subtasks are specifically designed as follows:
in the rotation prediction task, for a batch of obtained stitched images, each image contains 4 image blocks each corresponding to a rotation angle, and each image block is given a pseudo label l_r via its rotation angle, l_r ∈ {0°, 90°, 180°, 270°};
In the position prediction task, for a batch of obtained stitched images, the 4 image blocks contained in each image respectively correspond to a fixed region position in the original image to which they belong, and each image block is given a pseudo label l_l via this position information, l_l ∈ {top-left, top-right, bottom-left, bottom-right};
in the common feature extraction task, for a batch of obtained spliced images, 4 image blocks contained in each image respectively correspond to one original image, the 4 image blocks belonging to the same original image are defined as homologous image blocks, and features with higher similarity between the homologous image blocks are defined as common features.
4. The method for generating the confrontation network to generate the high-quality image by introducing the self-supervision compound task training according to the claim 1, characterized in that the specific steps of constructing the model are as follows:
in the confrontation training branch, a local discriminator local-D and a generator G are constructed; the network structure of the local discriminator is divided into two parts, with a feature blocking module added between them. The first part receives an original image as input and extracts image features; the feature blocking module is responsible for processing the image features output by the first part into image block features; the second part receives the image block features as input and produces the final output of the local discriminator. The task of the local discriminator local-D is to correctly judge whether the block features come from a real image or a generated image. The loss function of this branch, recorded as L_adv, is consistent with the adversarial loss proposed in the original generative adversarial network; its specific expression is as follows:
L_adv = min_G max_D { E_{x∼Pdata(x)}[log D(x)] + E_{z∼Pz(z)}[log(1 − D(G(z)))] }
where x is a real image sampled from the original data set, Pdata(x) is the real data distribution, z is random noise sampled from the prior distribution, Pz(z) is the prior distribution, D is the local discriminator, and G is the generator;
in the self-supervision compound task branch, a classifier C is built with a network architecture consistent with that of the local discriminator in the confrontation training branch; the classifier network is likewise divided into two parts, of which the first part has the same architecture as the first part of local-D and shares its network weights, while the second part contains the three output heads of the classifier: two heads each consist of a fully connected layer and are respectively responsible for outputting the results of the rotation prediction task and the position prediction task, and the third head consists of a multilayer perceptron containing one hidden layer and is responsible for outputting the result of the common feature extraction task; the total loss function of this branch is recorded as L_CT.
5. The method for generating the confrontation network to generate the high-quality image by introducing the self-supervision compound task training as claimed in claim 4, wherein a plurality of loss functions are used to jointly optimize the self-supervision compound task branch, whose total loss function L_CT is defined as:
L_CT = L_rot + L_loc + L_CFE
wherein L_rot, L_loc, and L_CFE denote the rotation prediction task loss, the position prediction task loss, and the common feature extraction task loss, respectively;
the true rotation label of each image block in a stitched image is denoted l_r_gt and its true position label l_l_gt; the rotation label predicted by the classifier for the block is l_r and the predicted position label is l_l; the set of vectors output by the multilayer perceptron in the common feature extraction task is denoted v_1, v_2, …, v_k with k = n×4. The rotation and position prediction losses are computed with cross entropy, and the similarity between different block features is measured with cosine similarity. The three task losses are computed as follows:
L_rot = CrossEntropy(l_r, l_r_gt)
L_loc = CrossEntropy(l_l, l_l_gt)
sim(v_i, v_j) = (v_i · v_j) / (‖v_i‖ ‖v_j‖)

L_CFE = −(1/k) Σ_{i=1}^{k} (1/|C_i|) Σ_{j∈C_i} log [ exp(sim(v_i, v_j)/τ) / Σ_{m=1}^{k} I(m ≠ i) exp(sim(v_i, v_m)/τ) ]
wherein τ is a temperature coefficient, n is the training batch size, C_i is the set of indices of the image blocks homologous to the i-th block, and I is the indicator function, whose value is 1 when its condition holds and 0 otherwise.
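The three task losses can be sketched numerically. The cross-entropy part is standard; the contrastive form of L_CFE below is an assumption consistent with the symbols defined above (τ, the homologous sets C_i, and the indicator I), since the patent gives the formula only as an image:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Softmax cross-entropy averaged over the batch (L_rot and L_loc)."""
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def cfe_loss(v, source_ids, tau=0.5):
    """Contrastive reading of L_CFE (an assumption about the exact form):
    pull projections of homologous blocks together under cosine similarity
    with temperature tau.  v is (k, d); source_ids marks which original
    image each of the k = n*4 blocks came from (defining the sets C_i)."""
    v = v / np.linalg.norm(v, axis=1, keepdims=True)  # unit vectors: dot = cosine
    sim = (v @ v.T) / tau
    k = len(v)
    total = 0.0
    for i in range(k):
        others = [m for m in range(k) if m != i]            # I(m != i)
        log_denom = np.log(np.exp(sim[i, others]).sum())
        pos = [j for j in others if source_ids[j] == source_ids[i]]  # C_i
        total += -np.mean([sim[i, j] - log_denom for j in pos])
    return total / k
```

With this form, projections that cluster by source image yield a lower L_CFE than projections that mix sources, which is the behavior the common feature extraction task rewards.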
6. The method for generating high-quality images with a generative adversarial network by introducing self-supervised composite task training according to claim 1, wherein the specific steps of model training are as follows:
in the adversarial training branch, the generator G and the local discriminator local-D are trained by alternating iterations; the input of the local discriminator is a batch of images sampled from the original image data set, and its training objective is to correctly judge the authenticity of regions within the input images; the input of the generator is random noise, and its training objective is to output generated images realistic enough to fool the local discriminator;
in the self-supervised composite task branch, the input of the classifier is a batch of stitched images, and the three output heads output the results of the three subtasks respectively; this branch trains the network through the total loss function L_CT;
the adversarial training branch and the self-supervised composite task branch are trained simultaneously; during training the two branches are coupled by sharing the network weights of the first parts of the local discriminator local-D and the three-head classifier C, and the total loss function of the model is defined as:
min_{G,C} max_D L = L_adv(G, D) + L_CT(C)
wherein L_adv(G, D) is the adversarial training loss and L_CT(C) is the self-supervised composite task loss; during model training the local discriminator local-D and the generator G are updated alternately, and the three-head classifier C is updated alongside them.
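One plausible realization of this update schedule can be sketched as follows. The claim fixes only that local-D and G alternate while C is updated alongside them; the exact interleaving below is an assumption:

```python
def train(steps, update_d, update_g, update_c):
    """Alternating schedule for claim 6: each iteration performs a
    discriminator step with a classifier step beside it (they share the
    first-part weights), then a generator step against the updated
    local discriminator.  Returns the trace of updates performed."""
    trace = []
    for _ in range(steps):
        update_d(); trace.append("D")   # local-D step on a real/fake batch
        update_c(); trace.append("C")   # classifier C step on stitched images
        update_g(); trace.append("G")   # generator step against frozen local-D
    return trace
```

Because C and local-D share their first part, the C step indirectly supplies the supervision information from the composite tasks to the discriminator's feature extractor.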
7. The method for generating high-quality images with a generative adversarial network by introducing self-supervised composite task training according to claim 1, wherein inputting the image to be processed into the trained generator network for image generation specifically comprises:
inputting random noise into the trained generator network, whereby a high-quality generated image resembling the training set images is obtained through forward propagation.
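The inference step of claim 7 reduces to sampling from the prior and one forward pass. A minimal sketch, with a toy linear map standing in for the trained generator network (the function name and dimensions are illustrative):

```python
import numpy as np

def generate_images(generator, batch, z_dim, seed=0):
    """Sample a batch of noise vectors from a standard-normal prior and
    run one forward pass through the (already trained) generator."""
    z = np.random.default_rng(seed).standard_normal((batch, z_dim))
    return generator(z)

# toy stand-in generator: z -> flat 8x8 "image" with values in (-1, 1)
W = np.random.default_rng(1).standard_normal((16, 64)) * 0.1
imgs = generate_images(lambda z: np.tanh(z @ W), batch=4, z_dim=16)
```

In the patent's setting the callable would be the trained generator G; only G is needed at inference time, since local-D and C serve training alone.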
8. A system for generating high-quality images with a generative adversarial network by introducing self-supervised composite task training, characterized in that it is applied to the method of any one of claims 1 to 7 and comprises a data set module, a composite task module, a model building module, a model training module, and an image generation module;
the data set module is used for preparing the training data set, which comprises original image data and stitched image data; the original image data are used in the training process of the adversarial training branch, and the stitched image data in the training process of the self-supervised composite task branch;
the composite task module is used for designing three subtasks that form the composite task, which constructs the self-supervised composite task branch and supplies supervision information for model training; the three subtasks are a rotation prediction task, a position prediction task, and a common feature extraction task: the rotation prediction task correctly judges the rotation label of each image block contained in a stitched image, the position prediction task correctly judges the position label of each image block, and the common feature extraction task first correctly judges which original image each image block belongs to and then extracts the common features among homologous image blocks;
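The stitched-image data that drive these three subtasks can be sketched as follows. The 2×2 grid, the 90-degree rotation set, and the label encoding are assumptions consistent with the four blocks per image (k = n×4) implied by claim 5:

```python
import numpy as np

def make_stitched(image, rng):
    """Hypothetical construction of one stitched training sample: cut a
    square image into a 2x2 grid of blocks, rotate each block by a random
    multiple of 90 degrees, and shuffle the block positions.  Returns the
    stitched image plus the true rotation labels (l_r_gt) and position
    labels (l_l_gt) the classifier must recover."""
    h, w = image.shape
    assert h == w and h % 2 == 0
    s = h // 2
    blocks = [image[:s, :s], image[:s, s:], image[s:, :s], image[s:, s:]]
    rot = rng.integers(0, 4, size=4)    # 0 / 90 / 180 / 270 degrees
    pos = rng.permutation(4)            # pos[j]: target grid cell of block j
    cells = [None] * 4
    for j in range(4):
        cells[pos[j]] = np.rot90(blocks[j], rot[j])
    top = np.concatenate(cells[:2], axis=1)
    bottom = np.concatenate(cells[2:], axis=1)
    return np.concatenate([top, bottom], axis=0), rot, pos
```

Stitching permutes and rotates pixels but never invents or discards them, so every pixel of the source image appears exactly once in the output; the common feature extraction task additionally records which source image each block came from.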
the model building module is used for building the adversarial training branch and the self-supervised composite task branch respectively; the adversarial training branch comprises a local discriminator and a generator, and the self-supervised composite task branch comprises a classifier with three output heads;
the model training module is used for training the built model to obtain a trained generator network; the training specifically comprises: using the original data set as the input of the adversarial training branch and the stitched image data as the input of the self-supervised composite task branch, training the networks in both branches, with the self-supervised composite task branch supplying supervision information to the local discriminator in the adversarial training branch during training;
and the image generation module is used for inputting the image to be processed into the trained generator network for image generation.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores computer program instructions executable by the at least one processor to cause the at least one processor to perform the method of generating high-quality images with a generative adversarial network by introducing self-supervised composite task training according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a program which, when executed by a processor, implements the method of generating high-quality images with a generative adversarial network by introducing self-supervised composite task training according to any one of claims 1 to 7.
CN202210033454.3A (filed 2022-01-12) Method and device for generating high-quality images with a generative adversarial network by introducing self-supervised composite task training — Pending — CN114529622A


Publication: CN114529622A (published 2022-05-24)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination