CN117408921A - Image completion method based on generative adversarial network inversion inference and autoencoder


Info

Publication number
CN117408921A
Authority
CN
China
Prior art keywords
image, encoder, autoencoder, network, inference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311459854.1A
Other languages
Chinese (zh)
Inventor
宋斌
张志勇
张中亚
王业晨
孔功胜
张丽丽
李玉祥
赵长伟
向菲
张婷
Current Assignee
Henan University of Science and Technology
Original Assignee
Henan University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Henan University of Science and Technology
Priority to CN202311459854.1A
Publication of CN117408921A


Classifications

    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/0475 Generative networks
    • G06N3/048 Activation functions
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06N3/094 Adversarial learning
    • G06N5/042 Backward inferencing
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an image completion method based on generative adversarial network (GAN) inversion inference and an autoencoder, belonging to the field of artificial intelligence. First, an autoencoder-based GAN is trained to map random noise to complete high-resolution images. Second, the closest latent code is inferred by learning-based GAN inversion, and a complete image is reconstructed with the trained GAN. Finally, quantitative comparison shows that the invention achieves higher completion quality when an image suffers large-area damage. The method is good at mining information from the interior of the image and, with the help of the pre-trained generative adversarial network model, can generate results with richer semantics, producing more realistic and reasonable high-resolution images.

Description

Image completion method based on generative adversarial network inversion inference and autoencoder
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an image completion method based on generative adversarial network inversion inference and an autoencoder.
Background
In daily life, images play a vital role in conveying information, and keeping them intact is essential to the integrity of that information transfer. However, image files may be damaged or partially masked for various reasons, such as a damaged photograph or surveillance footage in which the content is occluded, which seriously hinders computer vision tasks. To ensure the complete transfer of image information, researchers have proposed many approaches in image processing and computer vision, among which image completion is a key research topic. Image completion uses computer vision techniques to fill in the missing information and restore the content of the original image, so that subsequent image analysis and processing tasks become more accurate and reliable.
Current image completion methods fall broadly into two categories: traditional methods and deep-learning-based methods. Traditional methods include structure-based, texture-based, and sparse-representation-based image completion. Structure-based repair methods rely on partial differential equations, but these algorithms are not robust and the repaired region may remain blurred. Texture-based methods construct the missing part from existing texture information in the image and effectively address such blurring; however, their ability to capture high-level semantic information is limited, so they perform poorly on high-resolution images with complex textures. Sparse-representation-based methods can effectively represent the known information of an image, but when the region to be completed is too large, the limited known information still leads to unsatisfactory completion results.
Disclosure of Invention
It is an object of the present invention to provide an image completion method based on generative adversarial network inversion inference and an autoencoder that solves the problems mentioned in the background above.
In order to achieve the above object, the present invention provides an image completion method based on generative adversarial network inversion inference and an autoencoder, comprising the following steps:
S1, downloading public online datasets with a Python script and applying a unified image transformation to obtain a damaged-image dataset;
S2, constructing a model based on generative adversarial network inversion inference and an autoencoder, and training it on the damaged-image dataset to obtain a trained neural network model;
S3, obtaining a damaged image to be completed, normalizing it, and feeding it into the trained neural network model to obtain the completion result.
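The patent does not specify how the S3 normalization is performed. As a minimal sketch, scaling a uint8 image to [-1, 1] is an assumption, chosen because the decoder described later ends in a Tanh activation:

```python
import numpy as np

def normalize(img: np.ndarray) -> np.ndarray:
    """Scale a uint8 image to [-1, 1] (assumed convention, not stated in the patent)."""
    return img.astype(np.float32) / 127.5 - 1.0

def denormalize(x: np.ndarray) -> np.ndarray:
    """Map a [-1, 1] network output back to uint8 pixel values."""
    return np.clip(np.rint((x + 1.0) * 127.5), 0, 255).astype(np.uint8)
```

The round trip `denormalize(normalize(img))` recovers the original pixels, which makes this convention convenient when comparing completed images against ground truth.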
Preferably, the step S1 specifically comprises:
S11, acquiring various public datasets from the official Kaggle website and GitHub;
S12, writing a Python program to preprocess the datasets and apply image transformations, so that they can be used in the model training process.
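The patent does not describe the damage transformation itself. A minimal sketch of one plausible choice, zeroing a random rectangular region (the mask shape and zero-fill convention are assumptions for illustration):

```python
import numpy as np

def make_damaged(image, mask_h, mask_w, rng=None):
    """Simulate damage by zeroing a random rectangular region.

    `image` is an (H, W, C) float array. Returns the damaged image and a
    binary mask (1 = known pixel, 0 = missing pixel).
    """
    rng = rng or np.random.default_rng(0)
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - mask_h + 1))
    left = int(rng.integers(0, w - mask_w + 1))
    damaged = image.copy()
    damaged[top:top + mask_h, left:left + mask_w, :] = 0.0
    mask = np.ones((h, w), dtype=np.float32)
    mask[top:top + mask_h, left:left + mask_w] = 0.0
    return damaged, mask
```

Applying this uniformly to every image in the downloaded datasets would yield the damaged-image dataset that S2 trains on.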
Preferably, the step S2 specifically comprises:
S21, designing an autoencoder-based generative adversarial network, whose generator is trained to learn a mapping from random noise to a low-dimensional feature map, the generated feature map then being converted into a high-resolution image by the decoder;
S22, adopting a learning-based GAN inversion strategy: the trained GAN is fixed, and an encoder network is designed and trained to predict the closest latent code from a given damaged image.
Preferably, the generator and the discriminator in the step S21 follow the widely used deep convolutional generative adversarial network DCGAN.
Preferably, the dimension-reducing encoder in the step S21 consists of 3 downsampling layers, all using 4×4 convolution kernels, with 64, 128, and 16 kernels respectively and LeakyReLU, LeakyReLU, and Tanh as the activation functions; the dimension-raising decoder consists of 3 upsampling layers, all using 4×4 convolution kernels, with 128, 64, and 3 kernels respectively and ReLU, ReLU, and Tanh as the activation functions.
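As an illustrative PyTorch sketch of this encoder/decoder pair (stride 2 and padding 1 are assumptions; the patent specifies only kernel sizes, channel counts, and activations):

```python
import torch
import torch.nn as nn

# 3 downsampling layers: 4x4 kernels, 64/128/16 channels,
# LeakyReLU/LeakyReLU/Tanh, per the text above.
encoder = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 16, 4, stride=2, padding=1), nn.Tanh(),
)
# 3 upsampling layers: 4x4 kernels, 128/64/3 channels, ReLU/ReLU/Tanh.
decoder = nn.Sequential(
    nn.ConvTranspose2d(16, 128, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
)

x = torch.randn(1, 3, 64, 64)   # the 64x64 input size is illustrative
z = encoder(x)                  # low-dimensional feature map
y = decoder(z)                  # reconstructed image in [-1, 1]
```

With stride-2, padding-1, 4×4 convolutions, each layer halves (or, transposed, doubles) the spatial size, so a 64×64 input yields an 8×8×16 feature map and the decoder restores 64×64×3.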
Preferably, the encoder network in the step S22 has 9 layers in total: the first 7 are convolutional layers and the last 2 are fully connected layers.
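A hedged sketch of such a 9-layer inversion encoder in PyTorch. The patent gives only the layer counts (its Table 1 holds the exact widths, which are not reproduced here), so the channel widths, strides, 256×256 input size, and latent dimension below are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class InversionEncoder(nn.Module):
    """7 conv layers + 2 fully connected layers regressing a latent code
    from a damaged image (widths are assumptions, not the patent's Table 1)."""
    def __init__(self, latent_dim=1024):
        super().__init__()
        chans = [3, 32, 64, 128, 256, 256, 512, 512]
        convs = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            convs += [nn.Conv2d(cin, cout, 4, stride=2, padding=1),
                      nn.LeakyReLU(0.2)]
        self.convs = nn.Sequential(*convs)        # 7 convolutional layers
        self.fc = nn.Sequential(                  # 2 fully connected layers
            nn.Linear(512 * 2 * 2, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, latent_dim),
        )

    def forward(self, x):
        return self.fc(self.convs(x).flatten(1))

z = InversionEncoder()(torch.randn(1, 3, 256, 256))
```

Seven stride-2 layers reduce 256×256 to 2×2, and the two linear layers map the flattened features to the latent code that the fixed, pre-trained GAN then decodes into a complete image.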
Preferably, a mini-batch gradient descent algorithm is used to update the model parameters during training, with batch_size set to 4; all loss functions are optimized with an Adam optimizer, and the learning rate is set to 0.0001 for fine-tuning the network.
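The optimization setup above can be sketched as a single mini-batch update in PyTorch. The tiny convolution and the L1 reconstruction loss are placeholders standing in for the patent's actual networks and losses; batch_size = 4 and lr = 0.0001 follow the text:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)                      # placeholder network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, lr = 0.0001
loss_fn = nn.L1Loss()                                      # placeholder loss

batch = torch.randn(4, 3, 32, 32)                          # batch_size = 4
target = torch.randn(4, 3, 32, 32)

optimizer.zero_grad()
loss = loss_fn(model(batch), target)
loss.backward()        # backpropagation
optimizer.step()       # one mini-batch gradient update
```

In the full method this loop would run over the damaged-image dataset until convergence, as described in the embodiment below.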
Therefore, the image completion method based on generative adversarial network inversion inference and an autoencoder has the following beneficial effects:
(1) A neural network model is built and trained; qualitative experimental results based on the trained model show that the generated images have more reasonable structure and more realistic texture detail;
(2) The invention is better at mining information from the interior of the image and, with the help of a pre-trained generative adversarial network model, can generate results with richer semantics, thereby producing more realistic and reasonable high-resolution images;
(3) The invention achieves higher completion quality when the completion task involves a large missing area.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a block flow diagram of a process performed by an embodiment of the present invention;
FIG. 2 is a schematic diagram of a network structure in an embodiment of the present invention;
FIG. 3 is a neural network training diagram in an embodiment of the invention;
FIG. 4 is a graph of the qualitative completion results of the present invention during an experiment;
FIG. 5 is a graph of the quantitative completion results of the present invention during an experiment.
Detailed Description
Examples
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings; evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from the embodiments of the invention without inventive effort fall within the scope of the invention.
Referring to figs. 1-3, the invention discloses an image completion method based on generative adversarial network inversion inference and an autoencoder, comprising the following steps:
S1, downloading public online datasets with a Python script and applying a unified image transformation to obtain a damaged-image dataset;
S2, constructing a model based on generative adversarial network inversion inference and an autoencoder, and training it on the damaged-image dataset to obtain a trained neural network model;
S3, obtaining a damaged image to be completed, normalizing it, and feeding it into the trained neural network model to obtain the completion result.
The dataset used in this example totals 130,000 images, of which 10% were taken as the test set and the rest as the training set. Experimental environment: Python 3.6, trained with the open-source Python machine learning library PyTorch 1.10.0 in combination with CUDA 11.3. In all experiments, the model parameters were updated with a mini-batch gradient descent algorithm, batch_size was set to 4, all loss functions were optimized with an Adam optimizer, and the learning rate was set to 0.0001 for fine-tuning until the network converged.
Training hardware: an NVIDIA TITAN Xp GPU with 12 GB of video memory, and a 12-core Intel(R) Xeon(R) Platinum 8255 CPU.
A neural network model is then constructed in two stages. First, an autoencoder-based generative adversarial network is designed: by training the GAN's generator, the network learns a mapping from noise to a low-dimensional feature map, and the feature map generated by the generator is then converted into a high-resolution image, so that the noise distribution is mapped to high-resolution real images and the difficulty of learning the mapping is reduced. The generator and discriminator follow the widely used deep convolutional GAN, DCGAN. The dimension-reducing encoder ER consists of 3 downsampling layers, all using 4×4 convolution kernels, with 64, 128, and 16 kernels respectively and LeakyReLU, LeakyReLU, and Tanh as the activation functions. The dimension-raising decoder consists of 3 upsampling layers, all using 4×4 convolution kernels, with 128, 64, and 3 kernels respectively and ReLU, ReLU, and Tanh as the activation functions. Second, a learning-based GAN inversion strategy is used: an encoder is designed to infer the closest latent code from the damaged image, and the complete image is then reconstructed with the trained GAN. The encoder architecture of the inversion strategy is shown in Table 1:
Table 1. Encoder structure
To qualitatively evaluate each aspect of the model, it was compared with the PConv, Gated Conv, and CoModGAN models. Fig. 4 gives a qualitative visual comparison: the results produced by the PConv and Gated Conv methods include partially distorted content, certain artifacts, and color differences, while CoModGAN performs similarly to the present method but often produces content inconsistent with the unmasked regions. The present completion method is better at mining information from the interior of the image and, with the help of the pre-trained generative adversarial network model, can generate results with richer semantics, handle different mask regions better, and produce more realistic and reasonable high-resolution images.
To evaluate the compared models comprehensively, structural similarity (SSIM), peak signal-to-noise ratio (PSNR), and Fréchet Inception Distance (FID) were used as evaluation metrics.
Structural similarity evaluates how similar two images are in terms of luminance, structure, and contrast. Its range is [0, 1]; the closer the SSIM value is to 1, the more similar the two images and the better the image completion quality.
Peak signal-to-noise ratio is the most widely used measure of the pixel-wise difference between two images; the larger the value, the closer the generated image is to the real sample and the better the completion quality.
Fréchet Inception Distance measures the distance between multidimensional variable distributions and is particularly suited to evaluating the diversity and quality of generated images; the smaller the FID value, the richer the diversity and the higher the quality of the generated images.
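Of these three metrics, PSNR is simple enough to sketch directly from its definition as a function of the mean squared pixel error:

```python
import numpy as np

def psnr(img1, img2, data_range=1.0):
    """Peak signal-to-noise ratio between two images with values in
    [0, data_range]; higher means the images are closer."""
    mse = np.mean((img1.astype(np.float64) - img2.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)
```

SSIM and FID are windowed and distribution-level statistics respectively, and in practice are usually computed with existing implementations (e.g. `skimage.metrics.structural_similarity` and the `pytorch-fid` package) rather than written by hand.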
The present method outperforms the comparison methods on all 3 metrics, structural similarity (SSIM), peak signal-to-noise ratio (PSNR), and Fréchet Inception Distance (FID), showing excellent performance and demonstrating higher completion quality when the completion task involves a large missing area.
The quantitative results of the experiment are shown in FIG. 5.
It should be noted that, parts not described in detail in this application are all prior art.
Therefore, the image completion method based on generative adversarial network inversion inference and an autoencoder is good at mining information from the interior of the image and, with the help of a pre-trained generative adversarial network model, can generate results with richer semantics, producing more realistic and reasonable high-resolution images and achieving higher completion quality when the completion task involves a large missing area.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail with reference to preferred embodiments, those skilled in the art will understand that the technical solution may be modified or equivalently substituted without departing from its spirit and scope.

Claims (7)

1. An image completion method based on generative adversarial network inversion inference and an autoencoder, characterized by comprising the following steps:
S1, downloading public online datasets with a Python script and applying a unified image transformation to obtain a damaged-image dataset;
S2, constructing a model based on generative adversarial network inversion inference and an autoencoder, and training it on the damaged-image dataset to obtain a trained neural network model;
S3, obtaining a damaged image to be completed, normalizing it, and feeding it into the trained neural network model to obtain the completion result.
2. The image completion method based on generative adversarial network inversion inference and an autoencoder according to claim 1, wherein the step S1 specifically comprises:
S11, acquiring various public datasets from the official Kaggle website and GitHub;
S12, writing a Python program to preprocess the datasets and apply image transformations.
3. The image completion method based on generative adversarial network inversion inference and an autoencoder according to claim 1, wherein the step S2 specifically comprises:
S21, designing an autoencoder-based generative adversarial network, whose generator is trained to learn a mapping from random noise to a low-dimensional feature map, the generated feature map then being converted into a high-resolution image;
S22, adopting a learning-based GAN inversion strategy: the trained GAN is fixed, and an encoder network is designed and trained to predict the closest latent code from a given damaged image.
4. The image completion method based on generative adversarial network inversion inference and an autoencoder according to claim 3, wherein the generator and the discriminator in the step S21 follow the deep convolutional generative adversarial network DCGAN.
5. The image completion method based on generative adversarial network inversion inference and an autoencoder according to claim 4, wherein the dimension-reducing encoder in the step S21 consists of 3 downsampling layers using 4×4 convolution kernels, with 64, 128, and 16 kernels and LeakyReLU, LeakyReLU, and Tanh as the activation functions; the dimension-raising decoder consists of 3 upsampling layers using 4×4 convolution kernels, with 128, 64, and 3 kernels and ReLU, ReLU, and Tanh as the activation functions.
6. The image completion method based on generative adversarial network inversion inference and an autoencoder according to claim 3, wherein the encoder network in the step S22 has 9 layers in total, the first 7 being convolutional layers and the last 2 being fully connected layers.
7. The image completion method based on generative adversarial network inversion inference and an autoencoder according to claim 1, wherein a mini-batch gradient descent algorithm is used to update the model parameters during training, batch_size is set to 4, all loss functions are optimized with an Adam optimizer, and the learning rate is set to 0.0001 for fine-tuning the network.
CN202311459854.1A 2023-11-03 2023-11-03 Image completion method based on generative adversarial network inversion inference and autoencoder Pending CN117408921A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311459854.1A CN117408921A (en) 2023-11-03 2023-11-03 Image completion method based on generative adversarial network inversion inference and autoencoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311459854.1A CN117408921A (en) 2023-11-03 2023-11-03 Image completion method based on generative adversarial network inversion inference and autoencoder

Publications (1)

Publication Number Publication Date
CN117408921A true CN117408921A (en) 2024-01-16

Family

ID=89495959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311459854.1A Pending CN117408921A (en) 2023-11-03 2023-11-03 Image complement method based on generation countermeasure network inversion reasoning and self-encoder

Country Status (1)

Country Link
CN (1) CN117408921A (en)

Similar Documents

Publication Publication Date Title
CN109919204B (en) Noise image-oriented deep learning clustering method
CN108875935B (en) Natural image target material visual characteristic mapping method based on generation countermeasure network
CN109035142B (en) Satellite image super-resolution method combining countermeasure network with aerial image prior
CN109671022A (en) A kind of picture texture enhancing super-resolution method based on depth characteristic translation network
CN112598602A (en) Mask-based method for removing Moire of deep learning video
CN114898159B (en) SAR image interpretability feature extraction method for generating countermeasure network based on decoupling characterization
CN113920210B (en) Image low-rank reconstruction method based on adaptive graph learning principal component analysis method
CN114926883A (en) Face image processing method meeting various degradation models
CN112965968B (en) Heterogeneous data pattern matching method based on attention mechanism
CN112560719B (en) High-resolution image water body extraction method based on multi-scale convolution-multi-core pooling
CN116884067B (en) Micro-expression recognition method based on improved implicit semantic data enhancement
CN111695507B (en) Static gesture recognition method based on improved VGGNet network and PCA
CN116934613A (en) Branch convolution channel attention module for character repair
CN117408921A (en) Image completion method based on generative adversarial network inversion inference and autoencoder
CN117196963A (en) Point cloud denoising method based on noise reduction self-encoder
Jiang et al. AGP-Net: Adaptive graph prior network for image denoising
CN115358952A (en) Image enhancement method, system, equipment and storage medium based on meta-learning
CN111681156B (en) Deep compressed sensing image reconstruction method applied to wireless sensor network
CN114092579A (en) Point cloud compression method based on implicit neural network
CN113269282A (en) Unsupervised image classification method based on automatic encoder
CN113269702A (en) Low-exposure vein image enhancement method based on cross-scale feature fusion
CN112907456A (en) Deep neural network image denoising method based on global smooth constraint prior model
CN112686807A (en) Image super-resolution reconstruction method and system
CN114627196B (en) Latent variable space decoupling method based on variational automatic encoder
CN113313153B (en) Low-rank NMF image clustering method and system based on self-adaptive graph regularization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination