CN114387170A - Image repairing method for improving edge incoherence phenomenon of repairing area - Google Patents


Info

Publication number
CN114387170A
Authority
CN
China
Prior art keywords
network
image
relative
repairing
layer
Prior art date
Legal status
Pending
Application number
CN202011116700.9A
Other languages
Chinese (zh)
Inventor
张文强
陈飞宇
张睿
张传法
邓苇
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN202011116700.9A
Publication of CN114387170A

Classifications

    • G06T5/77
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention provides an image repairing method for improving the edge incoherence phenomenon of a repairing area, belonging to the field of image repairing in computer vision. The method comprises the following steps: preprocessing a data set, constructing a network model, optimizing the network model, screening the network model, and repairing the defective image to be repaired. Constructing the network model is mainly divided into constructing a generation network model that uses image multi-resolution information and a soft attention mechanism, and constructing a relative discrimination network model for discrimination. The method can automatically and effectively repair a defective picture; compared with existing methods, the texture of the repaired picture is more continuous, the picture content is more detailed, and the overall visual effect is more realistic.

Description

Image repairing method for improving edge incoherence phenomenon of repairing area
Technical Field
The invention belongs to the field of computer vision image restoration, and particularly relates to an image restoration method for improving the edge incoherence phenomenon of a restoration area.
Background
In recent years, with the rapid development of the internet and the popularity of smart mobile devices, people increasingly like to record and share their everyday lives in the form of images and short videos. At the same time, using images and video as carriers for recording history and passing on culture is receiving more and more attention from governments and the general public. However, old images are often damaged, and images frequently contain people or objects that are unwanted in the final presentation. Solving these problems manually not only requires skilled technicians but also consumes a great deal of time, which is why image restoration has attracted increasing attention from the scientific community. Image Inpainting refers to reconstructing the lost or damaged parts of images and videos so that a viewer cannot tell that the image was ever missing content or repaired. Its goal is to automatically recover the lost information from the information that remains in the image, and it can be used to restore missing content in old photographs, remove watermarks and text, remove unwanted objects, and so on.
At present, as research on artificial intelligence continues to deepen, combining image restoration with artificial intelligence is gradually becoming a mainstream research direction. Image restoration methods based on convolutional neural networks (CNNs) and generative adversarial networks (GANs) have achieved good performance in this field. However, existing learning-based image restoration methods still suffer from problems such as inconsistency between the texture of the restored region and that of the original region, missing detail in the restored content, and distorted object structure in the restored picture.
Disclosure of Invention
The present invention has been made to solve the above problems, and an object of the present invention is to provide an image inpainting method for improving an edge discontinuity phenomenon in an inpainting region.
The invention provides an image repairing method for improving the edge incoherence phenomenon of a repairing area, which is used for repairing a defective image and is characterized by comprising the following steps:
Step 1, acquiring an original image data set, dividing it into a test set and a training set, and then preprocessing the original images in the test set to obtain a preprocessed test set containing incomplete images.
Step 2, constructing a relative adversarial network model, which comprises a generation network that uses image multi-resolution information and a soft attention mechanism, and a relative discrimination network; the generation network repairs incomplete images to obtain complete images, and the relative discrimination network judges the complete images to obtain probability feature vectors describing how real or fake the complete images are.
Step 3, inputting the original images of the training set into the relative adversarial network model and then performing network optimization training on the generation network and the relative discrimination network separately: the relative adversarial loss, the deeply supervised spatial loss and the regional spatial loss are calculated, the network parameters of the generation network and the relative discrimination network are optimized with a back-propagation algorithm and the Adam optimizer, the network parameters are then iteratively updated at a set learning rate, and the generation network and the relative discrimination network are updated alternately; when the complete images repaired by the generation network are close to the original, undamaged images, the network training is complete.
Step 4, inputting the preprocessed incomplete images of the test set into the trained relative adversarial network model and judging, through evaluation indices, whether the generation network is overfitted; if overfitting is judged, return to step 3, and if no overfitting is judged, keep the trained generation network as the image repairing network.
Step 5, inputting the incomplete image to be repaired into the image repairing network to obtain the repaired complete image.
The image inpainting method for improving the edge incoherence of the inpainting area provided by the invention also has the following characteristics: wherein the preprocessing in step 1 specifically comprises generating a rectangular occlusion mask and using it to randomly occlude an arbitrary region of each original image in the test set, so as to obtain the incomplete images.
The image inpainting method for improving the edge incoherence of the inpainting area provided by the invention also has the following characteristics: wherein the generating network in step 2 has a multi-resolution encoder and a dual channel decoder.
The image inpainting method for improving the edge incoherence of the inpainting area provided by the invention also has the following characteristics: wherein the multi-resolution encoder has 5 feature-extraction sub-networks, and each feature-extraction sub-network contains 4 convolutional layers; all convolutional layers use convolution kernels of the same size and the ELU activation function. Within one feature-extraction sub-network, the first convolutional layer uses a stride of 2 and the following three convolutional layers use a stride of 1. The first three convolutional layers of the previous feature-extraction sub-network pass their features one to one to the first three convolutional layers of the next feature-extraction sub-network, during which the outputs of the second and third convolutional layers are reduced once to half of their original size; the fourth convolutional layer of the previous feature-extraction sub-network passes its output features to the first convolutional layer of the next feature-extraction sub-network.
The image inpainting method for improving the edge incoherence of the inpainting area provided by the invention also has the following characteristics: wherein the dual-channel decoder has two feature-restoration sub-networks; the first feature-restoration sub-network is formed by connecting four soft attention network layers and one deconvolution network layer in series, and the second feature-restoration sub-network is formed by connecting five deconvolution network layers in series; the first restoration sub-network has skip connections to the multi-resolution encoder, and the corresponding network layers between the first and second restoration sub-networks have a feature-transfer relationship.
The image inpainting method for improving the edge incoherence of the inpainting area provided by the invention also has the following characteristics: wherein the relative discrimination network in step 2 divides the complete image into regions, measures how real each region is through a relative adversarial loss function, and uses the LeakyReLU activation function for discrimination.
The image inpainting method for improving the edge incoherence of the inpainting area provided by the invention also has the following characteristics: wherein the learning rate in step 3 is 0.0001, and the relative adversarial loss is calculated by a relative adversarial loss function that yields a generation-network loss L_G^adv and a relative-discrimination-network loss L_D^adv, in which the output D_Ra(x) of the relative discrimination network for each image is judged relative to its average output over the images of the other set;
the deeply supervised spatial loss is calculated by a spatial loss function acting on the features output at each resolution layer of the dual-channel decoder, with the formula:

L_deep = Σ_n || x_r^n - p_n(f_n) ||_1
the regional spatial loss is calculated by a regional spatial loss function, wherein the regional spatial loss function is:

L_local = || (x_r - x_f) ⊙ (1 - M) ||_1
where L_G^adv and L_D^adv are the loss of the generation network and the loss of the relative discrimination network respectively; x_r and x_f are the original image and the complete image repaired by the generation network respectively, and their picture content is consistent in the non-missing parts; D_Ra is the relative discrimination network, and D̄_Ra(x_r) and D̄_Ra(x_f) are the averages of its outputs over x_r and x_f respectively; x_r^n is the original image at the resolution of the nth layer, and p_n(f_n) is the result of compressing the multi-channel feature f_n of the nth layer into a three-channel feature; ⊙ denotes multiplication of corresponding matrix elements; M is the occlusion mask that turns the original image x_r into the incomplete image, a single-channel tensor whose value is 0 in the missing region and 1 in the non-missing region.
The image inpainting method for improving the edge incoherence of the inpainting area provided by the invention also has the following characteristics: wherein the evaluation indices in step 4 are the peak signal-to-noise ratio (PSNR), the mean absolute error (MAE), the mean squared error (MSE) and the structural similarity (SSIM).
Action and Effect of the invention
The image repairing method for improving the edge incoherence phenomenon of the repairing area according to the invention combines the advantages of adversarial networks in deep learning: it adopts a generation network that uses multi-resolution information and a soft attention mechanism together with a relative discrimination network, and completes image repairing by iterating the network model, achieving high repairing accuracy and better repairing results.
Therefore, the image repairing method for improving the edge incoherence phenomenon of the repairing area can automatically and effectively repair a defective image. Compared with existing methods, the texture of the repaired image is more continuous, the picture content is more detailed, and the overall visual effect is more realistic.
Drawings
FIG. 1 is a flow chart of an image inpainting method of the present invention based on a relative adversarial generation network that uses image multi-resolution information and a soft attention mechanism;
FIG. 2 is a schematic diagram of an original image in an embodiment of the invention;
FIG. 3 is a diagram of a masked image in an embodiment of the invention;
FIG. 4 is a schematic diagram of a mask in an embodiment of the invention;
FIG. 5 is a schematic structural diagram of a relative adversarial generation network in an embodiment of the invention;
FIG. 6 is a schematic diagram of a soft attention mechanism according to an embodiment of the present invention;
fig. 7 is a diagram of an image restoration result in an embodiment of the present invention.
Detailed Description
In order to make the technical means and functions of the present invention easy to understand, the present invention is specifically described below with reference to the embodiments and the accompanying drawings.
FIG. 1 is a flow chart of an image inpainting method of the present invention based on a relative adversarial generation network that uses image multi-resolution information and a soft attention mechanism.
As shown in fig. 1, an image inpainting method for improving the edge incoherence of an inpainting region according to the present invention is used for inpainting a defective image, and includes the following steps:
step 1, acquiring an original image data set, dividing the original image data set into a test set and a training set, and then preprocessing the original image in the test set to obtain a preprocessed test set with incomplete images.
In the invention, the preprocessing specifically comprises generating a rectangular occlusion mask and using it to randomly occlude an arbitrary region of each original image in the test set, so as to obtain the incomplete images.
In the invention, preprocessing the data set provides suitable images for network model optimization and network model screening.
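As an illustration of this preprocessing, the following is a minimal NumPy sketch that occludes one random rectangle per image; the mask size and image size are illustrative choices, not values taken from the invention:

```python
import numpy as np

def make_masked_sample(image: np.ndarray, mask_h: int = 64, mask_w: int = 64, rng=None):
    """Randomly occlude one rectangular region of `image` (H x W x C, floats in [0, 1]).

    Returns the occluded image and a single-channel mask M where, as in the invention,
    the missing region is 0 and the non-missing region is 1.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    top = rng.integers(0, h - mask_h + 1)
    left = rng.integers(0, w - mask_w + 1)

    mask = np.ones((h, w, 1), dtype=image.dtype)
    mask[top:top + mask_h, left:left + mask_w, :] = 0.0

    occluded = image * mask  # missing pixels are zeroed out
    return occluded, mask

# Example with a random 256 x 256 RGB image
img = np.random.rand(256, 256, 3).astype(np.float32)
damaged, M = make_masked_sample(img)
```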
Step 2, constructing a relative adversarial network model, which comprises a generation network that uses image multi-resolution information and a soft attention mechanism, and a relative discrimination network. The generation network is used to repair the incomplete image to obtain a complete image, and the relative discrimination network is used to judge the complete image to obtain a probability feature vector describing how real or fake the complete image is.
In the present invention, the soft attention mechanism is a method for constructing the similarity between the features of the missing region and those of the preserved region. The "soft" idea is reflected in the following: first, receptive fields of size 3 × 3 and 5 × 5 are used to compute values on the occlusion mask; second, these values are used, by reverse reasoning, to decide whether the corresponding feature patch belongs to the non-preserved region, so that feature patches of the non-preserved region are distinguished from those of the preserved region; then the similarity between feature patches in the preserved region and feature patches in the non-preserved region is computed with the cosine similarity. Finally, the similarities are used as weights, and the feature patches of the preserved region are summed with these weights to form new feature patches for the non-preserved region.
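The following PyTorch sketch illustrates only the core of this idea: overlapping 3 × 3 feature patches are compared by cosine similarity, and missing-region patches are rebuilt as a similarity-weighted sum of preserved-region patches. It uses a single 3 × 3 receptive field and omits the 5 × 5 receptive field, the reverse-reasoning step on the occlusion mask, and the module's convolutional layers, so it is a simplified illustration rather than the exact module of the invention:

```python
import torch
import torch.nn.functional as F

def soft_patch_attention(feat: torch.Tensor, mask: torch.Tensor, patch: int = 3):
    """Fill missing-region patches with a cosine-similarity-weighted sum of preserved patches.

    feat: (1, C, H, W) decoder features; mask: (1, 1, H, W) with 1 = preserved, 0 = missing.
    """
    b, c, h, w = feat.shape
    pad = patch // 2

    # Overlapping patches: (L, C*patch*patch) with one row per patch centre.
    patches = F.unfold(feat, kernel_size=patch, padding=pad).squeeze(0).t()
    mask_patches = F.unfold(mask, kernel_size=patch, padding=pad).squeeze(0).t()

    # A patch counts as preserved only if its whole receptive field lies in the kept region.
    preserved = mask_patches.min(dim=1).values > 0.5
    missing = ~preserved

    keys = F.normalize(patches[preserved], dim=1)     # preserved patches (similarity targets)
    queries = F.normalize(patches[missing], dim=1)    # patches to rebuild

    # Cosine similarity -> softmax weights -> weighted sum of preserved patches.
    attn = torch.softmax(queries @ keys.t(), dim=1)
    rebuilt = attn @ patches[preserved]

    out = patches.clone()
    out[missing] = rebuilt

    # Fold overlapping patches back to (1, C, H, W), normalising by the overlap count.
    out = F.fold(out.t().contiguous().unsqueeze(0), output_size=(h, w),
                 kernel_size=patch, padding=pad)
    ones = F.fold(torch.ones_like(patches).t().contiguous().unsqueeze(0), output_size=(h, w),
                  kernel_size=patch, padding=pad)
    return out / ones

# Usage sketch:
# feat = torch.randn(1, 32, 64, 64); mask = torch.ones(1, 1, 64, 64); mask[..., 20:40, 20:40] = 0
# filled = soft_patch_attention(feat, mask)
```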
In addition, the generation network is divided into an encoder and a decoder, and the encoder can effectively utilize the multi-resolution information of the image; the generation network in the invention has a multi-resolution encoder and a dual-channel decoder.
The multi-resolution encoder has 5 feature-extraction sub-networks, and each feature-extraction sub-network contains 4 convolutional layers; all convolutional layers use convolution kernels of the same size and the ELU activation function. Within one feature-extraction sub-network, the first convolutional layer uses a stride of 2 and the following three convolutional layers use a stride of 1. The first three convolutional layers of the previous feature-extraction sub-network pass their features one to one to the first three convolutional layers of the next feature-extraction sub-network, during which the outputs of the second and third convolutional layers are reduced once to half of their original size; the fourth convolutional layer of the previous feature-extraction sub-network passes its output features to the first convolutional layer of the next feature-extraction sub-network. The dual-channel decoder has two feature-restoration sub-networks: the first is formed by connecting four soft attention network layers and one deconvolution network layer in series, and the second is formed by connecting five deconvolution network layers in series. The first restoration sub-network has skip connections to the multi-resolution encoder, and the corresponding network layers of the first and second restoration sub-networks have a feature-transfer relationship.
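For concreteness, a minimal PyTorch sketch of one feature-extraction sub-network, and of chaining five of them on their fourth-layer outputs, is given below; the 3 × 3 kernel size and the channel widths are assumptions, and the layer-wise feature transfer between adjacent sub-networks (including the halving of the second and third layers' outputs) is omitted:

```python
import torch
import torch.nn as nn

class FeatureExtractionSubnet(nn.Module):
    """One feature-extraction sub-network: a stride-2 convolution followed by three
    stride-1 convolutions, each with ELU activation. kernel_size=3 is an assumption."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.convs = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size, stride=2, padding=pad),   # halves resolution
            nn.Conv2d(out_ch, out_ch, kernel_size, stride=1, padding=pad),
            nn.Conv2d(out_ch, out_ch, kernel_size, stride=1, padding=pad),
            nn.Conv2d(out_ch, out_ch, kernel_size, stride=1, padding=pad),
        ])
        self.act = nn.ELU()

    def forward(self, x):
        feats = []                      # keep per-layer outputs so they can be passed on
        for conv in self.convs:
            x = self.act(conv(x))
            feats.append(x)
        return x, feats

# Five sub-networks chained on the fourth-layer output (channel widths are illustrative).
subnets = nn.ModuleList(
    FeatureExtractionSubnet(c_in, c_out)
    for c_in, c_out in [(3, 32), (32, 64), (64, 128), (128, 256), (256, 256)]
)
x = torch.randn(1, 3, 256, 256)
for net in subnets:
    x, layer_feats = net(x)
```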
In the invention, the relative discrimination network divides the complete image into regions, measures how real each region is through a relative adversarial loss function, and uses the LeakyReLU activation function for discrimination.
Step 3, inputting the original images of the training set into the relative adversarial network model and then performing network optimization training on the generation network and the relative discrimination network separately: the relative adversarial loss, the deeply supervised spatial loss and the regional spatial loss are calculated, the network parameters of the generation network and the relative discrimination network are optimized with a back-propagation algorithm and the Adam optimizer, the network parameters are then iteratively updated at the set learning rate, and the generation network and the relative discrimination network are updated alternately. When the complete image repaired by the generation network is close to the original, undamaged image, the network training is complete.
In the invention, optimizing the relative adversarial network model means finding suitable network parameters so that the repaired complete images output by the generation network agree better with the original images. Screening the relative adversarial network model means applying suitable indices to images that did not participate in the optimization process, so as to obtain the relative adversarial network model with the best repairing effect.
In the present invention, the learning rate is set to 0.0001.
Secondly, the relative adversarial loss is calculated by a relative adversarial loss function, which yields a generation-network loss L_G^adv and a relative-discrimination-network loss L_D^adv; in this loss, the output D_Ra(x) of the relative discrimination network for each image is judged relative to its average output over the images of the other set (a concrete sketch is given after the symbol definitions below).
Thirdly, the deeply supervised spatial loss is calculated by a spatial loss function that acts on the features output at each resolution layer of the dual-channel decoder, with the formula:

L_deep = Σ_n || x_r^n - p_n(f_n) ||_1
Finally, the regional spatial loss is calculated by a regional spatial loss function, which is:

L_local = || (x_r - x_f) ⊙ (1 - M) ||_1
In the above formulas, L_G^adv and L_D^adv are the loss of the generation network and the loss of the relative discrimination network respectively; x_r and x_f are the original image and the complete image repaired by the generation network respectively, and their picture content is consistent in the non-missing parts; D_Ra is the relative discrimination network, and D̄_Ra(x_r) and D̄_Ra(x_f) are the averages of its outputs over x_r and x_f respectively; x_r^n is the original image at the resolution of the nth layer, and p_n(f_n) is the result of compressing the multi-channel feature f_n of the nth layer into a three-channel feature; ⊙ denotes multiplication of corresponding matrix elements; M is the occlusion mask that turns the original image x_r into the incomplete image, a single-channel tensor whose value is 0 in the missing region and 1 in the non-missing region.
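To make the three loss terms concrete, here is a minimal PyTorch sketch. The regional loss follows the formula L_local given above; the deep-supervision term is written as an L1 distance between the original image resized to each decoder resolution and the compressed feature p_n(f_n); and the adversarial term is written in the standard relativistic-average GAN form. The latter two are assumptions consistent with the symbol definitions above, not a verbatim reproduction of the invention's formulas, and the function names are illustrative:

```python
import torch
import torch.nn.functional as F

def regional_spatial_loss(x_r, x_f, M):
    """L_local = ||(x_r - x_f) * (1 - M)||_1, with M = 1 on preserved pixels, 0 on missing ones."""
    return torch.abs((x_r - x_f) * (1.0 - M)).sum()

def deep_supervision_loss(x_r, decoder_feats, compressors):
    """Assumed form of the deep-supervision loss: one L1 term per decoder layer between the
    original image resized to that layer's resolution and the 3-channel compression p_n(f_n)."""
    loss = x_r.new_zeros(())
    for f_n, p_n in zip(decoder_feats, compressors):
        pred = p_n(f_n)                               # p_n(f_n): compress features to 3 channels
        target = F.interpolate(x_r, size=pred.shape[-2:], mode="bilinear", align_corners=False)
        loss = loss + torch.abs(target - pred).mean()
    return loss

def relativistic_average_losses(d_real, d_fake):
    """Standard relativistic-average GAN losses, used here as an assumed concrete form of
    L_D^adv and L_G^adv: each score is judged relative to the mean score of the other set."""
    real_rel = d_real - d_fake.mean()
    fake_rel = d_fake - d_real.mean()
    loss_d = F.binary_cross_entropy_with_logits(real_rel, torch.ones_like(real_rel)) \
           + F.binary_cross_entropy_with_logits(fake_rel, torch.zeros_like(fake_rel))
    loss_g = F.binary_cross_entropy_with_logits(fake_rel, torch.ones_like(fake_rel)) \
           + F.binary_cross_entropy_with_logits(real_rel, torch.zeros_like(real_rel))
    return loss_d, loss_g
```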
Step 4, inputting the preprocessed incomplete images of the test set into the trained relative adversarial network model and judging, through the evaluation indices, whether the generation network is overfitted. If overfitting is judged, return to step 3; if no overfitting is judged, keep the trained generation network as the image repairing network.
In the invention, the evaluation indices are the peak signal-to-noise ratio (PSNR), the mean absolute error (MAE), the mean squared error (MSE) and the structural similarity (SSIM).
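As an illustration, the four indices for a single image pair can be computed as follows; this sketch assumes images stored as floating-point arrays in [0, 1] and uses scikit-image (0.19 or later) for PSNR and SSIM:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(original: np.ndarray, restored: np.ndarray) -> dict:
    """Compute PSNR, MAE, MSE and SSIM for one (H, W, C) image pair with values in [0, 1]."""
    mae = float(np.abs(original - restored).mean())
    mse = float(np.square(original - restored).mean())
    psnr = peak_signal_noise_ratio(original, restored, data_range=1.0)
    ssim = structural_similarity(original, restored, channel_axis=-1, data_range=1.0)
    return {"PSNR": psnr, "MAE": mae, "MSE": mse, "SSIM": ssim}
```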
Step 5, inputting the incomplete image to be repaired into the image repairing network to obtain the repaired complete image.
Example:
the embodiment takes a network model of the CelebA-HQ face data set as an example.
1. Data set preprocessing
Fig. 2 is a schematic diagram of an original image in an embodiment of the present invention, fig. 3 is a schematic diagram of an image with a mask in an embodiment of the present invention, and fig. 4 is a schematic diagram of a mask in an embodiment of the present invention.
First, the CelebA-HQ dataset is divided into a training set and a test set: the training set contains 27000 original images and the test set contains 3000 original images; an original image is shown in FIG. 2. Each original image from the CelebA-HQ test set is then randomly covered by a rectangular mask, as shown in FIG. 4, producing the defective image to be repaired, as shown in FIG. 3; the white square in FIG. 4 can appear at a random position.
2. Construction of network model
FIG. 5 is a schematic structural diagram of the relative adversarial generation network in the embodiment of the present invention.
As shown in FIG. 5, the inputs to the network are a defective image and a mask image of the same size. In the soft attention mechanism, each soft attention module receives the feature output by the feature-extraction sub-network of the corresponding encoder layer together with the feature passed down from the previous layer, and restores or repairs them with weighting.
FIG. 6 is a schematic diagram of a soft attention mechanism according to an embodiment of the present invention.
As shown in FIG. 6, the two features are tiled into 3 × 3 patches, and the two sets of patches are rearranged along the channel direction to obtain two groups of feature patches. Meanwhile, the mask is divided in the same way into two groups of mask patches using receptive fields of size 3 × 3 and 5 × 5. The first group of feature patches, coming from the encoder, is used to compute the similarity, while the second group, coming from the previous layer, is the object being replaced; the two groups of mask patches determine which similarities are used. Finally, the weighted features are output after passing through four parallel but interconnected convolutional layers.
The relative discrimination network receives the repaired complete image and the corresponding original image in turn, as shown in FIG. 5. The discriminator consists of five convolutional layers; each layer uses a 5 × 5 convolution kernel with a stride of 2. The relative discrimination network measures, region by region at a fixed region size, whether the input complete image looks more real or more fake.
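A possible PyTorch sketch of such a discriminator is shown below; the five 5 × 5, stride-2 convolutions with LeakyReLU come from the description above, while the channel widths, the LeakyReLU slope of 0.2 and the choice of a single-channel score map are assumptions:

```python
import torch
import torch.nn as nn

class RelativeDiscriminator(nn.Module):
    """Five 5x5 convolutions with stride 2; the output is a grid of region-wise scores,
    matching the region-by-region judgement described above."""
    def __init__(self, in_ch: int = 3, base: int = 64):
        super().__init__()
        chans = [in_ch, base, base * 2, base * 4, base * 8]
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=5, stride=2, padding=2),
                       nn.LeakyReLU(0.2, inplace=True)]
        layers.append(nn.Conv2d(chans[-1], 1, kernel_size=5, stride=2, padding=2))  # 5th conv
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)  # (N, 1, H/32, W/32): one score per receptive-field region

# scores = RelativeDiscriminator()(torch.randn(1, 3, 256, 256))  # -> (1, 1, 8, 8)
```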
3. Optimization of network models
The original images of the CelebA-HQ training set are input into the relative adversarial network model, and network optimization training is performed on the generation network and the relative discrimination network separately. The purpose of the training is to make the relative discrimination network judge the original image as more real and the repaired complete image as more fake, while making the generation network produce repaired complete images that the relative discrimination network judges as more real. Specifically, the learning rates of the generator and the discriminator in this embodiment are set to 0.0001 and the number of training epochs is 50; the trained network model is obtained when the optimization is complete.
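One alternating training step under these settings might look like the following PyTorch sketch; the generator interface (`generator(x_in, M, return_features=True)`) and the `losses` bundle are hypothetical placeholders, and weighting the three generator loss terms equally is an assumption:

```python
import torch

def train_step(generator, discriminator, opt_g, opt_d, x_r, M, losses):
    """One alternating update with Adam (lr = 0.0001 for both networks in this embodiment).

    `losses` is a hypothetical bundle of the relative adversarial, deep-supervision and
    regional loss functions; the three generator terms are summed with equal weight.
    """
    x_in = x_r * M                                   # occluded input image

    # --- update the relative discrimination network ---
    x_f = generator(x_in, M).detach()
    loss_d, _ = losses.adversarial(discriminator(x_r), discriminator(x_f))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- update the generation network ---
    x_f, decoder_feats = generator(x_in, M, return_features=True)
    _, loss_g_adv = losses.adversarial(discriminator(x_r), discriminator(x_f))
    loss_g = loss_g_adv + losses.deep_supervision(x_r, decoder_feats) \
                        + losses.regional(x_r, x_f, M)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

# opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
# opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
# for epoch in range(50):                            # 50 epochs, per the embodiment
#     for x_r, M in dataloader:
#         train_step(generator, discriminator, opt_g, opt_d, x_r, M, losses)
```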
4. Screening of network models
The trained network model is applied to the incomplete images of the CelebA-HQ test set, and whether the trained model is overfitted is judged using the four indices PSNR, MAE, MSE and SSIM. If overfitting is detected, the number of epochs is reduced accordingly; if no overfitting occurs, the trained generation network is kept.
In this embodiment it is found that, for the CelebA-HQ dataset, training for 50 epochs gives a good repairing effect without overfitting.
5. Repairing a defective image to be repaired
Fig. 7 is a diagram of an image restoration result in an embodiment of the present invention.
The incomplete image to be repaired and the corresponding mask are input into the screened generation network, so that a repaired complete image is obtained, as shown in fig. 7.
Through the above 5 steps, a repaired complete image is finally obtained, demonstrating a repairing method that can repair incomplete images well.
Action and Effect of the Embodiment
Compared with other methods, the repairing method of this embodiment can solve the problem of discontinuous content at the boundary of the repaired area, and at the same time the repaired picture as a whole is more realistic and reliable.
The image repairing method for improving the edge incoherence phenomenon of the repairing area according to this embodiment combines the advantages of adversarial networks in deep learning: it adopts a generation network that uses multi-resolution information and a soft attention mechanism together with a relative discrimination network, and completes image repairing by iterating the network model, achieving high repairing accuracy and better repairing results.
Therefore, the image repairing method for improving the edge incoherence of the repairing area of this embodiment can automatically and effectively repair a defective image. Compared with existing methods, the texture of the repaired image is more continuous, the picture content is more detailed, and the overall visual effect is more realistic.
The above embodiments are preferred examples of the present invention, and are not intended to limit the scope of the present invention.

Claims (8)

1. An image repairing method for improving the edge incoherence phenomenon of a repairing area is used for repairing a defective image and is characterized by comprising the following steps:
step 1, acquiring an original image data set, dividing the original image data set into a test set and a training set, and then preprocessing the original images in the test set to obtain a preprocessed test set with the defective images;
step 2, constructing a relative adversarial network model, wherein the relative adversarial network model comprises a generation network that utilizes image multi-resolution information and a soft attention mechanism and a relative discrimination network, the generation network is used for repairing the incomplete image to obtain a complete image, and the relative discrimination network is used for judging the complete image to obtain a probability feature vector describing how real or fake the complete image is;
step 3, inputting the original images of the training set into the relative adversarial network model, then performing network optimization training on the generation network and the relative discrimination network separately, calculating the relative adversarial loss, the deeply supervised spatial loss and the regional spatial loss, optimizing the network parameters of the generation network and the relative discrimination network based on a back-propagation algorithm and an Adam optimizer, then iteratively updating the network parameters at a set learning rate and alternately updating the generation network and the relative discrimination network, wherein the network training is complete when the complete images repaired by the generation network are close to the original, undamaged images;
step 4, inputting the preprocessed incomplete images of the test set into the trained relative adversarial network model, judging through evaluation indices whether the generation network is overfitted, returning to step 3 if overfitting is judged, and keeping the trained generation network as an image repairing network if no overfitting is judged;
step 5, inputting the incomplete image to be repaired into the image repairing network to obtain a repaired complete image.
2. The image inpainting method for improving the edge incoherence of the inpainting area according to claim 1, wherein:
wherein the preprocessing in step 1 specifically comprises: generating a rectangular occlusion mask and using it to randomly occlude an arbitrary region of each original image in the test set, so as to obtain the incomplete image.
3. The image inpainting method for improving the edge incoherence of the inpainting area according to claim 1, wherein:
wherein the generation network in step 2 has a multi-resolution encoder and a dual channel decoder.
4. The image inpainting method for improving the edge incoherence of the inpainting area according to claim 3, wherein:
wherein the multi-resolution encoder has 5 feature-extraction sub-networks, each feature-extraction sub-network contains 4 convolutional layers, all convolutional layers use convolution kernels of the same size, and all convolutional layers use the ELU function as the activation function,
in the same feature-extraction sub-network, the first convolutional layer uses a stride of 2 and the following three convolutional layers use a stride of 1; the first three convolutional layers of the previous feature-extraction sub-network pass their features one to one to the first three convolutional layers of the next feature-extraction sub-network, during which the outputs of the second and third convolutional layers are reduced once to half of their original size; and the fourth convolutional layer of the previous feature-extraction sub-network passes its output features to the first convolutional layer of the next feature-extraction sub-network.
5. The image inpainting method for improving the edge incoherence of the inpainting area according to claim 3, wherein:
wherein the dual-channel decoder has two feature-restoration sub-networks, the first feature-restoration sub-network is formed by connecting four soft attention network layers and one deconvolution network layer in series, and the second feature-restoration sub-network is formed by connecting five deconvolution network layers in series,
the first restoration sub-network has skip connections to the multi-resolution encoder, and the corresponding network layers between the first restoration sub-network and the second restoration sub-network have a feature-transfer relationship.
6. The image inpainting method for improving the edge incoherence of the inpainting area according to claim 1, wherein:
wherein the relative discrimination network in step 2 divides the complete image into regions and measures how real each region is through a relative adversarial loss function,
and the relative discrimination network uses a LeakyReLU activation function for discrimination.
7. The image inpainting method for improving the edge incoherence of the inpainting area according to claim 1, wherein:
wherein the learning rate in the step 3 is 0.0001,
the relative adversarial loss is calculated by a relative adversarial loss function that yields a generation-network loss L_G^adv and a relative-discrimination-network loss L_D^adv, in which the output D_Ra(x) of the relative discrimination network for each image is judged relative to its average output over the images of the other set,
the deeply supervised spatial loss is calculated by a spatial loss function acting on the features output at each resolution layer of the dual-channel decoder:

L_deep = Σ_n || x_r^n - p_n(f_n) ||_1
the regional spatial loss is calculated by a regional spatial loss function, wherein the regional spatial loss function is as follows:
L_local = || (x_r - x_f) ⊙ (1 - M) ||_1
in the above formulas, L_G^adv and L_D^adv are the loss of the generation network and the loss of the relative discrimination network respectively; x_r and x_f are the original image and the complete image repaired by the generation network respectively, and their picture content is consistent in the non-missing parts; D_Ra is the relative discrimination network, and D̄_Ra(x_r) and D̄_Ra(x_f) are the averages of its outputs over x_r and x_f respectively; x_r^n is the original image at the resolution of the nth layer, and p_n(f_n) is the result of compressing the multi-channel feature f_n of the nth layer into a three-channel feature; ⊙ denotes multiplication of corresponding matrix elements; and M is the occlusion mask that turns the original image x_r into the incomplete image, a single-channel tensor whose value is 0 in the missing region and 1 in the non-missing region.
8. The image inpainting method for improving the edge incoherence of the inpainting area according to claim 1, wherein:
wherein the evaluation indices in step 4 are the peak signal-to-noise ratio (PSNR), the mean absolute error (MAE), the mean squared error (MSE) and the structural similarity (SSIM).
CN202011116700.9A 2020-10-19 2020-10-19 Image repairing method for improving edge incoherence phenomenon of repairing area Pending CN114387170A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011116700.9A CN114387170A (en) 2020-10-19 2020-10-19 Image repairing method for improving edge incoherence phenomenon of repairing area

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011116700.9A CN114387170A (en) 2020-10-19 2020-10-19 Image repairing method for improving edge incoherence phenomenon of repairing area

Publications (1)

Publication Number Publication Date
CN114387170A true CN114387170A (en) 2022-04-22

Family

ID=81194209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011116700.9A Pending CN114387170A (en) 2020-10-19 2020-10-19 Image repairing method for improving edge incoherence phenomenon of repairing area

Country Status (1)

Country Link
CN (1) CN114387170A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117078564A (en) * 2023-10-16 2023-11-17 北京网动网络科技股份有限公司 Intelligent generation method and system for video conference picture
CN117078564B (en) * 2023-10-16 2024-01-12 北京网动网络科技股份有限公司 Intelligent generation method and system for video conference picture

Similar Documents

Publication Publication Date Title
Zhang et al. Multi-level fusion and attention-guided CNN for image dehazing
CN108805015B (en) Crowd abnormity detection method for weighted convolution self-coding long-short term memory network
CN111723693A (en) Crowd counting method based on small sample learning
CN111832484A (en) Loop detection method based on convolution perception hash algorithm
CN113378775B (en) Video shadow detection and elimination method based on deep learning
CN116152591B (en) Model training method, infrared small target detection method and device and electronic equipment
CN111209858A (en) Real-time license plate detection method based on deep convolutional neural network
CN115439442A (en) Industrial product surface defect detection and positioning method and system based on commonality and difference
CN114021704B (en) AI neural network model training method and related device
CN114757862B (en) Image enhancement progressive fusion method for infrared light field device
CN113160085B (en) Water bloom shielding image data collection method based on generation countermeasure network
CN114387170A (en) Image repairing method for improving edge incoherence phenomenon of repairing area
CN111275751B (en) Unsupervised absolute scale calculation method and system
WO2024040973A1 (en) Multi-scale fused dehazing method based on stacked hourglass network
CN117391920A (en) High-capacity steganography method and system based on RGB channel differential plane
CN116703885A (en) Swin transducer-based surface defect detection method and system
CN115861223A (en) Solar cell panel defect detection method and system
US20240062347A1 (en) Multi-scale fusion defogging method based on stacked hourglass network
CN114820379B (en) Image rain-like layer removing method for generating countermeasure network based on attention dual residual error
Wang et al. Crowd Counting Model with Convolutional Neural Networks and Transformer
CN116912675B (en) Underwater target detection method and system based on feature migration
CN113971764B (en) Remote sensing image small target detection method based on improvement YOLOv3
CN117237994B (en) Method, device and system for counting personnel and detecting behaviors in oil and gas operation area
Han et al. Atmospheric scattering model and dark channel prior constraint network for environmental monitoring under hazy conditions
Huang et al. An End-to-End Network for Single Image Dedusting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination