CN108537753B - Image restoration method based on context feature space constraint - Google Patents
- Publication number
- CN108537753B CN108537753B CN201810317267.1A CN201810317267A CN108537753B CN 108537753 B CN108537753 B CN 108537753B CN 201810317267 A CN201810317267 A CN 201810317267A CN 108537753 B CN108537753 B CN 108537753B
- Authority
- CN
- China
- Prior art keywords
- image
- feature
- network
- repaired
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/77—Retouching; Inpainting; Scratch removal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention provides an image restoration method based on context feature space constraint, comprising a network-model training part and an image restoration part. The training part trains a feature encoding network and a feature decoding network for the defect region, then uses the trained defect-region feature encoding network to construct a context feature space that guides the training of the feature encoding network of the image to be repaired. The restoration part extracts features with the feature encoding network of the image to be repaired, decodes them with the feature decoding network of the defect region to generate the image content of the defect region, and outputs the repaired image. The method effectively extracts, from the image to be repaired, features close to those of the corresponding defect region while avoiding feature interference from other images, improving the robustness of the model and yielding better image restoration results.
Description
Technical Field
The invention belongs to the field of image processing, and particularly relates to an image restoration method, based on context feature space constraint, for large-area image defects.
Background
Image restoration is the task of filling defective or occluded image regions with visually plausible content. It is widely applied in scenarios such as cultural-relic protection, film special effects, occluded-object filling, and error concealment in image/video transmission; it has important application value and is a research hotspot in computer graphics and computer vision. Existing image restoration algorithms can be roughly divided into two categories according to the size and shape of the region to be restored: information-diffusion-based methods and sample-matching-based methods.
Information-diffusion-based methods smoothly diffuse image content into the defect region according to the information at its boundary, eventually filling the region. They mainly target the repair of linear structures or elongated regions, such as object lines or edges, and are often formulated as partial-differential-equation problems. However, using only the surroundings of the defect region, they have difficulty repairing lost repetitive textures or large-area defects. Sample-matching-based methods use a block-similarity criterion to search, block by block, for the best-matching image block in the image to be repaired and use it as a filling block, until the defect region is completely filled. They mainly target image blocks whose texture patterns resemble known data and focus more on structure propagation during repair. However, when a large image area is lost, and in particular when the content to be generated cannot be found elsewhere in the image to be repaired, many unknown structures and textures must be filled, and these methods perform poorly.
In recent years, image restoration methods based on deep learning and scene understanding have developed rapidly. In 2016, Pathak et al. [document 1] at the University of California, Berkeley first proposed the context-encoder model, which uses a convolutional neural network to learn to repair large lost image blocks. The model is trained to extract semantic features of the image to be repaired and to decode those features to predict the content of the defect region. The method uses two loss functions at the decoding end to ensure the quality of the generated content: 1) a reconstruction loss constrains the overall structure of the generated content; 2) a competitive (adversarial) loss constrains its authenticity. Document 2 proposes a global competitive loss function that not only constrains the authenticity of the generated content but also maintains continuity between the generated content and the known content over the whole restored image. Document 3 constrains the generated face structure with a face semantic-parsing loss for the face-completion problem. The main idea of the context encoder and its extensions is to extract features Z from the image to be repaired, decode Z to generate the predicted content, and implicitly build an image context space by designing loss functions in image space to constrain the quality of the generated content. In this line of work, what the extracted features Z are, and how they express the context of the lost region, determine the accuracy of the predicted content; existing work lacks mining and learning of the feature distribution, so the generated images are prone to problems such as inconsistent semantics and obvious artifacts.
In summary, the method based on information diffusion and based on sample matching uses local prior information of the image for content filling, and is only suitable for scenes with small lost areas or searchable texture patterns, and the application range is limited. The image restoration method based on deep learning does not excavate the context relationship between the defect area and the image to be restored, and the accuracy of generated content is poor. Therefore, in order to implement image restoration of a large area loss, a new method is urgently needed to mine the context relationship between the defect area and the image to be restored, and implement effective expression of the characteristics of the defect area and accurate prediction of the content.
[Document 1] D. Pathak, P. Krähenbühl, J. Donahue, T. Darrell and A. A. Efros, Context Encoders: Feature Learning by Inpainting, The IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2536-2544.
[Document 2] S. Iizuka, E. Simo-Serra and H. Ishikawa, Globally and Locally Consistent Image Completion, ACM Transactions on Graphics, 2017, 36(4): pp. 1-14.
[Document 3] Y. Li, S. Liu, J. Yang and M.-H. Yang, Generative Face Completion, The IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3911-3919.
Disclosure of Invention
The invention provides an image restoration method based on context feature space constraint, aiming at the problem that existing image restoration technology handles large-area image defects poorly.
The technical scheme adopted by the invention is an image restoration method based on context feature space constraint, which comprises a network model training part and an image restoration part,
the network model training part comprises the following substeps,
step 1.1, preparing a training data set, which includes scale-transforming and randomly cropping the images in the training data set to a preset resolution, covering a central image area as the defect region, using the remaining defective image as the image to be repaired, and using the original image content of the defect region as annotation data;
step 1.2, constructing a network model and setting training parameters, wherein the network model comprises a feature coding network of a defect area, a feature decoding network of the defect area and a feature coding network of an image to be repaired;
step 1.3, training a feature coding network and a feature decoding network of the defect area, inputting the image content of the defect area into the feature coding network to extract a feature expression vector, and then inputting the feature expression vector into the feature decoding network to reconstruct the image content of the defect area;
step 1.4, when the preset corresponding iteration stop condition is met, saving the feature coding network and the feature decoding network of the defect area, entering step 1.5, otherwise, returning to step 1.3 to continue the next iteration training;
step 1.5, training the feature encoding network of the image to be repaired, which includes using the trained feature encoding network of the defect region to construct the context feature space and guide the training of the feature encoding network of the image to be repaired;
step 1.6, when the preset iteration stop condition is met, saving the feature encoding network of the image to be repaired; otherwise, returning to step 1.5 to continue the next training iteration;
the image restoration part comprises the following sub-steps,
step 2.1, preparing image restoration data, including carrying out scale transformation on an original image of a restoration object to enable a defect area to meet the requirement of a restoration network, and cutting a corresponding image to be restored from the transformed original image by taking the defect area as a center;
step 2.2, extracting features using the feature encoding network of the image to be repaired, which includes inputting the image to be repaired into that network;
step 2.3, decoding the features using the feature decoding network of the defect region, which includes inputting the feature vector extracted in step 2.2 into the feature decoding network of the defect region for decoding;
step 2.4, generating the image content of the defect area, and outputting the generated image content of the defect area according to the decoding result obtained in the step 2.3;
step 2.5, outputting the repaired image, wherein the generated image content of the defect region is filled into the image to be repaired, and the result is the repaired image.
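The restoration sub-steps above can be sketched as follows. This is a toy sketch, not the patent's actual models: `encode_to_repair` and `decode_defect` are hypothetical stand-ins for the trained feature encoding network of the image to be repaired and the trained feature decoding network of the defect region.

```python
import numpy as np

def encode_to_repair(img):
    # Stand-in for the trained feature encoding network of the image to be
    # repaired: here just a per-channel mean, producing a toy feature vector.
    return img.mean(axis=(1, 2))

def decode_defect(z, size=64):
    # Stand-in for the trained feature decoding network of the defect region:
    # broadcasts the feature vector into a (C, size, size) patch.
    return np.tile(z[:, None, None], (1, size, size))

def repair(image, top, left, size=64):
    """Steps 2.2-2.5: encode the image to be repaired, decode the defect
    content, and paste it back to obtain the repaired image."""
    z = encode_to_repair(image)                       # step 2.2
    patch = decode_defect(z, size)                    # steps 2.3-2.4
    out = image.copy()
    out[:, top:top + size, left:left + size] = patch  # step 2.5
    return out
```

In the real method, the two stand-ins would be replaced by forward passes through the saved network models from steps 1.4 and 1.6.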
Furthermore, in step 1.3, when training the feature encoding network and feature decoding network of the defect region, the parameters of both networks are optimized by minimizing the output-layer error, defined as follows.
Let $x_1$ be the original defect-region image content, $x_2$ the reconstructed defect-region image content, and $D$ a discrimination network that judges whether an input image is real. The output-layer error $L_1$ is defined as
$$L_1 = \lambda_{rec} L_{rec} + \lambda_a L_a,$$
where $L_{rec} = \|x_1 - x_2\|_2^2$ is the reconstruction loss between the reconstructed content and the annotation data, constraining the overall structure of the generated content; $L_a = \mathbb{E}_{x_1 \sim p_{data}}[\log D(x_1) + \log(1 - D(x_2))]$ is the competitive loss constraining the authenticity of the reconstructed content, with $p_{data}$ the probability density of the real data; and $\lambda_{rec}$ and $\lambda_a$ are the error weights of the reconstruction loss and the competitive loss, respectively.
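As a minimal numeric sketch of this output-layer error (the function name is hypothetical, and treating the competitive term as a single scalar added to the error is an assumption; in practice the encoder-decoder and the discrimination network $D$ optimize it adversarially):

```python
import numpy as np

def defect_region_error(x1, x2, d_real, d_fake, lam_rec=0.99, lam_a=0.01):
    """Output-layer error of step 1.3 (illustrative sketch).
    x1: original defect-region content, x2: reconstruction,
    d_real = D(x1), d_fake = D(x2), both in (0, 1)."""
    l_rec = np.sum((x1 - x2) ** 2)                # reconstruction loss L_rec
    l_a = np.log(d_real) + np.log(1.0 - d_fake)   # competitive loss term L_a
    return lam_rec * l_rec + lam_a * l_a
```

With a perfect reconstruction the L_rec term vanishes and only the competitive term remains, matching the role of the two weights.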
In step 1.5, when training the feature encoding network of the image to be repaired, its parameters are optimized by minimizing the output-layer error, defined as follows.
Let $z_1$ be the feature extracted from the defect-region content by the trained feature encoding network of the defect region, $z_2$ the feature extracted by the feature encoding network of the image to be repaired from the image to be repaired corresponding to that defect region, and $z_b$ the best feature extracted by the feature encoding network of the image to be repaired from the other images to be repaired in the same training batch. The output-layer error $L_2$ is defined as
$$L_2 = \lambda_{ss} L_{ss} + \lambda_{md} L_{md},$$
where $L_{ss} = \|z_1 - z_2\|_2^2$ is the feature difference between the defect-region content and its corresponding image to be repaired, i.e. the intra-image self-similarity loss; $L_{md} = \|z_1 - z_2\|_2^2 - \|z_1 - z_b\|_2^2$, whose former term is the feature difference between the defect-region content and its corresponding image to be repaired and whose latter term is the feature difference between the defect-region content and the other images to be repaired in the training set, is the inter-image mutual-dissimilarity loss; and $\lambda_{ss}$ and $\lambda_{md}$ are the error weights of the intra-image self-similarity loss and the inter-image mutual-dissimilarity loss, respectively.
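A numeric sketch of this context-feature error follows. The function name is hypothetical, and the sign convention of the mutual-dissimilarity term (the batch term entering negatively, so that minimizing the error pushes $z_2$ toward $z_1$ and $z_b$ away from it) is an assumption consistent with the definitions above.

```python
import numpy as np

def context_feature_error(z1, z2, zb, lam_ss=0.99, lam_md=0.01):
    """Output-layer error of step 1.5 (illustrative sketch).
    z1: defect-region feature, z2: feature of the corresponding image to
    be repaired, zb: best feature from the other images in the batch."""
    l_ss = np.sum((z1 - z2) ** 2)                            # self-similarity
    l_md = np.sum((z1 - z2) ** 2) - np.sum((z1 - zb) ** 2)   # dissimilarity
    return lam_ss * l_ss + lam_md * l_md
```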
Moreover, extracting the best feature $z_b$ from the other images to be repaired in the same training batch comprises the following sub-steps:
step 1.5.1, extracting the features of all images to be repaired in the same training batch with the feature encoding network of the image to be repaired, recorded as the feature set {z};
step 1.5.2, extracting the feature of the defect-region image content corresponding to the current image to be repaired with the feature encoding network of the defect region, recorded as $z_1$;
step 1.5.3, comparing $z_1$ in turn with the features extracted from the other images to be repaired in the feature set {z}, and taking the feature with the smallest difference as the best feature $z_b$.
The invention is based on the following two considerations:
1) the best features to describe the content of the defective area should come from itself;
2) the features extracted from the image to be repaired should be the features closest to those extracted from the content of the defective area.
The invention extracts features and reconstructs images with a deep convolutional neural network and proposes two loss functions at the high-level semantic feature level: an intra-image self-similarity loss and an inter-image mutual-dissimilarity loss. These mine the context relationship between the image to be repaired and its corresponding defect region, effectively extract features close to those of the corresponding defect region from the image to be repaired, and at the same time avoid feature interference from different images, improving the robustness of the model, yielding better image restoration results, and having important market value. Experiments show that the method can restore both texture and semantics for lost content covering more than 25% of the image area.
Drawings
FIG. 1 is a flow chart of a method implementation of an embodiment of the present invention;
fig. 2 is a schematic diagram of a deep convolutional network structure according to an embodiment of the present invention.
Detailed Description
To help those of ordinary skill in the art understand and implement the present invention, it is further described in detail below with reference to the drawings and embodiments. It should be understood that the embodiments described here are merely illustrative and explanatory and do not restrict the invention.
To solve the problem of repairing large-area defect regions in an image, the invention understands the content of the whole image and generates the content of the defect region by learning from the image to be repaired, so that it can effectively generate better texture and semantically consistent content. The disclosed image restoration method based on context feature space constraint produces accurate restoration results by mining, in the context feature space, the relationship between the content of a large-area defect region and the image to be repaired. First, an encoder-decoder structure implemented with a deep convolutional neural network performs semantic feature learning and region reconstruction for the large-area defect region. Second, the learned semantic features guide the encoding of the image to be repaired, so that the features of the defect region are as close as possible to those of the image to be repaired. Finally, the trained decoder for the defect-region features decodes the features of the image to be repaired to generate the defect region.
As shown in fig. 1, the flow of the embodiment of the present invention includes two parts. The network-model training part: prepare the training data set, set the training parameters of the network model, train the feature encoding network and feature decoding network of the defect region, and save both after the training error converges. Then the trained feature encoding network model of the defect region guides the training of the feature encoding network of the image to be repaired, which is saved after its training error converges.
The image restoration part: input the image to be repaired, extract features with its feature encoding network to obtain a feature vector, decode the features with the feature decoding network of the defect region to obtain the generated defect-region image content, and fuse the generated content with the image to be repaired to obtain the final repaired image.
The network model training part comprises the following specific steps:
step 1.1: training data set preparation. And carrying out scale transformation and random cutting on the images in the training data set into 128 × 128 resolution, covering up an image area of 64 × 64 in the center as a defective area, and using the residual images as original image content of the defective area of the image to be repaired as annotation data. In specific implementation, the cutting size and the network model structure can be preset correspondingly by a user.
Step 1.2: construct the network model based on a deep convolutional neural network and set the training parameters. This specifically includes setting the structure of each layer in the network model; the convolution kernel sizes, strides and edge-padding sizes; the weights of the loss layers; the learning rate of each branch; the model optimization algorithm; the batch size; and the initialization method of the network model parameters.
As detailed below, the designed network model includes three network types: the feature encoding network of the defect region, the feature decoding network of the defect region, and the feature encoding network of the image to be repaired. The "output size" of a layer is the size of the data that layer outputs, expressed as a triple whose first number is the number of feature maps and whose second and third numbers are the feature-map height and width. In specific implementations, the training parameters may be preset by the user or taken as empirical values obtained through experiments.
The specific settings of each network type are as follows:
the feature coding network of the defect area comprises 6 layers, wherein the first layer is an input layer, the second layer to the fifth layer are hidden layers, the sixth layer is an output layer, and the structures of the layers are as follows:
a first layer: and the input layer inputs the label data of the training data set in the step 1.1, namely the image content of the original defect area.
A second layer: hidden layers, including a convolutional layer with a convolutional kernel number of 64, kernel size 4 x 4, step size 2, and edge fill 1, a batch normalization layer, and a ReLu activation function layer.
And a third layer: hidden layers, including a convolution layer with a convolution kernel number of 128, kernel size 4 x 4, step size 2 and edge fill 1, a batch normalization layer and a ReLu activation function layer.
Fourth layer: hidden layer, including a convolution layer with a convolution kernel number of 256, kernel size 4 x 4, step size 2 and edge fill 1, a batch normalization layer and a ReLu activation function layer.
And a fifth layer: hidden layers including a convolutional layer with a convolutional kernel number of 512, kernel size 4 x 4, step 2 and edge fill 1, a batch normalization layer and a ReLu activation function layer.
A sixth layer: and the output layer comprises a convolution layer with the convolution kernel number of 4000, the kernel size of 4 multiplied by 4, the step length of 1 and the edge filling of 0, a batch normalization layer and a Tanh activation function layer, and outputs the feature vector extracted from the defect area.
The feature decoding network of the defect area comprises 6 layers, wherein the first layer is an input layer, the second layer to the fifth layer are hidden layers, the sixth layer is an output layer, and the structures of the layers are as follows:
a first layer: and the input layer is used for inputting the feature vectors extracted by the feature coding network of the defect area.
A second layer: hidden layers, including a transposed convolutional layer with 512 convolutional kernel number, 4 × 4 kernel size, step 1 and edge fill 0, a batch normalization layer and a ReLu activation function layer.
And a third layer: hidden layers, including a transposed convolutional layer with 256 convolutional kernel number, 4 x 4 kernel size, step 2, and edge fill 1, a batch normalization layer, and a ReLu activation function layer.
Fourth layer: hidden layer, including a transposed convolutional layer with 128 convolutional kernel number, kernel size 4 x 4, step size 2, and edge fill 1, a batch normalization layer, and a ReLu activation function layer.
And a fifth layer: hidden layers, including a transposed convolutional layer with 64 convolutional kernel number, kernel size 4 x 4, step size 2 and edge fill 1, a batch normalization layer and a ReLu activation function layer.
A sixth layer: and the output layer comprises a transposed convolution layer with convolution kernel number of 3, kernel size of 4 multiplied by 4, step size of 2 and edge filling of 1 and a Tanh activation function layer, and outputs the generated defect area image content.
The feature coding network of the image to be restored comprises 7 layers, wherein the first layer is an input layer, the second layer to the sixth layer are hidden layers, the seventh layer is an output layer, and the structures of the layers are as follows:
A first layer: the input layer inputs the image to be repaired and the annotation data of the training data set from step 1.1.
A second layer: hidden layers, including a convolutional layer with a convolutional kernel number of 64, kernel size 4 x 4, step size 2, and edge fill 1, a batch normalization layer, and a ReLu activation function layer.
And a third layer: hidden layers, including a convolutional layer with a convolutional kernel number of 64, kernel size 4 x 4, step size 2, and edge fill 1, a batch normalization layer, and a ReLu activation function layer.
Fourth layer: hidden layer, including a convolution layer with a convolution kernel number of 128, kernel size 4 x 4, step size 2 and edge fill 1, a batch normalization layer and a ReLu activation function layer.
And a fifth layer: hidden layers, including a convolution layer with a convolution kernel number of 256, kernel size 4 x 4, step size 2 and edge fill 1, a batch normalization layer and a ReLu activation function layer.
A sixth layer: hidden layers including a convolutional layer with a convolutional kernel number of 512, kernel size 4 x 4, step 2 and edge fill 1, a batch normalization layer and a ReLu activation function layer.
A seventh layer: the output layer, comprising a convolution layer with a convolution kernel number of 4000, kernel size 4 x 4, step size 1 and edge fill 0, a batch normalization layer and a Tanh activation function layer, outputs the feature vector extracted from the image to be repaired.
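The spatial sizes implied by the layer settings above can be checked with the standard convolution and transposed-convolution output-size formulas:

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    # Output spatial size of a convolution layer.
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, pad=1):
    # Output spatial size of a transposed-convolution layer.
    return (size - 1) * stride - 2 * pad + kernel

# Feature encoding network of the image to be repaired: 128 -> ... -> 1
s = 128
for _ in range(5):                            # layers 2-6 (k4, s2, p1)
    s = conv_out(s)
s = conv_out(s, kernel=4, stride=1, pad=0)    # layer 7 (k4, s1, p0)
# s is now 1, so the output is a 4000 x 1 x 1 feature vector

# Feature decoding network of the defect region: 1 -> ... -> 64
t = deconv_out(1, kernel=4, stride=1, pad=0)  # layer 2 (k4, s1, p0)
for _ in range(4):                            # layers 3-6 (k4, s2, p1)
    t = deconv_out(t)
# t is now 64, matching the 64 x 64 defect region
```

The same arithmetic gives 64 → 1 for the five-convolution defect-region encoder, consistent with the 4000 × 1 × 1 output of its sixth layer.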
Preferably, all weight terms in the three network models are initialized with a Gaussian distribution, the RMSprop optimization algorithm is used during training, the learning rate of each branch is set to 0.001, and the batch size is set to 64.
Step 1.3: train the feature encoding network and the feature decoding network of the defect region. The defect-region image content is input into the feature encoding network to extract a feature expression vector, which is then input into the decoding network to reconstruct the defect-region image content. Let $x_1$ be the original defect-region image content, $x_2$ the reconstructed defect-region image content, and $D$ a discrimination network judging whether an input image is real. The output-layer error $L_1$ is defined as
$$L_1 = \lambda_{rec} L_{rec} + \lambda_a L_a,$$
where $L_{rec} = \|x_1 - x_2\|_2^2$ is the reconstruction loss between the reconstructed content and the annotation data, constraining the overall structure of the generated content; $L_a = \mathbb{E}_{x_1 \sim p_{data}}[\log D(x_1) + \log(1 - D(x_2))]$ is the competitive loss constraining the authenticity of the reconstructed content, with $p_{data}$ the probability density of the real data; and $\lambda_{rec}$ and $\lambda_a$ are the error weights of the reconstruction loss and the competitive loss, respectively.
The invention provides that the defect-region image contents are taken from the training data set in batches, and the parameters of the feature encoding network and decoding network of the defect region are continuously optimized by minimizing the output-layer error until all data in the training data set have been used, which completes one training pass; then step 1.4 performs the iteration judgment.
Preferably, the error weights $\lambda_{rec}$ and $\lambda_a$ of the reconstruction loss and the competitive loss are set to 0.99 and 0.01, respectively.
Step 1.4: when the preset iteration stop condition is met, save the defect-region network model, including the feature encoding network and the feature decoding network of the defect region, and go to step 1.5; otherwise, return to step 1.3 for the next training iteration. The feature encoding and decoding networks of the defect region constructed in step 1.2 are trained with the training data set built in step 1.1. Two stopping conditions can be used: the output-layer error falls below a set threshold, or iterative training reaches a given number of iterations. The trained feature encoding and decoding network models are saved separately: the encoding model guides the training of the feature encoding network of the image to be repaired, and the decoding model reconstructs features in the image restoration stage to generate the defect-region image.
Step 1.5: train the feature encoding network of the image to be repaired. This is based on two considerations: 1) the best features for describing the content of the defect region should come from that content itself; 2) the features extracted from the image to be repaired should be those closest to the features extracted from the defect-region content. The invention therefore uses the trained feature encoding network of the defect region to construct the context feature space and guide the training of the feature encoding network of the image to be repaired. Let $z_1$ be the feature extracted from the defect-region content by the trained feature encoding network of the defect region, $z_2$ the feature extracted by the feature encoding network of the image to be repaired from the image to be repaired corresponding to that defect region, and $z_b$ the best feature extracted by the feature encoding network of the image to be repaired from the other images to be repaired in the same training batch. The output-layer error $L_2$ is defined as
$$L_2 = \lambda_{ss} L_{ss} + \lambda_{md} L_{md},$$
where $L_{ss} = \|z_1 - z_2\|_2^2$ is the feature difference between the defect-region content and its corresponding image to be repaired, i.e. the intra-image self-similarity loss; $L_{md} = \|z_1 - z_2\|_2^2 - \|z_1 - z_b\|_2^2$, whose former term is the feature difference between the defect-region content and its corresponding image to be repaired and whose latter term is the feature difference between the defect-region content and the other images to be repaired in the training set, is the inter-image mutual-dissimilarity loss; and $\lambda_{ss}$ and $\lambda_{md}$ are the error weights of the intra-image self-similarity loss and the inter-image mutual-dissimilarity loss, respectively.
According to the invention, the images to be repaired are taken from the training data set in batches, and the parameters of the feature coding network of the image to be repaired are continuously optimized by minimizing the output-layer error until all data in the training data set have been used, which completes one round of training; step 1.6 then performs the iteration judgment.
Preferably, the error weights λ_ss and λ_md of the intra-image self-similarity loss and the inter-image mutual-dissimilarity loss are set to 0.99 and 0.01, respectively.
Preferably, extracting the best feature z_b from the other images to be repaired in the same training batch comprises the following substeps:
Step 1.5.1: extract the features of all images to be repaired in the same training batch with the feature coding network of the image to be repaired, and record them as the feature set {z}.
Step 1.5.2: extract the feature of the defect-area image content corresponding to the image to be repaired with the feature coding network of the defect area, and record it as feature z_1.
Step 1.5.3: compare z_1 in turn with the features extracted from the other images to be repaired in the feature set {z}, and take the feature with the smallest difference as the best feature z_b.
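Substeps 1.5.1 to 1.5.3 amount to a nearest-feature search over the batch. A minimal sketch, in which the squared-L2 distance is an assumed measure of the "difference" between features:

```python
import numpy as np

def best_batch_feature(z1, feature_set):
    """Steps 1.5.1-1.5.3: among the features {z} extracted from the
    other to-be-repaired images in the batch, pick the one closest
    (smallest squared-L2 difference) to the defect-region feature z1
    -- the best feature zb."""
    dists = [np.sum((z1 - z) ** 2) for z in feature_set]
    return feature_set[int(np.argmin(dists))]
```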
Step 1.6: when the preset iteration-stop condition is met, save the network model of the image to be repaired, i.e. the feature coding network of the image to be repaired, and the work of the network-model training part ends; otherwise return to step 1.5 for the next training iteration. The feature coding network of the image to be repaired constructed in step 1.2 is trained with the training data set constructed in step 1.1. Two stopping conditions are possible: either the output-layer error falls below a set threshold, or iterative training reaches a preset number of iterations. The trained feature coding network model of the image to be repaired is then saved and is used to extract the features of the image to be repaired in the image-repairing stage.
The image restoration part comprises the following sub-steps:
step 2.1: preparing image repairing data, including carrying out scale transformation on a tested image (an original image of a repairing object) to enable a defect area to meet the size of 64 x 64 required by a repairing network, namely, the size of the defect area corresponds to the size of a network model training part; and with the defect area as the center, cutting the transformed image into a 128 x 128 resolution image as an image to be repaired, namely the image to be repaired corresponds to the size of the image to be repaired in the network model training part.
Step 2.2: extract features with the feature coding network of the image to be repaired, which comprises inputting the image to be repaired into that feature coding network.
Step 2.3: decode the features with the feature decoding network of the defect area, which comprises inputting the feature vector extracted in step 2.2 into the feature decoding network of the defect area for decoding.
Step 2.4: generate the defect-area image content, which is output according to the decoding result of step 2.3.
Step 2.5: output the repaired image, which comprises filling the generated defect-area image content into the image to be repaired; the result is the repaired image.
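Steps 2.1 to 2.5 can be sketched end to end as follows. This is a hedged sketch: `encoder` and `decoder` stand in for the two trained network models (not defined in this text), the zero-masking of the defect region is an assumption, and the crop and fill geometry follows the 64 x 64 / 128 x 128 sizes stated above:

```python
import numpy as np

def repair_image(img, defect_box, encoder, decoder):
    """Steps 2.1-2.5 as a sketch: crop a 128x128 to-be-repaired patch
    centred on the 64x64 defect, encode it, decode the defect-area
    content, and fill the result back into the image.  `encoder` and
    `decoder` are hypothetical stand-ins for the trained feature
    coding and feature decoding networks."""
    y, x = defect_box          # top-left corner of the 64x64 defect
    cy, cx = y + 32, x + 32    # centre of the defect area
    patch = img[cy - 64:cy + 64, cx - 64:cx + 64].copy()  # step 2.1
    patch[32:96, 32:96] = 0    # mask the defect region (assumption)
    feature = encoder(patch)                   # step 2.2
    content = decoder(feature)                 # steps 2.3-2.4: 64x64 output
    out = img.copy()
    out[y:y + 64, x:x + 64] = content          # step 2.5: fill the hole
    return out
```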
In specific implementation, a person skilled in the art can implement automatic operation of the above processes by using software technology.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
It should be understood that the above description of the preferred embodiments is given for clearness of understanding and implies no unnecessary limitations; those skilled in the art may make modifications and alterations without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (4)
1. An image restoration method based on context feature space constraint, characterized in that: the content of the whole image is understood by learning from the image to be repaired, and the content of the defect area is generated so that the generated content is semantically consistent; the method comprises a network model training part and an image repairing part,
the network model training part comprises the following substeps,
step 1.1, preparing a training data set, wherein the method comprises the steps of carrying out scale transformation and random cutting on images in the training data set into preset resolution, covering an image area in the center as a defect area, using the rest of the defect image as an image to be repaired, and using the original image content of the defect area as annotation data;
step 1.2, constructing a network model and setting training parameters, wherein the network model comprises a feature coding network of a defect area, a feature decoding network of the defect area and a feature coding network of an image to be repaired;
step 1.3, training a feature coding network and a feature decoding network of the defect area, inputting the image content of the defect area into the feature coding network to extract a feature expression vector, and then inputting the feature expression vector into the feature decoding network to reconstruct the image content of the defect area;
step 1.4, when the preset corresponding iteration stop condition is met, saving the feature coding network and the feature decoding network of the defect area, entering step 1.5, otherwise, returning to step 1.3 to continue the next iteration training;
step 1.5, training a feature coding network of the image to be repaired, wherein the method comprises the steps of adopting the trained feature coding network of the defective image to construct a context feature space and guiding the training of the feature coding network of the image to be repaired;
step 1.6, when the preset corresponding iteration-stop condition is met, saving the feature coding network of the image to be repaired; otherwise returning to step 1.5 to continue the next iteration of training;
the image restoration part comprises the following sub-steps,
step 2.1, preparing image restoration data, including carrying out scale transformation on an original image of a restoration object to enable a defect area to meet the requirement of a restoration network, and cutting a corresponding image to be restored from the transformed original image by taking the defect area as a center;
2.2, extracting features by using a feature coding network of the image to be restored, wherein the feature extraction comprises the step of inputting the image to be restored into the feature coding network of the image to be restored;
step 2.3, decoding the features with the feature decoding network of the defect area, which comprises inputting the feature vector extracted in step 2.2 into the feature decoding network of the defect area for decoding;
step 2.4, generating the image content of the defect area, and outputting the generated image content of the defect area according to the decoding result obtained in the step 2.3;
and 2.5, outputting the repaired image, wherein the generated image content of the defect area is filled into the image to be repaired, and the obtained result is the repaired image.
2. The image inpainting method based on the context-feature-space constraint according to claim 1, characterized in that: in step 1.3, when training the feature coding network and the feature decoding network of the defect area, the parameters in the feature coding network and the feature decoding network of the defect area are optimized by minimizing the output-layer error, which is defined as follows,
let the original defect-area image content be x_1, the reconstructed defect-area image content be x_2, and D be a discrimination network that judges whether an input image is a real image; the output-layer error E_r is defined as
E_r = λ_rec·L_rec + λ_a·L_a, with L_rec = ‖x_1 − x_2‖² and L_a = E_{x~p_data}[log D(x)] + log(1 − D(x_2)),
wherein L_rec represents the reconstruction loss between the reconstructed content and the annotation data, and constrains the overall structure of the generated content; L_a represents the competitive loss of the reconstructed content, and constrains the authenticity of the reconstructed content, where p_data represents the probability density of the real data; λ_rec and λ_a represent the error weights of the reconstruction loss and the competitive loss, respectively.
3. The image inpainting method based on the context-feature-space constraint according to claim 1, characterized in that: in step 1.5, when training the feature coding network of the image to be repaired, the parameters in the feature coding network of the image to be repaired are optimized by minimizing the output-layer error,
let z_1 be the feature extracted from the defect image with the trained feature coding network model of the defect image, z_2 the feature extracted from the image to be repaired corresponding to the defect image with the feature coding network of the image to be repaired, and z_b the best feature extracted by the feature coding network of the image to be repaired from the other images to be repaired in the same training batch; the output-layer error E_c is defined as
E_c = λ_ss·‖z_1 − z_2‖² − λ_md·‖z_1 − z_b‖²,
wherein ‖z_1 − z_2‖² represents the feature difference between the defect image and its corresponding image to be repaired, belonging to the intra-image self-similarity loss; ‖z_1 − z_b‖² represents the feature difference between the defect image and the other images to be repaired in the training set, belonging to the inter-image mutual-dissimilarity loss; and λ_ss and λ_md represent the error weights of the intra-image self-similarity loss and the inter-image mutual-dissimilarity loss, respectively.
4. The image inpainting method based on the context-feature-space constraint of claim 3, characterized in that: extracting the best feature z_b from the other images to be repaired in the same training batch comprises the following substeps,
step 1.5.1, extracting the features of all images to be repaired in the same training batch with the feature coding network of the image to be repaired, and recording them as the feature set {z};
step 1.5.2, extracting the feature of the defect-area image content corresponding to the image to be repaired with the feature coding network of the defect area, and recording it as feature z_1;
step 1.5.3, comparing z_1 in turn with the features extracted from the other images to be repaired in the feature set {z}, and taking the feature with the smallest difference as the best feature z_b.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810317267.1A CN108537753B (en) | 2018-04-10 | 2018-04-10 | Image restoration method based on context feature space constraint |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108537753A CN108537753A (en) | 2018-09-14 |
CN108537753B true CN108537753B (en) | 2021-12-03 |
Family
ID=63479878
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810317267.1A Active CN108537753B (en) | 2018-04-10 | 2018-04-10 | Image restoration method based on context feature space constraint |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108537753B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111986127B (en) * | 2019-05-22 | 2022-03-08 | 腾讯科技(深圳)有限公司 | Image processing method and device, computer equipment and storage medium |
CN110503609B (en) * | 2019-07-15 | 2023-04-28 | 电子科技大学 | Image rain removing method based on hybrid perception model |
CN110675339A (en) * | 2019-09-16 | 2020-01-10 | 山东师范大学 | Image restoration method and system based on edge restoration and content restoration |
CN111242874B (en) * | 2020-02-11 | 2023-08-29 | 北京百度网讯科技有限公司 | Image restoration method, device, electronic equipment and storage medium |
CN111833306B (en) * | 2020-06-12 | 2024-02-13 | 北京百度网讯科技有限公司 | Defect detection method and model training method for defect detection |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2230856A2 (en) * | 2009-03-17 | 2010-09-22 | Mitsubishi Electric Corporation | Method for up-sampling images |
CN106952344A (en) * | 2017-05-04 | 2017-07-14 | 西安新拓三维光测科技有限公司 | A kind of Damaged model calculation method for being used to remanufacture reparation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8483518B2 (en) * | 2010-02-19 | 2013-07-09 | Microsoft Corporation | Image-based CAPTCHA exploiting context in object recognition |
- 2018-04-10 CN CN201810317267.1A patent/CN108537753B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2230856A2 (en) * | 2009-03-17 | 2010-09-22 | Mitsubishi Electric Corporation | Method for up-sampling images |
CN106952344A (en) * | 2017-05-04 | 2017-07-14 | 西安新拓三维光测科技有限公司 | A kind of Damaged model calculation method for being used to remanufacture reparation |
Also Published As
Publication number | Publication date |
---|---|
CN108537753A (en) | 2018-09-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108537753B (en) | Image restoration method based on context feature space constraint | |
CN108460746B (en) | Image restoration method based on structure and texture layered prediction | |
CN112365556B (en) | Image extension method based on perception loss and style loss | |
CN113963409A (en) | Training of face attribute editing model and face attribute editing method | |
Dai et al. | Example-based facade texture synthesis | |
Gao et al. | Face image inpainting based on generative adversarial network | |
CN112862922B (en) | Image filling method based on multi-feature generation network prior information guide | |
CN114791958B (en) | Zero sample cross-modal retrieval method based on variational self-encoder | |
CN113706545A (en) | Semi-supervised image segmentation method based on dual-branch nerve discrimination dimensionality reduction | |
Liao et al. | Artist-net: Decorating the inferred content with unified style for image inpainting | |
CN116091288A (en) | Diffusion model-based image steganography method | |
CN114494387A (en) | Data set network generation model and fog map generation method | |
CN112733861B (en) | Text erasing and character matting method based on U-shaped residual error network | |
Lin | Comparative Analysis of Pix2Pix and CycleGAN for image-to-image translation | |
CN116523985B (en) | Structure and texture feature guided double-encoder image restoration method | |
Zhang et al. | Semantic prior guided face inpainting | |
CN116051407A (en) | Image restoration method | |
Yang et al. | Self-Supervised Cross-Language Scene Text Editing | |
Li et al. | Sampling and surface reconstruction of large scale point cloud | |
Baode et al. | A method for removing complex visible watermarks of electronic map based on conditional generative adversarial nets | |
Sankalpa et al. | Using generative adversarial networks for conditional creation of Anime posters | |
Lu et al. | A sketch-based generation model for diverse ceramic tile images using generative adversarial network | |
Chen et al. | Contrastive structure and texture fusion for image inpainting | |
Guo et al. | ACoSkeNet: A unique automatic coloring of sketches model based on U-Net | |
Chen et al. | Learning Compact Hyperbolic Representations of Latent Space for Old Photo Restoration |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||